Saturday, 14 July 2012

Community

I think technology has always been about bringing people together.  Look at the greatest inventions we humans have, things like mobile phones and automobiles, and I think you'll find their most sweeping and enduring legacy is social - how they changed the way we as individuals interact with our friends, peers, and families.

I think that used to be true of video games too.  The (enduring) impact of a video game is cultural.  You could frame it as, "How does [the game] let different players interact, come together, and be playful in a social sense?"

The Master and the Student

Take the arcade classic, Super Street Fighter II.  It has this simple mechanic where 1 credit buys 2 players a best-of-three match.

The master camps out on the machine - he plays for free the whole day.  One by one, the students come up to feed the machine a quarter.

Round 1: The master refuses to attack.  He ducks, dodges, feints and runs.  But without attacking, ultimately the score will be Master: 0, Student: 1.

Round 2: The master attacks, but only using one technique.  Punches, or fireballs.  Whatever is the student's weakness.  Of course, Master: 1, Student: 1.

Round 3: It's on!  In the deciding round they have a real fight, with the master either exploiting or avoiding the student's weakness depending on his temperament.

What's fascinating to me about this is the community of players that builds up around the game.  There's an implicit trust in the master/student relationship, and it carries forward when the student becomes skilled enough to provide a challenge to the master.

Players can have a dialog about the game, even when they're not actually playing.  If you're on the bus, you can still have a social moment with another SFII player.

The Metrics

These days, we all love our metrics.  MAU - Monthly Active Users.  How many people (in numbers) are playing your game?

I think sometimes we get so blinded by the ability to measure what our players are doing that we forget to look at the quality of the interactions between our players, and (to some degree) the qualities of the players themselves.

Take a closer look at those SFII masters.  They played that way because that's how they learned to play.  It was cultural.  But they were also self-selected because they were the players who wanted that kind of experience.  They're the kind of players who foster new players and (through their actions) create a strong community.

You can't write a metric to capture those kinds of qualities.  (And even if you could, what would you use it for?  It could only tell you about the community you had in the past, not about the one you're going to have in the future.)

A recipe?

Bringing it full circle, I believe that video games are still about bringing people together.  The games that do that well are the kinds of games that I want to make, and the kinds of games that I'm trying to make.

Of course I want a large community around each of my games, as measured by MAU.

But before that, and more importantly than that, I want a strong community around my games, as measured by good-vibes and actually talking with real people.

Now it's your turn to help me help you: How do you build the right community around your game?

Let me know in the comments below, or drop me an email and let's talk!



Thursday, 12 July 2012

I Make Video Games


For me, it all started back in 1985 with the Commodore VIC-20.  I'd while away the hours typing in games from the magazines and storing them on magnetic tape.


Fast-forward to 1995 and the Amiga.  Some buddies and I launched Super Skidmarks to outstanding critical acclaim.



I love the process of making video games.  It's a series of puzzles.  Solving each puzzle unlocks even more puzzles.  As you get deeper and deeper, the puzzles get more and more intricate, and it becomes harder and harder to distinguish the best solution amongst all the correct solutions.  Always the fascination remains.


I love making games for gamers.  I love passing the gamepad over to a gamer - passing the gamepad over to you - to see how you'll react.  There's this one moment that I really love in game development.  It's that moment when I try to probe you for feedback on my game, but you're so engrossed in the gameplay, you're physically unable to stop playing long enough to engage in meaningful conversation.


In 2005, I launched Black & White 2 with Lionhead Studios on the PC.  The game was wildly ambitious and a technical masterpiece.


Over the last few years, I've worked on many, many, many, many unreleased projects.  Those are the projects during which you grow the most.


I've been incredibly fortunate to work with, and learn from, so many amazingly talented people.  From programmers and artists, from QA and production.  Gifted musicians and mocap performers.  Everyone.  Thank you so much!  It's from you I learned everything.



Most recently I've been fortunate enough to work on the Mass Effect franchise with BioWare and on the Rainbow 6 franchise with Ubisoft.


And also on the surprise hit of this year's E3, Watch Dogs.


 
But when I sit back and reflect, it feels like I've been working on ever smaller pieces (with ever increasing detail) of ever larger games.  I'm always truly excited to be a part of a AAA blockbuster... but I miss that visceral connection with the gamer that comes with smaller teams and shorter development cycles.


It's taken me a while to realize, but the thing I love the most about video games, the reason I got into all of this in the first place, is when your delicate, fragile little game (or big game!) that you've put so much effort into finally makes it out to the gamers - to you.  Well... that's why I Make Video Games.


And while I never stopped making mini-games (and playful spaces) along the way, contractual obligations prevented me from finishing almost all of them.

That's why, as of today, I've returned to Indie Game Development.  To make video games in their entirety.  To make every little piece, from top to bottom, everything custom crafted with gamers in mind.  To make the best games for gamers.


To make video games, for you.

Saturday, 7 July 2012

The Compressed Reconstituted Potato Container Dilemma

  There's a particular brand of compressed reconstituted salty potato snack that comes packaged in a distinctive cylindrical container.  It appears to enjoy a certain popularity amongst the kids these days.

  I mention it, because it neatly divides the human population into two mutually exclusive groups.

  • Those whose hands are small enough to reach inside the container to retrieve a salty snack.  
  • Those who are unable to reach inside the container to retrieve a salty snack.


Small hands, Group A.


When there are only a few salty snacks present in the bottom of the container, those with small hands can reach deep down inside to obtain one more.

Delicious!

Large hands, Group B.


By contrast, when there are only a few salty snacks remaining, members of Group B must steady the container with one hand whilst simultaneously securing the container's mouth with the other.  They then gently tip the container so as not to risk undue spillage.


The Paradox

I can't reach!

  Consider what happens when two friends of the same group attempt to share when there are only a few salty snacks remaining in the bottom of the container.  Friends belonging to Group A will proffer the container at a slight angle, allowing their compatriot the opportunity to reach inside and obtain a satisfyingly tasty morsel.

  Likewise, friends belonging to Group B will pass over the entire container, to allow their compatriots the use of both hands to regulate flow control, minimizing spillage.

  For sharing involving purely Group-A, or purely Group-B, no conflict will arise, so let us not consider such cases any further.

  The paradox occurs when a member of Group A and a member of Group B attempt to share a container. 

In this instance, the container will be alternately tilted or passed, causing dissatisfaction, confusion, and lamentation for both participants.

The Dilemma


The paradox is readily observed and thereby quickly resolved in the case of the almost-empty container.

But consider the case of a freshly opened cylindrical container of compressed reconstituted salty potato snacks.

Here we have a much more subtle problem of etiquette:
  • Group A members have the expectation that the container will be offered at an angle.
  • Group B members have the expectation that the container will be passed to them.
When the container is full, both the member from Group A and the member from Group B are able to obtain snacks using either technique.  And yet each will be subtly offended by the other (being alternately greedy or lazy with each exchange), with neither knowing why the other is indulging in such bizarre counter-intuitive behavior in such a trivial matter.

  Indeed, even in principle, it is not possible for such a mixed group to discover the cause of the dissatisfaction until some fraction of the salty snacks has been consumed and the first observable behavioral differences arise - by which point the damage has already been done.

The Designer


I see this happening all the time in Game Design.  A designer will come up with a system based on their personal approach to gaming.  They might be Group A, or they might be Group B.  It doesn't matter.  They'll take their design to focus testing, and the preliminary results will show, unanimously and unambiguously, that the feature is working as designed.  But I know differently.  I can tell when I'm watching the focus test in progress.  Half of the respondents will have this one particular twitch that I've come to recognize - it signifies something isn't quite working the way they expect, but they don't have the language to report the dissonance.  And the other half will be fine.

Occasionally, when the two groups are of vastly different sizes, and the designer is a member of the minority, I see it on the faces of all of the respondents.  But the feature invariably makes its way into the game.  And why not?  After all, it's been focus-tested and there were no problems reported.  What more can a designer do?

(As an aside, if you ever try to dissuade a self-righteous designer based on a few observed facial tics, you'll learn just how quickly you can lose credibility... even when your initial observation is subsequently confirmed independently.)


The Perils of Focus Testing

Has this happened to you?  A feature focus-tests cleanly, then gets slammed in the marketplace?  Why not tell me more about it in the comments below.


Sunday, 17 June 2012

The Power Iteration

In the middle of the last post, we needed to find the eigenvector associated with the largest eigenvalue of a symmetric 3x3 matrix.  I used the Power Iteration to find it, mostly because it's such a forgiving algorithm:  even a trivial implementation will still get a correct result.



In this post, I'm going to talk about how to actually write the Power Iteration, along with a few caveats along the way.

Eigenvectors and Eigenvalues


Okay, okay, okay, so what actually is an Eigenvector/Eigenvalue?

Let's take an even further step back: what is a matrix?  In mathematics, we think of a matrix as representing a Linear Transformation.  Intuitively, we might think of it like this:  a matrix takes an incoming vector and forms an output vector, where each output term is just a linear combination of the input terms.

[a, b,           (3,             (3*a + 5*b,
 c, d]    *       5)      =      3*c + 5*d) 


Let's consider a really simple matrix, called Scale79:

[7,  0,
 0,  9]

You can see Scale79 takes a vector like (1,0) and returns (7,0).  We say that (1,0) is an eigenvector of Scale79, with an eigenvalue (λ) of 7.  Notice that (2,0) is also an eigenvector, as are (3,0), (4,0), (-4,0), and even (7,0) and (49,0).

Scale79 also takes the vector (0,1) to (0,9).  We say that (0,1) is an eigenvector of Scale79, associated with the eigenvalue (λ) 9.

Our example matrix, Scale79, has two eigenvalues, 7 and 9.  Its eigenvectors associated with the eigenvalue 7 look like (1,0), and its eigenvectors associated with the eigenvalue 9, look like (0,1)

Do all (square) matrices have eigenvalues?  Well, yes and no.  It's true that all n-by-n square matrices have n eigenvalues, but only if we allow ourselves the luxury of eigenvalues taking on complex values - a 90° rotation matrix, for example, has eigenvalues ±i.  Complex-valued eigenvalues can sometimes be subtle to deal with, especially when using floating point math, so let's restrict ourselves to real numbers.

In the real-number-only case, it turns out that an n-by-n square matrix is guaranteed to have n real eigenvalues whenever the matrix is symmetric.

This beautiful result, and so much more (including some technical details regarding multiplicity and repeated eigenvalues) is well beyond the scope of this blog, but I urge you to read about Jordan Normal Form for an introduction to this fascinating topic if you're interested to know more.

Let's look at Scale79 a little closer.  We know it has two eigenvalues, 7 and 9, and it scales eigenvectors like (1,0) and (0,1) by those eigenvalues.
 
Scale79 * (x, 0)ᵀ = 7 * (x, 0)ᵀ

Scale79 * (0, y)ᵀ = 9 * (0, y)ᵀ




In some sense, we can replace a costly operation (matrix times vector) by a much simpler operation (scalar times vector), but only for some vectors!

Eigenvectors are like a sort of super matrix compression technology, but it only works some of the time!

Singular Value Decomposition

Before we go much further, let's get something out of the way.  If you ever actually want to compute eigenvalues and eigenvectors for a matrix, you just need to use the library function called "Singular Value Decomposition", or S.V.D.  Any half-way decent matrix library will provide an implementation, which will factorize your matrix into a length-preserving matrix (U), a matrix of (generalized) eigenvalues for scaling (S), and another length-preserving matrix (V).

M = U . S . Vᵀ    (via S.V.D.)

Using this decomposition, it's trivial to pick off the largest eigenvalue (it's almost always the first one!), and its associated eigenvector (the first column of the V matrix).

For example, in GNU octave :
octave> scale79 = [7,0; 0,9]; [u,s,v] = svd(scale79)
u =
   0   1
   1  -0

s =
   9   0         <---- Largest Eigenvalue (λ = 9)
   0   7

v =
   0   1         <---- Eigenvector associated with 9
   1   0

Update: The SV Decomposition is only equivalent to the Eigen Decomposition if the eigenvalues are positive - best to use your library's eigen solver instead...

The Power Iteration


The S.V.D. is great if we want all of the eigenvalues and eigenvectors.  But sometimes the matrix M is too big to fit in memory, or perhaps we don't have handy access to a reliable S.V.D. implementation (e.g. on a GPU or an embedded controller).

The Power Iteration works by doing the simplest possible thing - it mindlessly multiplies the matrix by a vector (any vector), repeatedly, again and again, over and over until the vector becomes very close to an actual eigenvector:

M*v
M*(M*v)
M*(M*(M*v))
M*(M*(M*(M*v)))
M*(M*(M*(M*(M*v))))

Or, in Eigenvalue terms

 λ*v
λ*(λ*v)
λ*(λ*(λ*v))
λ*(λ*(λ*(λ*v)))
λ*(λ*(λ*(λ*(λ*v)))) 


After many iterations, the vector v will tend to become closer and closer to an eigenvector associated with the largest eigenvalue.  How quickly?  Well, after 5 iterations we have λ⁵*v.  And that's where the power method gets its name: we are raising the matrix to a really large power, say 10000 or more, and then seeing what remains.

In theory, we can form λ¹⁰⁰⁰⁰*v, and it will be just as good an eigenvector as any other.  In practice, we need to deal with floating point overflow.  For this reason we normalize the vector after each iteration to prevent overflow.

In code, that looks like this:

    Vector3 v(1, 1, 1);
    for(int i=0; i<10000; i++){
        v = (M * v).GetUnit();
    }
    return v;

Convergence

How fast will we find the eigenvector?  In the limit, it actually depends on the ratio of the second-largest eigenvalue to the largest.  For the matrix Scale79, this ratio is 7/9, or 0.7777...

This is because the smaller eigenvalues quickly disappear into the round-off error, and in the end it's just the components along the two largest eigenvectors battling for supremacy.
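To put rough numbers on that: each iteration shrinks the unwanted components by that ratio, so driving the error down to 10⁻⁷ takes roughly log(10⁻⁷)/log(7/9) ≈ 64 iterations for Scale79.  Squaring the matrix three times (see Preconditioning, below) turns the ratio into (7/9)⁸ ≈ 0.13, and the same accuracy arrives in about 8 iterations.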

In code, that looks like this:

    Vector3 v(1, 1, 1);
    Vector3 lastV = v;
    for(int i=0; i<10000; i++){
        v = (M * v).GetUnit();
        if(DistanceSquared(v, lastV) < 1e-16f){
            break;
        }
        lastV = v;
    }


If you've been playing along at home, you'll notice I've been talking about "the largest eigenvalue".  In fact, this algorithm converges to the eigenvalue with the largest magnitude, e.g. if the two eigenvalues were -7 and -9, it would converge to an eigenvector associated with -9.

So the full name for the above code snippet is really:

FindEigenvectorAssociatedWithEigenvalueWithLargestMagnitudeAssumingSquareSymmetricMatrix(m);

Ouch!  That's quite a mouthful!  Please don't let the terms throw you off.  All of this stuff is built on simple concepts that have deep roots in physics, chemistry, economics, etc.  Because these are still relatively new concepts, historically speaking, it makes sense that they don't (yet) map well to traditional natural languages such as English.  Trust me, it's totally worth putting in the time to get an intuitive feel for these concepts, because they come up again and again in so many different contexts.

Ask questions, break it down, ignore the clumsy verbiage and focus on the concepts, and it'll all make sense.
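One caveat: the loop above hands back the eigenvector only.  If you also need the eigenvalue itself, sign included, a standard trick is the Rayleigh quotient, λ ≈ (v . Mv) / (v . v), taken against the original matrix (important once you start preconditioning, below).  A minimal sketch, assuming a Dot() helper:

    float EstimateEigenvalue(const Matrix33 &m,const Vector3 &v){
        // Rayleigh quotient - use the original matrix, not the squared one
        return Dot(v,m*v)/Dot(v,v);
    }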


Preconditioning

Preconditioning is a technique we use when we want to make math problems easier to solve.  We say, "Rather than solve this difficult problem, let's solve this easier problem that has the same answer."

In our case, if iterating M*v will eventually converge to an eigenvector, then (M*M)*v will converge twice as fast to the same eigenvector (but to a different eigenvalue - the square of the original one!)

 
    float scale = FindLargestEntry(m);
    Matrix33 matrixConditioned = m / scale;

    matrixConditioned = matrixConditioned * matrixConditioned;
    matrixConditioned = matrixConditioned * matrixConditioned;
    matrixConditioned = matrixConditioned * matrixConditioned;

We have to be careful that the matrix multiply itself doesn't overflow!  By dividing through by the largest entry first, we can be confident we can raise the matrix to the 8th power and still keep the structure of the matrix.

At this point you might be tempted to skip the normalization step, and divide through by the determinant (or some multiplier) instead.  Be careful - these two techniques, while having the same effect, are actually solving two different problems.  If you try and solve both at once you may find you solve neither.  In particular, if your matrix has a particularly fiendish condition called "mixing of the modes", it can wreak all sorts of havoc with your computation.

 

Putting it All Together

So now we have all the pieces for our Power Iteration.  The primary motivation for this algorithm is ease-of-coding, so keep it simple - if you find yourself putting time into making this algorithm better, you're almost always better off using a library implementation of SVD instead.

And remember, this function only works well for symmetric matrices!



// From http://missingbytes.blogspot.com/2012/06/power-iteration.html
// note: This function will perform badly if the largest eigenvalue is complex
Vector3 FindEigenVectorAssociatedWithLargestEigenValue(const Matrix33 &m){
    //pre-condition
    float scale=FindLargestEntry(m);
    Matrix33 mc=m*(1.0f/scale);
    mc=mc*mc;
    mc=mc*mc;
    mc=mc*mc;
    Vector3 v(1,1,1);
    Vector3 lastV=v;
    for(int i=0;i<100;i++){
        v=(mc*v).GetUnit();
        if(DistanceSquared(v,lastV)<1e-16f){
            break;
        }
        lastV=v;
    }
    return v;
}




Saturday, 9 June 2012

Fitting a plane to a point cloud

A buddy of mine is trying to find the plane that best describes a cloud of points, and, naturally, my very first thought is, "Wow... that would make an awesome blogpost!"


The Problem

Given a collection of points in 3D space, we're trying to find the plane that is the closest to those points.


minimize(  sum ( distance²(point, plane),  point),   plane)


If you're anything like me, you're probably wondering why we sum the squares of the distances.

The reasoning is a little backwards, but basically it's so that we can solve it using the method of Linear Least Squares.  It turns our approximation problem into a so-called "quadratic form", which we know from theory we can solve exactly and efficiently using linear algebra.

There are other ways we could have phrased the problem.  One of my favorites involves using Robust Statistics to ignore outliers. Another involves using radial basis functions to try and trap the plane within a region - this works great when we're not sure what type of function best approximates the points.  Yet another approach involves perturbing and clustering the points to re-weight the problem.

Something I find fascinating is that most of those methods eventually require a linear least squares solver.

[Update 2015-11: I have a suspicion that finding the 'maximum likelihood estimator' for this particular problem doesn't require the use of LLS - can any kind reader confirm or deny?]

So let's start by forming the linear least squares approximation, and later we can see if we still need to extend it.  (Of course, if you'd like to know more, why not leave a comment in the comments section down below?)


Linear Least Squares


Let's start by defining our plane in implicit form:

C = Center of Plane
N = Plane's normal

(X - C) . N = 0

Then for an arbitrary point P, we can write it in 'plane' co-ordinates like so :

P = C + μN + pN⊥

Here μ is the signed distance from the point to the plane, N⊥ is a 2-by-3 matrix representing the perpendicular to the plane's normal (its two rows span the plane), and p is a 2-vector of coefficients.

We are trying to minimize the following :

E = sum( μ², points)

With a little math, we can show that

C = sum ( P, points ) / count ( points )

With a lot more math, we can show that N is the eigenvector associated with the smallest eigenvalue of the following matrix:

M = sum [ (Px-Cx).(Px-Cx), (Px-Cx).(Py-Cy), (Px-Cx).(Pz-Cz),
          (Py-Cy).(Px-Cx), (Py-Cy).(Py-Cy), (Py-Cy).(Pz-Cz),
          (Pz-Cz).(Px-Cx), (Pz-Cz).(Py-Cy), (Pz-Cz).(Pz-Cz) ]


Finding the Center


Let's find C first - that's the Center of the plane.  In the mathematical theory, it doesn't really make sense to talk about the center of an infinite plane - any point on the plane, plus any multiple of the plane's normal, describes the same plane.  Indeed, it's tantalizingly seductive to describe a plane by its normal, N, and its distance to the origin, C.N.

But for those of us fortunate enough to be trapped inside a computer, we must face the realities of floating point numbers and round off error. For this reason, I always try to represent a plane using a Center and a Normal, until I've exhausted all other available optimizations.

 def FindCenter(points):
    sum = Vector3(0, 0, 0)
    for p in points:
       sum += p
    return sum / len(points)

Finding the Normal


Now let's find the normal.  You'll see this type of computation fairly often when dealing with quadratic forms: lots of square terms on the diagonal, and lots of cross terms off the diagonal.

As it's a symmetric matrix, we only need to compute half of the off-diagonal terms:  X.{X,Y,Z}    Y.{Y,Z}    Z.{Z}

 def FindNormal(points):
    sumxx = sumxy = sumxz = 0
    sumyy = sumyz = 0
    sumzz = 0
    center = FindCenter(points)
    for p in points:
        dx = p.X - center.X
        dy = p.Y - center.Y
        dz = p.Z - center.Z
        sumxx += dx*dx
        sumxy += dx*dy
        sumxz += dx*dz
        sumyy += dy*dy
        sumyz += dy*dz
        sumzz += dz*dz
    symmetricM = Matrix33( \
           sumxx,sumxy,sumxz, \
           sumxy,sumyy,sumyz, \
           sumxz,sumyz,sumzz)
    return FindSmallestMumbleMumbleVector(symmetricM)



Note that we've had to make two passes through the collection of points - one to find the center, and a second to form the matrix.  In theory, we can compute both at the same time in a single pass through the collection.  In practice, the round-off error will obliterate any precision in our result.  Still, it's useful to know it's possible if you're stuck in a geometry shader and only have one chance to see the incoming vertices.
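Here's a minimal sketch of that single-pass variant, using the identity sum((P-C)(P-C)ᵀ) = sum(P.Pᵀ) - n.C.Cᵀ and the same Vector3/Matrix33 helpers as the appendix below.  The subtractions at the end are exactly where the round-off damage happens:

    void FindCenterAndMatrixSinglePass(Vector3 *points,int count,Vector3 *destCenter,Matrix33 *destM){
        assert(count>0);
        Vector3 sum(0,0,0);
        float sumXX=0.0f,sumXY=0.0f,sumXZ=0.0f;
        float sumYY=0.0f,sumYZ=0.0f;
        float sumZZ=0.0f;
        for(int i=0;i<count;i++){
            Vector3 p=points[i];
            sum+=p;
            sumXX+=p.X*p.X;
            sumXY+=p.X*p.Y;
            sumXZ+=p.X*p.Z;
            sumYY+=p.Y*p.Y;
            sumYZ+=p.Y*p.Z;
            sumZZ+=p.Z*p.Z;
        }
        Vector3 center=sum*(1.0f/count);
        *destCenter=center;
        // subtract n*C*C^T - this is where nearly-equal quantities cancel
        float n=(float)count;
        *destM=Matrix33(
            sumXX-n*center.X*center.X,sumXY-n*center.X*center.Y,sumXZ-n*center.X*center.Z,
            sumXY-n*center.X*center.Y,sumYY-n*center.Y*center.Y,sumYZ-n*center.Y*center.Z,
            sumXZ-n*center.X*center.Z,sumYZ-n*center.Y*center.Z,sumZZ-n*center.Z*center.Z);
    }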


The smallest Eigenvalue


So now we just need to find an eigenvector associated with the smallest eigenvalue of a matrix.

If you recall, for a (square) matrix M, an eigenvalue λ and an eigenvector v are given by:

Mv = λv

That is, the matrix M, operating on the vector v, simply scales the vector by λ.

Our 3x3 symmetric matrix will have 1, 2 or 3 distinct (real) eigenvalues (see appendix!), and we need to find the smallest one.  But watch this:

M⁻¹v = λ⁻¹v

We just turned the smallest eigenvalue of M into the largest eigenvalue of M⁻¹!

In code :

def FindEigenvectorAssociatedWithSmallestEigenvalue(m):
   det = m.GetDeterminant()
   if det == 0:
       return m.GetNullSpace()
   mInverse = m.GetInverse()
   return FindEigenvectorAssociatedWithLargestEigenvalue(mInverse)

The largest Eigenvalue


Computing eigenvalues exactly can be a little tricky to code correctly, but luckily forming an approximation to the largest eigenvalue is really easy:

def FindEigenvectorAssociatedWithLargestEigenvalue(m):
   v = Vector(1, 1, 1)
   for _ in xrange(10000):
       v = (m * v).GetNormalized()
   return v


Success!


So there we have it, linear least squares in two passes!  All that's left to do is write the code!


Appendix - Linear Least Squares plane for a Point Cloud in C++

The following 3 functions for linear least squares are hereby licensed under CC0.

// From http://missingbytes.blogspot.com/2012/06/fitting-plane-to-point-cloud.html
float FindLargestEntry(const Matrix33 &m){
    float result=0.0f;
    for(int i=0;i<3;i++){
        for(int j=0;j<3;j++){
            float entry=fabs(m.GetElement(i,j));
            result=std::max(entry,result);
        }
    }
    return result;
}

// From http://missingbytes.blogspot.com/2012/06/fitting-plane-to-point-cloud.html
// note: This function will perform badly if the largest eigenvalue is complex
Vector3 FindEigenVectorAssociatedWithLargestEigenValue(const Matrix33 &m){
    //pre-condition
    float scale=FindLargestEntry(m);
    Matrix33 mc=m*(1.0f/scale);
    mc=mc*mc;
    mc=mc*mc;
    mc=mc*mc;
    Vector3 v(1,1,1);
    Vector3 lastV=v;
    for(int i=0;i<100;i++){
        v=(mc*v).GetUnit();
        if(DistanceSquared(v,lastV)<1e-16f){
            break;
        }
        lastV=v;
    }
    return v;
}
 

// From http://missingbytes.blogspot.com/2012/06/fitting-plane-to-point-cloud.html 
void FindLLSQPlane(Vector3 *points,int count,Vector3 *destCenter,Vector3 *destNormal){
    assert(count>0);
    Vector3 sum(0,0,0);
    for(int i=0;i<count;i++){
        sum+=points[i];
    }
    Vector3 center=sum*(1.0f/count);
    if(destCenter){
        *destCenter=center;
    }
    if(!destNormal){
        return;
    }
    float sumXX=0.0f,sumXY=0.0f,sumXZ=0.0f;
    float sumYY=0.0f,sumYZ=0.0f;
    float sumZZ=0.0f;
    for(int i=0;i<count;i++){
        float diffX=points[i].X-center.X;
        float diffY=points[i].Y-center.Y;
        float diffZ=points[i].Z-center.Z;
        sumXX+=diffX*diffX;
        sumXY+=diffX*diffY;
        sumXZ+=diffX*diffZ;
        sumYY+=diffY*diffY;
        sumYZ+=diffY*diffZ;
        sumZZ+=diffZ*diffZ;
    }
    Matrix33 m(sumXX,sumXY,sumXZ,\
        sumXY,sumYY,sumYZ,\
        sumXZ,sumYZ,sumZZ);

    float det=m.GetDeterminant();
    if(det==0.0f){
        m.GetNullSpace(destNormal,NULL,NULL);
        return;
    }
    Matrix33 mInverse=m.GetInverse();
    *destNormal=FindEigenVectorAssociatedWithLargestEigenValue(mInverse);
}



Appendix - Citation Needed


Our 3x3 symmetric matrix is Hermitian, therefore all its eigenvalues are real and its Jordan Normal form looks like this:

[ λ1,  0,   0,
   0, λ2,   0,
   0,  0,  λ3 ]

Assume w.l.o.g. that λ1 >= λ2 >= λ3.

If λ1 == λ2 == λ3, then we have only one distinct eigenvalue.

If λ1 > λ2 > λ3, then we have three distinct eigenvalues.

The only remaining cases both have two distinct eigenvalues.




Saturday, 2 June 2012

The Cocktail Party Effect (Part 2 of 2)

Late at night, when the boys are (finally) asleep, we like to be able to hear the voices of the people on the television without waking up the neighbors.  In part one I described how to build a custom digital audio filter by specifying a frequency response and running an optimizer to determine the optimal coefficients.

This post looks at one of the actual filters I use on my television in more detail.

First up, here's the response curve for the filter.
White = Filter, Blue = Desired Response
You can see the extremely strong cut-off starting at 1000Hz and dropping very quickly: -10dB at 300Hz, -20dB at 50Hz.  In the other direction, we have an almost flat response between 1000Hz and 5000Hz, and then a very gradual drop: -3dB at 10kHz.

This corresponds nicely with the human voice, which ranges between 800Hz and 8000Hz, with the majority of the sound energy between 1500Hz and 4000Hz.

Here's the actual code I use to specify the desired frequency response:

    float GetDesiredResponse(float freq)
    {
        float logFreq = log(freq);
        float cutOff0 = log(1000.0f);
        float cutOff1 = log(3400.0f);
        float cutOff2 = log(8000.0f);

        float result = 1.0f;
        if(logFreq  < cutOff0)
        {
            float factor = logFreq / cutOff0;
            result *= pow(factor, 10.0f);
        }
        if(logFreq > cutOff1)
        {
            float factor = 2.0f - logFreq / cutOff1;
            result *= pow(factor, 2.0f);
        }
        if(logFreq > cutOff2)
        {
            float factor = 2.0f - logFreq / cutOff2;
            result *= pow(factor, 2.0f);
        }
        return result;
    }


As you can see with this method, it's relatively easy to get precise control over the frequency response.

A couple quick notes:
  • It's important that the frequency response be a quasiconcave function.  This ensures there are no kinks in the response, which will (1) cause visiting audiophiles to complain, and (2) make some voices more difficult to comprehend.
  • Be careful when specifying a very steep transition, or trying to completely stop some frequencies.  A digital filter has some pretty strict limits on the kinds of things it can filter.  Like the proverbial genie, if you try and go there, the optimizer will give you exactly what you ask for.
  • I can't hear past ~17kHz, so I took extra care to test with high frequencies, but played back at half speed to make sure I wasn't torturing the cats and dogs.
  • This is also a good time to read up about Odd and Even functions, which correspond to odd/even numbers of coefficients in the polynomial.  An Odd function, for example, will start at -∞ and rise to +∞.  That's useful if you want a high-pass or low-pass filter, but bad for a band-pass filter.
  • Having problems with the yₙ racing off to infinity?  Your Y polynomial may have an unstable feedback loop.  Try a longer or shorter filter, or add an A-stability criterion to your optimizer.

 

Weighting

So you're probably wondering why the filter matches the desired function so accurately over the vocal range, but seems to drift in other areas.  Here's the weighting function I use to compute the error:

errorSum = sum over all sample frequencies f of:  (desired(f) + 0.25)² . (desired(f) - actual(f))²

That is, I sample 129 frequencies, logarithmically spaced between 9Hz and 18000Hz.  For each frequency, I add the square of the L2 error, weighted by the square of (desired response + a quarter).

Why so much squaring?  To help the function minimizer converge to a global minimum, it's nice to give it big parabolas to slide down inside.
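In C++, that error term might look something like this - a sketch, assuming a Filter class exposing the CalcFrequencyResponse() from part 1's pseudo-code:

    float EvaluateFilterError(const Filter &filter)
    {
        float errorSum = 0.0f;
        for(int i = 0; i < 129; i++)
        {
            // 129 sample frequencies, logarithmically spaced from 9Hz to 18000Hz
            float freq = 9.0f * pow(18000.0f / 9.0f, i / 128.0f);
            float desired = GetDesiredResponse(freq);
            float actual = filter.CalcFrequencyResponse(freq);
            float weight = (desired + 0.25f) * (desired + 0.25f);
            float error = desired - actual;
            errorSum += weight * error * error;
        }
        return errorSum;
    }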

Oh, and before I forget, here's the actual filter in C++, licensed under CC0:

    // From http://missingbytes.blogspot.com/2012/06/cocktail-party-effect-part-2-of-2.html
    static float x[5] = {0};
    for(int i=0; i<length; i++)
    {
        float sampleLeft = GetNextLeftSample();
        float sampleRight = GetNextRightSample();

        for(int j=4; j>0; j--)
        {
            x[j] = x[j-1];
        }
        x[0] = sampleLeft + sampleRight;

        // The filter!
        static float y1=0.0f, y2=0.0f, y3=0.0f, y4=0.0f;
        float y0=+0.588746*x[0]-0.492725*x[1]-0.757061*x[2]+0.661062*x[3]
            +1.242758*y1+0.545942*y2-1.044236*y3+0.238844*y4;
        y4=y3;y3=y2;y2=y1;y1=y0;

        float value = y0*volume;
        float outValue = bound(value, -32768, 32767);
        if(value != outValue){clippingCount++;}
        unsigned short emit = (unsigned short)outValue;
        *dest++ = emit;// Output left
        *dest++ = emit;// Output right
    }


Results

So what does it sound like?  Well, for music, pretty bad actually :)  But that's not the point!  It's designed for voice, and that's where it really shines - even though the voice quality sounds a little unnatural, you can understand what people are saying, even at very low volume levels.

    (Oops, having some video encoding problems - updated video coming real soon now)

The Future

Can we do better than this?  Yes, we can!  When I finally get around to setting up a 7.1 audio system, I plan to measure the room response function using a calibrated microphone.  Then it's a simple matter of taking the FFT of the incoming audio, dividing through by the room response, and taking the inverse Fourier transform before sending it to the sound card.  Normally this would entail some delay/lag, but that's not a problem when playing buffered video.  I could even have different responses for different listener locations, or dynamically respond to changes in temperature or humidity.

... watch this space for a future update.




Saturday, 26 May 2012

The Cocktail Party Effect (Part 1 of 2)

Late at night, when the boys are (finally) asleep, we sometimes like to watch stuff on the television.  If the TV is too loud, the boys wake up, and that really breaks immersion.  If the TV is too quiet, we can't hear the gripping dialogue, and that breaks immersion too.

  Here's how I fixed it in code.

Mono

Okay. Don't judge me, but the first thing I did was to convert the incoming stereo audio down to mono.  Most music tends to be well spatialised, while the voice track comes through the center channel.  By converting to mono I estimate a ~3dB drop in perceived volume during musical interludes - or, conversely, I can raise the total volume by 3dB without waking up the boys.
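The downmix itself is a one-liner (the filter listing in part 2 simply sums the channels and lets the volume control absorb the factor of two):

    float mono = 0.5f * (sampleLeft + sampleRight); // average left and right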

Bass

The next thing to do is to cut the low frequencies.  The rumbles and the explosions.  The ones which reverberate throughout the house and wake up the kids and the neighbors too.

It's relatively easy to find info on constructing a digital high-pass filter on the internet, provided you can cut through the jargon.  Generally there tends to be an incoming signal, denoted xₙ - the series of samples coming from your video decoder - and an outgoing signal, yₙ - the series of samples that you send to the sound card.

Then there's some function that links them together.  Here's an example of a really simple high-pass filter you can find on the internet:

yₙ = 0.2 . xₙ - 0.8 . yₙ₋₁

It's a simple matter to turn this into code:

A simple high-pass filter, in C++.
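A minimal sketch of that, assuming GetNextSample()/EmitSample() helpers and a buffer length:

    static float yPrev = 0.0f;
    for(int n = 0; n < length; n++)
    {
        float x = GetNextSample();
        float y = 0.2f * x - 0.8f * yPrev; // y_n = 0.2*x_n - 0.8*y_(n-1)
        EmitSample(y);
        yPrev = y;
    }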

Z-transform


To work out the frequency response, we can use the Z-transform to find the Transfer Function, which maps our filter from the discrete time domain into the continuous frequency domain.  In our case:

H(z) = Y(z)/X(z) = 0.2 / (1 + 0.8 . z⁻¹)

The cool thing is, the theory lets us treat z as a complex number, even though in practice all the xₙ and yₙ will be floating point numbers.  For example, to compute the frequency response at 1000Hz, with a sample rate of 48kHz, we evaluate H on the unit circle:

i = √-1,  z = e^(iω),  ω = 2π . 1000/48000

FrequencyResponse(1000Hz) = | H(e^(iω)) |
= | 0.2 / (1 + 0.8 . e^(-iω)) |
≈ 0.11
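If you'd like to check that number, the evaluation is only a few lines with std::complex - a sketch:

    #include <cmath>
    #include <complex>

    // Evaluate |H(z)| on the unit circle, z = e^(i*omega), for
    // H(z) = 0.2 / (1 + 0.8*z^-1).
    float FrequencyResponse(float freqHz, float sampleRateHz)
    {
        const std::complex<float> i(0.0f, 1.0f);
        float omega = 2.0f * 3.14159265f * freqHz / sampleRateHz;
        std::complex<float> zInverse = std::exp(-i * omega); // z^-1 = e^(-i*omega)
        return std::abs(0.2f / (1.0f + 0.8f * zInverse));
    }

    // FrequencyResponse(1000.0f, 48000.0f) comes out around 0.11 (about -19dB)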

If we draw this on a graph, with frequency (Hz) on the X-axis, and magnitude (dB) on the Y-axis, we can get a "Bode Plot" of our filter:
A Bode Plot for a simple filter.
So that's a great start, but now we want more control over which frequencies pass through the filter, and which are blocked, and by how much.

Optimize

Back in the old days, we would have had to endlessly try different filter combinations and painstakingly compute Bode plots to find combinations of filters that might fit our requirements.

But now that we all have super-computers under our desks, we've got much more powerful tools to solve an old problem in a new way.

In pseudo-code, it looks like this:

def GetDesiredFrequencyResponse(frequency):
    # Your desired EQ function goes here

def EvaluateFilter(filter):
    errorSum = 0
    for frequency in 20 .. 22000:
        desired = GetDesiredFrequencyResponse(frequency)
        actual = filter.CalcFrequencyResponse(frequency)
        errorTerm = desired - actual
        errorSum += errorTerm * errorTerm
    return errorSum

bestFilter = Minimize(FilterFactory, EvaluateFilter)

That's right, let's just use an offline optimizer to compute the optimal coefficients for us!  You could use numpy, or octave, for this.


... And Generate the Code

(In the generated code, 'f32' is a 32-bit float.)
Now we have our filter coefficients, we need to inject it back into C++.

Here's a function which writes out the C++ directly.  We can just copy and paste directly into  ProcessAudio(), hit recompile, and hear the results immediately.

Notice the coding style - it's rife with buffer overflows, and makes huge unjustified assumptions about the inputs.  You certainly couldn't use this code in a production environment, but as the scaffolding to bootstrap a single-use filter, the iteration speed trumps all other considerations.
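Here's a sketch of what such a generator can look like, assuming the optimizer hands back the x- and y-coefficients plus the overall scale divisor:

    #include <cstdio>
    #include <vector>

    // Sketch: turn optimized coefficients into a pasteable C++ filter body,
    // shaped like the sample output below.
    void EmitFilterCode(const std::vector<double> &xCoeffs,
                        const std::vector<double> &yCoeffs, double scale)
    {
        size_t n = yCoeffs.size();
        printf("    static float");
        for(size_t i = 1; i <= n; i++)
        {
            printf(" y%d = 0.0f%c", (int)i, i < n ? ',' : ';');
        }
        printf("\n    float sum = %f*x[0]\n", xCoeffs[0]);
        for(size_t i = 1; i < xCoeffs.size(); i++)
        {
            printf("        %+f*x[-%d]%s\n", xCoeffs[i], (int)i,
                   i + 1 < xCoeffs.size() ? "" : ";");
        }
        printf("    float y0 = (sum");
        for(size_t i = 0; i < n; i++)
        {
            printf(" %+f*y%d", yCoeffs[i], (int)i + 1);
        }
        printf(") / %f;\n", scale);
        for(size_t i = n; i > 1; i--)
        {
            printf("    y%d = y%d;\n", (int)i, (int)i - 1);
        }
        printf("    y1 = y0;\n");
    }

Calling it with the optimizer's output produces a listing shaped like the sample below.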



Sample


Here's a sample output :

    static float y1 = 0.0f, y2 = 0.0f;
    float sum = 0.211989*x[0]
        -6.794531*x[-1]
        -29.803024*x[-2]
        +0.535980*x[-3]
        +0.215555*x[-4];
    float y0 = (sum -9.433180*y1 +31.500277*y2) / 0.000017;
    y2 = y1;
    y1 = y0;


Coming Soon

So this is part 1 of 2.  In the next post I'll go into the details of the actual filters I'm using on my television, along with some sample video so you can hear before and after.  I'll try and include tips for how to ensure convergence, and some gotchas that the Bode plot won't tell you about.  And of course, if there's anything you'd like to know more about, why not suggest it in the comments below?