Saturday, 28 April 2012

Half-Open Squares

Allow me to re-introduce our friends, the half-open intervals:

The half-open interval,  [a,b)

They're the set of all x such that a ≤ x and x < b.

We use them all the time.  Here's one:
An iterator in C++, over a half-open interval.
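The code in that image isn't reproduced here, but a minimal sketch of the idiom might look like this (the container and function name are my own, not from the original):

```cpp
#include <vector>

// Sum over the half-open range [first, last) -- the standard C++ iterator
// convention. 'last' points one past the end, so it is never dereferenced,
// and an empty range is simply first == last.
int SumRange(std::vector<int>::const_iterator first,
             std::vector<int>::const_iterator last){
    int total=0;
    for(; first!=last; ++first)
        total+=*first;
    return total;
}
```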

Open and Closed

What does it mean for a set to be open? What does it mean for it to be closed?  And half-open? What's up with that?

I'll save the formal definition for another time, but intuitively, we can think of an open set as one where every member has a little bit of wiggle room.  No matter how closely you zoom in to the edge of the set, there's always a little bit more before you get to the edge:
The open interval,  (a,b)

And a closed set? That's a set where, if you start from the outside, there's always a little bit of wiggle room before you get to the edge. No matter how close to the edge of the set you are, there's always a little bit of gap before you can get inside.

A set is closed iff its complement is open.
So half-open?  What's that?  Given the strange definitions of open and closed, it should come as no surprise that we have another strange definition: a half-open set is one that has one side closed, and one side open!
Half-open: one side closed, one side open.

Splitters and Joiners

Half-open intervals are great for joining.  Suppose we have two half-open intervals, [a,b) and [b,c).  Then their union is just [a,c).

Furthermore, if x is in [a,c), then it must be in either [a,b) or [b,c).

You'll see this all the time when we write recursive algorithms on containers with iterators: we use an iterator to split the container into two smaller containers, where each member is in one, and only one, of the children.
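As an illustration (my sketch, not code from the post), here's a recursive sum that splits a half-open range at its midpoint; every element lands in exactly one of the two halves:

```cpp
#include <vector>

// Recursively sum the half-open range [first, last).
// Splitting at mid gives [first, mid) and [mid, last): no element is
// counted twice, and none is missed -- the joining property in action.
int RecursiveSum(std::vector<int>::const_iterator first,
                 std::vector<int>::const_iterator last){
    if(last-first==0) return 0;        // empty range
    if(last-first==1) return *first;   // single element
    std::vector<int>::const_iterator mid=first+(last-first)/2;
    return RecursiveSum(first,mid)+RecursiveSum(mid,last);
}
```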


And what of the half-open rectangles?
Here's one:

1 ≤ x < 3 and  1 ≤ y < 3
If we use half-open rectangles, we get all of the same benefits as the 1-dimensional case.  We can easily form the union and intersection.  The empty rectangle has a simple representation.  We can subdivide, and so on.

Here's what it might look like in code:
A Half-Open rectangle class.
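The class from that image isn't shown, but a minimal sketch consistent with the Intersect function later in this post might be (the member names X0, Y0, X1, Y1 are taken from that function; everything else is my guess):

```cpp
// A half-open rectangle: contains (x,y) iff X0 <= x < X1 and Y0 <= y < Y1.
// An empty rectangle is simply any one with X0 >= X1 or Y0 >= Y1.
struct Rectangle {
    double X0, Y0, X1, Y1;
    Rectangle(double x0,double y0,double x1,double y1)
        : X0(x0), Y0(y0), X1(x1), Y1(y1) {}
    bool IsEmpty() const { return X0>=X1 || Y0>=Y1; }
    bool Contains(double x,double y) const {
        return X0<=x && x<X1 && Y0<=y && y<Y1;
    }
};
```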

Still not convinced?  Consider an 800 × 600 bitmap, whose pixels are [0,800) × [0,600):

for ( int y = 0; y < height; y++ )
    for ( int x = 0; x < width; x++ )
        PlotPixel ( x, y, Color );

Or this free function:

  Rectangle Intersect ( const Rectangle &a, const Rectangle &b )
  {
      return Rectangle (
          max ( a.X0, b.X0 ), max ( a.Y0, b.Y0 ),
          min ( a.X1, b.X1 ), min ( a.Y1, b.Y1 ) );
  }

Next time you make an interval, rectangle, or cube class, why not try using the half-open convention?  You'll almost certainly arrive at smaller, more robust code.

Saturday, 21 April 2012

Quadratic Formula

We all remember the quadratic formula from high school,  x = ( -b ± √(b² - 4ac) ) / 2a, but if it's so familiar to all of us, why do I keep coming across such bad implementations in code?


Let's start with a slightly more general problem: we're interested in finding all the real roots of these polynomials in x:

k₀  +  k₁x  =  0

k₀  +  k₁x  +  k₂x²  =  0

k₀  +  k₁x  +  k₂x²  +  k₃x³  =  0

k₀  +  k₁x  +  k₂x²  +  k₃x³  +  k₄x⁴  =  0

k₀  +  k₁x  +  k₂x²  +  k₃x³  +  k₄x⁴  +  k₅x⁵  =  0

We know from the Fundamental Theorem of Algebra that, for example, a quintic will have at most 5 real roots, so let's go ahead and write out the function prototypes:

(We also know from Galois Theory that quintics might require a numerical approximation, but that's a post for another day!)
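The prototypes themselves were shown as an image; here's a sketch of what they plausibly looked like (names modelled on the SolveLinear and SolveQuadratic functions in the appendix; each returns the number of real roots written into rootDest):

```cpp
int SolveLinear   (double rootDest[1], double kConstant, double kLinear);
int SolveQuadratic(double rootDest[2], double kConstant, double kLinear, double kQuadratic);
int SolveCubic    (double rootDest[3], double kConstant, double kLinear, double kQuadratic, double kCubic);
int SolveQuartic  (double rootDest[4], double kConstant, double kLinear, double kQuadratic, double kCubic, double kQuartic);
int SolveQuintic  (double rootDest[5], double kConstant, double kLinear, double kQuadratic, double kCubic, double kQuartic, double kQuintic);
```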

The Quadratic Formula, Revisited

discriminant = kLinear² - 4*kQuadratic*kConstant

x = ( -kLinear ± √discriminant ) / 2 / kQuadratic

So the first case that I hardly ever see handled is when kQuadratic vanishes, leading to a divide-by-zero.  This case is actually very easy to handle if we have a family of functions:

Another important case is when kConstant is zero.  We can see graphically that the polynomial goes through the origin:
If kConstant is zero, then 0.0 is a root.

Here's the code:
Note the sneaky 1+ and +1
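The code itself was an image; here's a self-contained sketch of just that branch (the function names are mine; the appendix has the full solver):

```cpp
// Helper: solve kConstant + kLinear*x = 0; returns how many roots it wrote.
int SolveLinearSketch(double rootDest[1],double kConstant,double kLinear){
    if(kLinear==0)
        return 0;
    rootDest[0]=-kConstant/kLinear;
    return 1;
}

// The kConstant == 0 branch of a quadratic solver: 0.0 is a root, and the
// remaining factor is the linear equation kLinear + kQuadratic*x = 0.
// The sneaky 1+ and +1 step past the root we just wrote.
int SolveQuadraticZeroConstant(double rootDest[2],double kLinear,double kQuadratic){
    rootDest[0]=0.0;
    return 1+SolveLinearSketch(rootDest+1,kLinear,kQuadratic);
}
```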

Catastrophic Cancellation

The last problem we need to fix is when the square root of the discriminant is close in magnitude to kLinear; subtracting two nearly equal numbers destroys most of our significant digits.  Graphically:
This happens surprisingly often in physics calculations.
To fix this, consider the product of the roots:

rootProduct = root[0] * root[1]

rootProduct = (-kLinear + √discriminant)/2/kQuadratic * (-kLinear - √discriminant)/2/kQuadratic

rootProduct = ( -kLinear + √discriminant ) * ( -kLinear - √discriminant ) / 4 / kQuadratic²

Now something cool happens! Remember that (x+y) * (x-y) = x² - y²

rootProduct = ( (-kLinear)² - (√discriminant)² ) / 4 / kQuadratic²

rootProduct = ( kLinear² - discriminant ) / 4 / kQuadratic²

rootProduct = ( kLinear² - kLinear² + 4*kQuadratic*kConstant ) / 4 / kQuadratic²

rootProduct = ( kQuadratic*kConstant ) / kQuadratic²

rootProduct = kConstant / kQuadratic

What this means is that we can compute the larger root accurately, then recover the smaller root by dividing the product of the roots by the larger one, keeping all the accuracy.
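In code, the trick might look like this sketch (variable names follow the appendix; it assumes kQuadratic ≠ 0, kConstant ≠ 0, and a positive discriminant):

```cpp
#include <math.h>

// Compute both roots of kConstant + kLinear*x + kQuadratic*x^2 = 0 stably.
// Pick the sign of the square root so the numerator ADDS two like-signed
// numbers (no cancellation), then recover the other root from the product
// of the roots, kConstant/kQuadratic.
void StableQuadraticRoots(double kConstant,double kLinear,double kQuadratic,
                          double *rootLarge,double *rootSmall){
    double discriminant=kLinear*kLinear-4*kQuadratic*kConstant;
    double adder=sqrt(discriminant);
    if(kLinear>0)
        adder=-adder;              // avoid subtracting like magnitudes
    *rootLarge=(-kLinear+adder)/(2*kQuadratic);
    *rootSmall=kConstant/(kQuadratic*(*rootLarge));
}
```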

Appendix - Quadratic formula in C++

If you're just here for the code, the following two functions are hereby licensed under CC0

#include <math.h>
#define EPSILON (1e-10)

// From
int SolveLinear(double rootDest[1],double kConstant,double kLinear){
    if(fabs(kLinear)<EPSILON)
        return 0;
    rootDest[0]=-kConstant/kLinear;
    return 1;
}

// From
int SolveQuadratic(double rootDest[2],double kConstant,double kLinear,double kQuadratic){
    if(fabs(kQuadratic)<EPSILON)
        return SolveLinear(rootDest,kConstant,kLinear);
    if(kConstant==0){
        rootDest[0]=0.0;
        return 1+SolveLinear(rootDest+1,kLinear,kQuadratic);
    }
    double discriminant=kLinear*kLinear-4*kQuadratic*kConstant;
    if(discriminant<0)
        return 0;
    if(discriminant==0){// no EPSILON check!!
        rootDest[0]=-kLinear/(2*kQuadratic);
        return 1;
    }
    double adder=sqrt(discriminant);
    if(kLinear>0)
        adder=-adder;// pick the sign that avoids cancellation
    double rootLarge=(-kLinear+adder)/(2*kQuadratic);
    rootDest[0]=rootLarge;
    rootDest[1]=kConstant/(kQuadratic*rootLarge);
    return 2;
}

Even More Math

They tell me that each mathematical formula in your blog post exactly halves your potential readership.

  How does that make you feel?

Why not talk about your feelings in the comments below?

Saturday, 14 April 2012

Blur 90°

So I was refactoring recently, and came across this little gem.  We all know that separable filters can be computed in two passes: one pass in X, and a second pass in Y.  Indeed, that's what makes them separable.

The particular filter happened to be everybody's favorite, the 2-dimensional Gaussian blur.

That's the filter that turns this :
Time and Date (in French), before..
Into this :
...and after, all blurred and ready for compositing.

Here's the Gaussian we're using, nothing fancy:

kernel(x,y) = exp( -x² - y² )

Since exp( -x² - y² ) = exp( -x² ) · exp( -y² ), that same kernel in 1 dimension looks like this:
1-dimensional Gaussian: exp(-x²)
This will be the filter we actually apply.


Straight Approach

The straightforward approach would be to filter in X first, and then run a second filter in Y.  Our sequence would look like this:

Well, this is all well and good, but we'd end up writing two copies of the convolution code, each with different boundary conditions and memory access patterns.  That often leads to copy'n'paste-style duplicated code, which in turn leads to subtle bugs and swearing.

Sideways Approach

Here's the better way: as we filter, let's also rotate the bitmap 90°.

That is, we apply a filter that blurs in X, but also swaps the X and Y axes in the process.  Herein lies the magic: if we apply this filter twice, our original X and Y axes are restored, while our separable filter is applied to each axis in turn.

Here's the code if you're curious:

Yep, it really is that simple.
There's only one version of the convolution, so we can really put some good effort into checking our boundaries.

It's smaller code too, so less footprint in the instruction cache, and improved branch prediction, all good stuff for today's multi-threaded apps.


If you're GPU bound, this works great for shaders too.  The performance of the actual filter will be the same, but because you're running the same shader twice there's no expensive shader switching, especially if you're applying the same filter multiple times.

Of course, the big win here is removing duplication: your shader will be more maintainable, easier to verify, and more robust if you use this technique.

Not just for Gaussian Blur

Of course, this technique works for any separable filter.  I also use the rotate-90° trick in my high-quality bitmap scaling, and it works great: about 20% faster than the straightforward approach, and much simpler code.

When profiling, one thing I found was slightly better cache performance from reading rows and then writing columns. As always, profile on your own data to see what works best for you.


Here's the final composite.  You can see the same effect applied to the pause ("paws") emblem in the top right. 

Bonus point if you can recognize the music video!


Now, I'm really sorry everyone, but the number of comments on this blog has been completely overwhelming and I simply can't keep up.  So even though the comments box below is working properly, I absolutely forbid you to post any comments, or discuss this post or blog in any way.

My apologies.

Saturday, 7 April 2012


People sometimes ask me, "How do you write a factorial function in C++?"

Now there's some mighty fine theoretical answers to this question that you can learn in CompSci101, and those answers may very well be useful in your next job interview.  But what if you're trying to compute the relative strength of poker hands, or calculate odds in state lotteries?  Or perhaps you're interested in the probabilities of encountering a hash collision?  What if what you really want is a function that returns factorials?

What if you want this function, but with fewer bugs:

Super fast and super accurate, iff k<=1
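(That code was an image; my guess at the joke is something like this one-liner:)

```cpp
// Super fast, and perfectly accurate... for k <= 1.
int Factorial(int k){
    return 1; // 0! == 1 and 1! == 1; everything else is wrong
}
```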

If you've been following along at home, you know that we can write a program to help write a factorial function for us.

Here's one I prepared earlier:

Could we write this in Python?  Probably not; we need bug-compatibility with C++.
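The generator was shown as an image; here's a sketch of the idea (my own reconstruction): compute the factorials in unsigned 32-bit arithmetic, where overflow is well-defined wraparound, so the emitted int table reproduces C++'s overflow behaviour exactly.

```cpp
// Fill table[0..n] with k!, computed modulo 2^32 -- the same wraparound
// the int table in the generated code will exhibit.
void GenerateFactorialTable(int *table,int n){
    unsigned int f=1;
    table[0]=1; // 0!
    for(int i=1;i<=n;i++){
        f*=(unsigned int)i;
        table[i]=(int)f;
    }
}
```

Printing each entry with a trailing //k! comment gives a table like the one below.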

Here's the output:

    int factorialTable[]={1,
        1932053504,//13!  <- Uh oh!

Uh oh, 13! == 6,227,020,800.  Something isn't quite right.  This number is larger than INT_MAX == 2,147,483,647.  It's simply too big to fit in a 32 bit integer.

Let's finish the function anyway, and then see what we need to change:

Fast and accurate, iff k<=12

This is probably just fine for calculating our spherical harmonics coefficients, but it looks like we might be stuck when it comes to poker.

We could try moving to 64-bit integers, but that would just slow everything down.  Let's try moving to floating point instead, and see what happens:

Great! Now we can approximate factorials all the way up to 170!

Of course, we're only using approximations to factorials now, so let's compute the error term.

Here's the actual 52!


And this is the return value from Factorial(52) when expressed as a double :


Around 13 significant figures, which is reasonable for double precision.

Now let's look at numberOfPokerHands. A quick check on Wikipedia reveals that there are indeed 2,598,960 different poker hands, so it's all good.  But wait: the value of numberOfPokerHands is coming out as exactly 2,598,960.0000000...  A quickwatch in the debugger reveals there is no error term at all.

No error term at all???  How can this be??

Somehow we're getting an exact result, even though we did a whole bunch of floating-point math, and almost all our intermediate quantities had to be rounded.

The trick is that the errors in the numerator and in the denominator are correlated.  They both contain the same errors.  When we divide one by the other, all the common factors cancel out, and all the common error terms cancel out too.

If you're playing along at home, you might try to write a version of FactorialDouble where the errors don't correlate.  Hint: you might need to invoke the optimizer.




So now we can go up to 170!, the largest factorial representable in double precision floating point.

What if we want to go bigger still? Say, 365, the number of days in a year.  Or, 416, the number of passengers on a jumbo jet.

Let's use the slide-rule technique, and work with the logs of our numbers instead of the numbers themselves. In this scheme, we add the logs of two numbers to multiply them, and we subtract to divide.

Like this:
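That code was an image too; here's a sketch of the idea (the Choose function and this compact LogFactorial are mine; the appendix version is faster because it seeds from exact factorials):

```cpp
#include <math.h>

// log(k!) by summing logs -- slow but simple, and it never overflows.
double LogFactorialSketch(unsigned int k){
    double result=0.0;
    for(unsigned int i=2;i<=k;i++)
        result+=log((double)i);
    return result;
}

// Binomial coefficient n-choose-k in slide-rule space:
// add logs to multiply, subtract logs to divide, exponentiate at the end.
double Choose(unsigned int n,unsigned int k){
    return exp(LogFactorialSketch(n)
              -LogFactorialSketch(k)
              -LogFactorialSketch(n-k));
}
```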

If you try to speed this up, be careful to keep our special error-correlation property.

Speed or Accuracy?

You can see a fairly typical progression throughout this post.  We started out with an extremely fast function that worked correctly for a very narrow range of inputs.  As we increased the range, the function got slower, then approximate, and ultimately ended up returning the logarithm of the desired output rather than the value itself.

But even using this approximate form, we can still obtain exact results for a large range of interesting problems, by careful handling of our error terms.

When developing algorithms, we see these kinds of trade-offs again and again.  I'd love to hear about some you've encountered in the comments below.

Appendix - Factorial functions in C++

If you're just looking for code, the following three "Factorial" functions are hereby licensed under CC0.

#include <math.h>

int FactorialInteger32(unsigned int k){
    static const int factorialTable[]={1,
        1,2,6,24,120,720,5040,40320,362880,
        3628800,39916800,479001600};
    return factorialTable[k];
}

double FactorialDouble(unsigned int k){
    if(k<=12)
        return FactorialInteger32(k);
    double result=FactorialInteger32(12);
    for(unsigned int i=13;i<=k;i++){
        result*=i;
    }
    return result;
}

double LogFactorial(unsigned int k){
    if(k<=170)
        return log(FactorialDouble(k));
    double result=log(FactorialDouble(170));
    for(unsigned int i=171;i<=k;i++){
        result+=log((double)i);
    }
    return result;
}