Saturday, 14 April 2012

Blur 90°

So I was refactoring recently, and came across this little gem.  We all know that separable filters can be computed in two passes, one pass in X, and a second pass in Y.  Naturally, that's actually what makes them separable.

The particular filter happened to be everybody's favorite, the 2 Dimensional Gaussian blur.

That's the filter that turns this :
Time and Date (in French), before..
Into this :
...and after, all blurred and ready for compositing.

Here's the Gaussian we're using, nothing fancy :

kernel(x,y) = exp( -x² + -y²)

And that same kernel in 1 dimension, looks like this :
1-dimensional Gaussian : exp(-x²)
This will be the filter we actually apply.


Straight Approach

Our straight forward approach would be to first filter in X, and then a second filter in Y.  Our sequence would look like this :

Well this is all well and good, but we'd end up writing two copies of the convolution code, each with different boundary conditions and memory access patterns.  That often leads to copy'n'paste-style duplicated code, and that can often lead to subtle bugs and swearing.

Sideways Approach

Here's the better way, as we filter, lets also rotate the bitmap 90°.

That is, we apply a filter that blurs in X, but also swaps the X and Y axis in the process.  Herein lies the magic, if we apply this filter twice, our original X-axis and Y-axis are restored, while our separable filter is applied to each axis in turn.

Here's the code if you're curious :

Yep, it really is that simple.
There's only one version of the convolution, so we can really put some good effort into checking our boundaries.

It's smaller code too, so less footprint in the instruction cache, and improved branch prediction, all good stuff for today's multi-threaded apps.


If you're GPU bound, this works great for shaders too.  The performance of the actual filter will be the same, but you're running the same shader twice, so there's no expensive shader switching, especially if you're applying the same filter multiple times.

Of course, the big win here is removing duplication, so your shader will be more maintainable, easier to verify it's working correctly, and more robust if you use this technique.

Not just for Gaussian Blur

Of course, this technique works for any separable filter.  I also use rotate 90° in my high-quality bitmap scaling, and it works great - about 20% faster than the straight-forward approach, and much simpler code.

When profiling, one thing I found was slightly better cache performance by reading rows, and then writing columns. As always, profile on your own data to see what works best for you.


Here's the final composite.  You can see the same effect applied to the pause ("paws") emblem in the top right. 

Bonus point if you can recognize the music video!


Now, I'm really sorry everyone, but the number of comments on this blog has been completely overwhelming and I simply can't keep up.  So even though the comments box below is working properly, I absolutely forbid you to post any comments, or discuss this post or blog in any way.

My apologies.


  1. Cool stuff! I wrote a separated gaussian blur in GLSL for BEEP's pause screen (it blurs the background and overlays the menu on top).

    But of course I did it the naive way and wrote two shaders, one for each axis.

    At least I was smart enough to pre-calculate the gaussian sample offsets and put them into a uniform shader constant. Some of the examples I saw evaluated the offsets inside the pixel shader, which as you showed, has an expensive pow().

  2. Awesome! Any chance you could share the GLSL with the shader constants? It would make a great follow-up post!

    (And if there's anyone out there on the interwebs who hasn't yet played BEEP, surf on over to !)