The particular filter happened to be everybody's favorite, the 2 Dimensional Gaussian blur.
That's the filter that turns this :
|Time and Date (in French), before..|
|...and after, all blurred and ready for compositing.|
Here's the Gaussian we're using, nothing fancy :
kernel(x,y) = exp( -x² + -y²)
And that same kernel in 1 dimension, looks like this :
|1-dimensional Gaussian : exp(-x²)|
Straight ApproachOur straight forward approach would be to first filter in X, and then a second filter in Y. Our sequence would look like this :
Well this is all well and good, but we'd end up writing two copies of the convolution code, each with different boundary conditions and memory access patterns. That often leads to copy'n'paste-style duplicated code, and that can often lead to subtle bugs and swearing.
Sideways ApproachHere's the better way, as we filter, lets also rotate the bitmap 90°.
That is, we apply a filter that blurs in X, but also swaps the X and Y axis in the process. Herein lies the magic, if we apply this filter twice, our original X-axis and Y-axis are restored, while our separable filter is applied to each axis in turn.
Here's the code if you're curious :
|Yep, it really is that simple.|
It's smaller code too, so less footprint in the instruction cache, and improved branch prediction, all good stuff for today's multi-threaded apps.
ShadersIf you're GPU bound, this works great for shaders too. The performance of the actual filter will be the same, but you're running the same shader twice, so there's no expensive shader switching, especially if you're applying the same filter multiple times.
Of course, the big win here is removing duplication, so your shader will be more maintainable, easier to verify it's working correctly, and more robust if you use this technique.
Not just for Gaussian BlurOf course, this technique works for any separable filter. I also use rotate 90° in my high-quality bitmap scaling, and it works great - about 20% faster than the straight-forward approach, and much simpler code.
When profiling, one thing I found was slightly better cache performance by reading rows, and then writing columns. As always, profile on your own data to see what works best for you.
ResultsHere's the final composite. You can see the same effect applied to the pause ("paws") emblem in the top right.
|Bonus point if you can recognize the music video!|
Now, I'm really sorry everyone, but the number of comments on this blog has been completely overwhelming and I simply can't keep up. So even though the comments box below is working properly, I absolutely forbid you to post any comments, or discuss this post or blog in any way.