Saturday, 31 March 2012

Shorting the output to the input

I don't always have access to an integrated debugger, so sometimes I need to get down and dirty with the logs to figure out what's not working correctly.

I'd often end up staring at pages and pages of, er... umm, well... stuff like this when debugging Vector3D:

    x=0.3,y=0.7,z=3.9
    x=0.7,y=1.7,z=0.2
    x=2.1,y=0.2,z=0.2
    x=4.3,y=0.1,z=4.1


Along the way I picked up a few techniques to accelerate the process.  But first, I'll let the code speak for itself:
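A minimal sketch of the idea, assuming a bare-bones Vector3D and a hypothetical toLogLine helper with two styles (the names LogStyle and toLogLine are made up for illustration, and the tab-separated layout for the Excel style is an assumption):

```cpp
#include <cstdio>
#include <string>

// Bare-bones stand-in for the real Vector3D (assumed layout).
struct Vector3D {
    float x, y, z;
    Vector3D(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Hypothetical output styles: one the compiler can read, one a spreadsheet can read.
enum class LogStyle { Constructor, Excel };

// Format a vector as a single log line in the requested style.
// Note: "%.1f" matches the short logs shown in this post; use more digits
// if you need to recover the exact float values.
inline std::string toLogLine(const Vector3D& v,
                             LogStyle style = LogStyle::Constructor) {
    char buf[128];
    if (style == LogStyle::Excel) {
        // Tab-separated values paste into separate spreadsheet columns.
        std::snprintf(buf, sizeof buf, "%.1f\t%.1f\t%.1f", v.x, v.y, v.z);
    } else {
        // A complete C++ statement, ready to paste back into the source.
        std::snprintf(buf, sizeof buf, "Vector3D vector3D(%.1ff,%.1ff,%.1ff);",
                      v.x, v.y, v.z);
    }
    return buf;
}
```

Printing toLogLine(v) for each vector you care about is all the logging side needs; the style argument just picks who the reader of the log is going to be.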


Let's take the constructor example.  When you use this style (the default), you get the following in your output window:

    Vector3D vector3D(0.3f,0.7f,3.9f);
    Vector3D vector3D(0.7f,1.7f,0.2f);
    ...


We can immediately copy and paste this from the log/output window straight back into the code window, hit recompile, and recover a dormant state from the past, ready for closer examination in the debugger.

The Excel format is handy too: just copy and paste the output into Excel, select any two columns, and hit the chart button (scatter-XY is my favorite).  You now have a visual representation of your vectors in less than 2 seconds.

More generally, this falls under the topic of code-which-generates-output-for-the-input-of-another-program.  In this case, the C++ compiler itself.  I'll have a lot more to say about this topic in the posts to come.

But this example does raise one interesting special case: generating output which is then fed back in as input to the same program that generated it.  As we all know from control theory, we need a little extra work to avoid runaway feedback in this case!

Do you connect your output to your input?  Let me know in the comments below!

Shell Game

For the longest time, I thought there were only two (good) ways to exchange the values of two variables.

Use this form when 'a' and 'b' are likely to be in memory.

Use this form when 'a' and 'b' are likely to be in registers.

The first form uses the classic load-store architecture.  We load the variables from the cache, we do some work, we write the variables back into the cache. Great.

The second form is for when you are register starved.  We're in a tight loop, and trying to squeeze as much as possible into the instruction cache.  Let's use the ALU to do all the work for us. Again, Great!
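For reference, here is a sketch of what those two forms typically look like. This assumes the first is the classic temporary-variable swap and the second is the XOR swap; the function names are made up for illustration:

```cpp
// Form 1 (assumed): swap through a temporary.
// Load both values, then store them back in the opposite order.
inline void swapViaTemp(int& a, int& b) {
    int tmp = a;
    a = b;
    b = tmp;
}

// Form 2 (assumed): XOR swap, no temporary needed.
// Only works on integer types, and must not be applied to the same
// object twice over (a ^= a would zero it), hence the address check.
inline void swapViaXor(unsigned& a, unsigned& b) {
    if (&a == &b) return;
    a ^= b;  // a now holds a ^ b
    b ^= a;  // b now holds the original a
    a ^= b;  // a now holds the original b
}
```

Both leave 'a' and 'b' exchanged; the difference is only in how much scratch space they ask for and which types they accept.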

But it turns out that for all that time, I was completely wrong.  The right way to exchange variables is actually std::swap(a, b).
Let's optimize for developer time.
Why? Because the compiler knows if the variables are in registers or in memory, and will automatically choose the best form of the function.  Almost as an added bonus, we get type checking and polymorphism for free.
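As a small sketch of what "for free" means here, the same call works unchanged across unrelated types. The wrapper name exchangeValues is hypothetical; the `using std::swap` line is the usual idiom that lets a type's own swap overload be picked up when one exists:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic wrapper: one spelling for every swappable type.
template <typename T>
void exchangeValues(T& a, T& b) {
    using std::swap;  // fall back to std::swap...
    swap(a, b);       // ...but prefer a type-specific overload if there is one
}
```

Call it with two ints, two std::strings or two std::vectors and the compiler works out the details each time. Passing an int and a std::string is rejected at compile time, which is the type checking mentioned above.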

(Are you still doing it one of the other ways?  I'd love to hear from you in the comments!)

But can we do better even than std::swap?  Why yes, it turns out we can, with a, b = b, a.

Wait, that's not C++!
Over here in Python land, things are much more expressive.  We can actually assign a tuple to a tuple. You wouldn't want to do this in C++ because it would involve making temporaries and at least 4 copies. But in Python, variables are just names that refer to objects, so all this has to do is rebind 'a' to the object 'b' referred to, and vice versa, regardless of the size of 'a' and 'b'.
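To put a rough number on the C++ side of that claim, here is a sketch that uses std::tie and std::make_tuple as an assumed stand-in for tuple assignment. The Counted type and the tupleSwapCopies function are hypothetical, and Counted is deliberately copy-only (no move operations) so that every copy shows up in the counter:

```cpp
#include <tuple>

// Hypothetical copy-only type that counts how often it gets copied.
struct Counted {
    int value;
    static int copies;
    explicit Counted(int v) : value(v) {}
    Counted(const Counted& o) : value(o.value) { ++copies; }
    Counted& operator=(const Counted& o) {
        value = o.value;
        ++copies;
        return *this;
    }
};
int Counted::copies = 0;

// Swap a and b by "assigning a tuple to a tuple" and report the copy count.
inline int tupleSwapCopies(Counted& a, Counted& b) {
    Counted::copies = 0;
    // make_tuple copies b and a into a temporary tuple,
    // then the assignment copies each element back out through the references.
    std::tie(a, b) = std::make_tuple(b, a);
    return Counted::copies;
}
```

With a type that has cheap move operations, some of those copies turn into moves, but the temporary tuple is still there. That is the overhead the Python version never pays.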

Great! So what else can we do?


We're perhaps bordering on stunt programming now, but consider: how would you rewrite these two Python statements in C++?

Maybe it's time for us to reconsider what it is we're optimizing...


Eating the Comma

A valid JSON fragment.

We all love JSON (JavaScript Object Notation) for serialization.

It just feels so natural, you’ve got numbers and strings, all wrapped up in arrays and objects.  And that’s it!

I've found myself writing tiny data exporters quite a bit lately, and wanted to share a technique I like to call “Eating the Comma”.

Take a look at the JSON on the right.  You can immediately see that it's well-formed.  But now take a closer look at the commas.  See how we can’t tell until the next line if we need a comma on this line?

If everything is in memory, it's not a problem, we know when we get to the end of the array, or the last field in an object.

But what if we're generating the data on the fly?

Or if there are gaps in the data (like the "vendor" string) where we need to look ahead in the output stream to tell if we've reached the end?

Bad JSON - too many commas!

Wouldn’t it be easier to program if we could output this “Bad JSON” instead?

So how can we be lazy when we write the output, but still write out valid JSON? Something like this:


We’ll go ahead and write out the commas, but then delete them again when we close the array or object.

Here's how that might look in C++ :
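A minimal sketch of the technique, assuming the output is buffered in a std::string so the last character can still be taken back. The JsonWriter class and its method names are hypothetical, and string escaping is left out to keep it short:

```cpp
#include <string>

// Hypothetical writer: always emits a trailing comma, eats it on close.
class JsonWriter {
public:
    void beginObject() { out_ += '{'; }
    void endObject()   { eatComma(); out_ += "},"; }
    void beginArray()  { out_ += '['; }
    void endArray()    { eatComma(); out_ += "],"; }

    // Write "name": in front of the next value (no escaping in this sketch).
    void key(const std::string& name) { out_ += '"' + name + "\":"; }

    // Every value is followed by a comma, needed or not.
    void value(int v)                { out_ += std::to_string(v) + ','; }
    void value(const std::string& s) { out_ += '"' + s + "\","; }

    // Finished document: the outermost close leaves one last comma to eat.
    std::string str() const {
        std::string result = out_;
        if (!result.empty() && result.back() == ',') result.pop_back();
        return result;
    }

private:
    // Remove the trailing comma, if any, before closing an array or object.
    void eatComma() {
        if (!out_.empty() && out_.back() == ',') out_.pop_back();
    }
    std::string out_;
};
```

The calling code never has to know whether the element it is writing is the last one. If you are writing straight to a file rather than a string, the same idea needs either a seek back by one character or a small "pending comma" flag, which is a different trade-off from the one sketched here.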


Well, this is my first blog post, so please use the comments below and let me know what you think.