Saturday, 17 January 2015

Whitespace, you're doing it wrong.

What if there was a scientific, objective way to format the whitespace in your current programming language?

Received wisdom tells us that choice of formatting styles is a purely subjective choice. The same as choosing a coffee blend, or finding the best way to paint your bikeshed.

What if they lied?

What if it was indeed possible to develop a framework to measure, compare and contrast formatting styles in a rational, objective way by applying the Scientific Methodtm?

As a community, we could then migrate towards a formatting style based on a series of measurable scientific tests, such that every competent programmer would agree on the outcome, even when that outcome differed from their own personal preference.

We might even then invoke Stigler's law and name such a formatting style “The Kolmogorov Style” for formatting programming languages.

What would you do if your favourite subjective choice turned out to be different from the Kolmogorov Style? What would you be forced to do, if I could scientifically prove that your whitespace choices, what you thought was your subjective preference, was objectively wrong?

Read on if you're genuinely brave enough to challenge your preconceptions...

TRIGGER WARNINGS : Unicode, indentation, tabs, newlines, braces, Kolmogorov complexity

Warning : This post requires unicode to read correctly.  If your browser does not support unicode, please stop reading now.


  • Note 1 : This blogpost will be continuously updated in response to reader feedback.  Because Peer Review.
  • Note 2 : Formatting conventions and other terminology follow wikipedia.
  • Note 3 : I'm using C/C++ merely for convenience here. Obviously this extends to other (programming) languages in a natural way.
  • Note 4 : It seems some brave souls are actually attempting research of this sort : http://ppig.org/sites/default/files/2015-PPIG-26th-Sarkar.pdf Way to go guys!

Existence proof

This is going to take a little bit of work, so I'm going to proceed in stages.

First up, I'm going to construct what theoretical mathematicians call an 'existence proof'. That's a standalone proof where I show it is possible to find an objective test of a formatting style, such that 100% of programmers will reach the same conclusion. The objective test won't be convincing! It's not intended to be convincing. The point of this section merely illustrates that it is possible to construct at least one objective test to make a decision between alternative formatting options.  Only once we have our existence proof, will we be able start our search for the best objective tests to use.

As a pure hypothetical for this section, suppose we're starting a completely new project, purely for our own personal enjoyment, and we're trying to decide if we should indent using the TAB character, or indent using 4xSPACE.

For our objective metric, we will measure the disk space required to store the code, with smaller being better.

We know that the TAB character ('\t') is 1 byte, and 4xSPACE, ('    ') takes up 4 bytes.

Now, here's the part where I apply the scientific method! I select a representative source file from my hard drive, and prepare two versions, one with TAB indentation, and one with 4xSPACE indentation.

This particular file happened to have 2654 tabs, so we can cross-check :

RandomCPPFileTAB.cpp : 30,430 bytes
RandomCPPFile4xSPACE.cpp : 38,392 bytes

The results of my experiment showed the 4xSPACE version of this file is 26% larger than the TAB version.

I urge you to apply the scientific method and actually perform this experiment to determine if it is repeatable!

Extrapolating from this single data point(!), we expect that source files which use 4xSPACE would, on average, take up a larger amount of disk space, somewhere in the ballpark of 20% - 30% more.

Any competent C++ programmer can follow this chain of reasoning and agree that C++ files which use 4xSPACE are never smaller than the equivalent TAB version.

Crucially, my claim is that everybody who competently performs this particular objective test will obtain this same result (TAB beats 4xSPACE), even if they personally prefer 4xSPACE over TAB.

Now, before y'all start complaining that I chose an unfair metric, let me just repeat : This is an existence proof.

The entire point of this section is to establish the following hypothesis : “There exists at least one objective test which can be used to make a decision between alternative formatting options.”

Just to be 100% crystal clear, I'm not claiming this isolated test is sufficient for anyone to change spaces to tabs or vice versa!  

The only point that I've proved is that it is possible for at least one objective test for whitespace stylistic concerns to actually exist, and therefore, by extension, it is meaningful to look for more.

(However, I feel obliged to point out that Kolmogorov Style does indeed use tabs instead of spaces for the C++ language. I certainly haven't proved it...  But I think I can illustrate the fact by issuing the following challenge: If there exists even one objective scientific test against (i.e. 100% of competent C++ programmers would complete the test and choose 4xSPACE over TAB) then please post such a test in the comments below. I'll be sure to update this post if an objective scientific test does exist.)


Another example, Find in Files, :lgrep, Find in Workspace, M-x rgrep 

Here's another motivating example, again purely for illustrative purposes.  It's not intended to be convincing, the goal is simply to illustrate that it is possible to construct objective tests where 100% of competent programmers will reach the same conclusion about a formatting style, even if their personal preference is different.

Again, pure hypothetical, entirely new project, we're trying to decide between two ways to format a C++ class declarations:

class Allman
:
public Style
{
//...
};


class Stroustrup : public Style {
//...
};


Every programming environment provides some facility for matching lines within a codebase, so our objective test will be : Which formatting style gives us more information about the class hierarchy, as a result of performing 'grep class *.h'. (i.e. Look in all files which have a .h suffix, and print out any line which contain the string 'class')


Here's what the output might looks with the Allman style:

class Adams
class Jackson
class Garbo
class Greta
class Douglas
class Michael1
class Michael2
class Jackie
class Kennedy
class Sam

And the same code, formatted with Stroustrup style:

class Adams {
class Jackson {
class Garbo {
class Greta : private Garbo {
class Douglas {
class Michael1 : public Douglas {
class Michael2 : private Jackson {
class Jackie : public Kennedy, private Onassis {
class Kennedy {
class Sam : public Adams {

On this particular objective test, I claim that every competent C++ programmer will agree that the Stroustrup style gives more information, even if they personally prefer to use the Allman style.

Once again, if you're at all interested in the scientific method and have access to a C++ codebase, please do a 'find-in-files' or similar. Are you satisfied with the results? Would you have more or less information if your codebase was Allman/Stroustrup style? Please post your results below, especially if they are different!!








We're starting to get closer.  We now have two examples of objective tests, and maybe we can even start to see a trend (remove redundant encodings).

We're also beginning to suspect there's probably many families of objective tests, and almost all of them are contradictory.

Take the famous painting the bikeshed problem, how many coats of paint should we use?

  • Clearly 2 coats are objectively better than 1, because durability!
  • Clearly 2 coats are objectively worse than 1, because cost!

Suddenly our objective tests are starting to look awfully subjective - given a universe of objective tests, how do we choose which are the best objective tests to apply?

To answer that, I'm going to need to introduce “Coding Atoms”.

Introducing Coding Atoms

Atom – indivisible
The smallest, indivisible constituent part or unit of something.
from Ancient Greek ἄτομος (átomos, “indivisible”)




Allow me to introduce a simple C++ coding atom:
int

Just like in physics, we can look inside that 'atom', and see that it has 3 component parts, 'i', 'n', and 't'.

Let's look at the 6 possible permutations of those three parts, and verify that only one of them is valid C++:

int 
itn 
nit 
nti 
tin
tni

Here's another way of probing our atom, we'll insert a pair of brackets and confirm that only when the “i”, “n” and “t” are adjacent do we have valid C++.
()int
(i)nt 
(in)t
(int)
i()nt
i(n)t
i(nt)
in()t
in(t)
int()


The component parts of “int” are fused together in a special kind of way, we can't tear them apart, or insert things, or rearrange them and still have a meaningful C++ program.

That's what I'm talking about, when I use the term “coding atom”.



Coding Atoms : Variable Declarations

Allow me to present another atom, a variable declaration using the Resource Acquisition Is Initialization (RAII) principle:

int adams = 42;

Or if we substitute {■ : adams, □ : 42}, we can rewrite our atom like this :

int ■ = □;

Let's try permuting:
int ■ = □;
int ◻ = ■;
■ = int ◻;
; ■ int = ◻
■ int; = ◻

And inserting brackets:
int ■ = (□);
(int ■) = ◻;
(int ■) = ◻;
int (■ = ◻);

In the same way that 'int' is a coding atom, this declaration is 'fused' together. We can't insert or rearrange into it and still have a meaningful C++ program.


Look what happens when we break RAII and split our atom across two lines:

int ■;
■ = □;

And permute:

■ = □;
int ■;

Or insert a pair of braces:

{
int ■;
}
■ = □;

Compiler errors for everyone!


OK, lets try two atoms, this time with comments

int ▲ = △; // triangle
int ■ = □; // square

Permuting and everything is awesome:

int ■ = □; // square
int ▲ = △; // triangle

Rock on! Now braces! All good!

int ▲ = △; // triangle
{
int ■ = □; // square
}

But what if we split the atoms and try to write our comments like this.. maybe that's okay? :

// triangle
int ▲ = △;
// square
int ■ = □;

Permuting, perhaps as a result of sorting the lines alphabetically :

// triangle
// square
int ■ = □;
int ▲ = △;

Whoa, what just happened?!?! Did you just feel the cabin pressure drop? We've produced valid C++ code that compiles just fine, passes all our unit tests, and runs correctly … but our comments are disconnected from their roots and have become actively harmful to our understanding of the code.

These are some of the longest lived families of software defects – the ones where the executable isn't affected and our automatic tools don't notify us when a new problem has been created. This only happened because we allowed a newline to be inside our coding atom.

I know what you're thinking.. “Such a silly mistake would never make it into shipping code”.  Well, anecdotally, in the short time since preparing this blogpost as a draft and actually posting it, I've located and repaired defects of exactly this type (on a live production codebase) on at least 4 occasions.

The take home for this section is : At least on this somewhat contrived example (the “reducing defects as a result of random line permutations of variable declarations” test), every single programmer, regardless of their own personal variable declaration preference, who competently performs this experiment, will agree that the best way to declare a variable is with everything on the same line. i.e.:

int ▲ = △; // triangle



Coding Atoms : Mandatory Braces


You're not going to like this one. Not one bit.

Here's the coding atom for an if statement.

if(●){
}

What happens if we don't use braces at all?

if(●)

Disaster! Our ◎ atom has been torn apart!  Observe:

#define ◎ print(“Hello,  World“); log(“Hello, Log”); 

As a consequence of not using braces, we've opened ourselves to the possibility of splitting an atom and introducing bugs.  I'll formalise this principle more fully in just a moment.  But for right now, I'm just going to blurt out:

For any given pair of C++ formatting styles which differ only in their usage of mandatory braces, the style using mandatory braces will be objectively better.


(Oh, you think I'm wrong on this one? First, go read about the gotofail bug  Once you think you've understood that, let me know why you think $X is more important that having your code run correctly.)

Anyway, lets press forward!  There's an alternate way of formatting conditionals, where the opening brace occurs on a different line from the conditional.  I'll come back to the problems with this example very shortly, but for now, I just want to identify the coding atom and move forward.

if(●)
{
}


Coding Atoms (part 3)

Here's a short list of some of the frequently encountered (C++) coding atoms:

▵;//comment
int ▵=▵;
if(▵){▵}
while(▵){▵}
for(▵;▵;▵){▵}
class ▵:public ▵{▵};
int ▵(▵,▵,▵,...)const;


Hypothesis Time


So at this point, we hopefully have an intuitive feel for what a coding atom is (you can't split it up), and we have two examples of an objective test (100% of competent programmers reach the same conclusion, regardless of their personal preference) for formatting.

Allow my to present my central hypothesis:

Central Hypothesis:
For any given pair of formatting styles,
which differ only on the relative placement between the conditional and opening brace, 
the style where they are on the same line will be objectively better.

If we could prove this hypothesis, with just a little bit of imagination, we could use it to predict that the single-line family of styles (Stroustrup, BSD KNF, Lisp style, Ratliff, K&R, 1TBS) are objectively better than the multi-line variants (Kernel style, K&R style, Allman style, Whitesmiths, GNU style, Horstmann)

Edit: An earlier version of this blog used 'Theorem' instead of 'Hypothesis' in this section.  My apologies for any confusion this may have caused.

I suspect an actual proof for this hypothesis would make use of the following two observations :

Line-based world


We live in a line-based ecosystem. Find-In-Files, line numbers, the #line directive.  In unixland, we have grep, sort, uniq etc.

In this section I'm going to use 'git' as an example of a source code control system, but the same principle applies to perforce, mercurial, subversion, cvs etc ad infinitum

Here's an actual real world diff from my local repository :

diff --git a/Source/Base/Optimize.cpp b/Source/Base/Optimize.cpp
index d827540..7cd13b5 100644
--- a/Source/Base/Optimize.cpp
+++ b/Source/Base/Optimize.cpp
@@ -258,6 +258,8 @@ void SWOptimize::FindMinimumBFGS(){
                                }
                        }
                }
-//             swlog(SWRoleMin,"iter=%d\n",iter);
+               if(Verbosity>1){
+                       swlog(SWRoleMin,"BFGS iter=%d\n",iter);
+               }
                CalcNumericJacobian(&jacobian);

There's nothing surprising about this, one line was deleted and 3 lines were added.  That's how diffs work. There's nothing special about the choice of git here either.

Diffs are one thing. What happens when we merge?

Earlier, we looked at an objective test that took of the somewhat arbitrary form “reducing defects as a result of random line permutations”.  Well it turns out, when we merge, that's exactly what is happening. We take two sets of line based changes (Adds and Removes) and we try to apply them both at the same time.

A pecking order for Merge Problems.


Suppose we're trying to merge two diffs. There's a number of outcomes that could happen, from best case to worst case, they are:

✔✔ Automatic merge succeeds, code compiles, all tests pass! everything works! Success!
Automatic merge fails, manual fix.
✔✗ Automatic merge succeeds, code doesn't compile, manual fix.
✔✗✗✗ Automatic merge succeeds, code compiles, some tests fail, lengthy manual fix.
✔✔✗✗✗✗✗✗ Automatic merge succeeds, code compiles, all tests pass! dormant bug enters the system.

That last outcome is particularly insidious.  Your tools told you everything was fine, but they secretly created a bug.
Lets take a look at our '4-line if' statement, with the conditional and opening brace on separate lines:
if(●)
{
}

You see where this is going right? That newline between the conditional and the opening brace?  That's exactly the place where our merge tool will happily go ahead and insert new code.
if(●)
!
{
!
}

This is just horrible.  The merge reported success and our code compiles just fine – yet the bug exists!  We're down to our last lines of defence now such as unit tests and peer review.  If they fail to detect the problem, then we're silently putting bugs into our codebase, purely as a consequence of our C++ formatting style.

How often does this happen? When I've been working in active codebases that use this style, I probably see this type of “successful” merge at least once or twice per week. It takes time to clean up, and with any change, there's always the chance of making mistakes and introducing more problems.

Let's put this in context. Suppose github or bitbucket randomly inserted bugs into your code, but didn't notify you. Or if your IDE would occasionally insert extra characters into the middle of your file, and save it, and not tell you about it. Even if it was just once or twice a year per user, wouldn't you try hard to find some way to fix it?

Summary

Lets review:


  • I gave an existence proof, a simple objective test that every competent programmer can perform and get the same answer, even when it conflicts with their own personal opinion.


  • I highlighted coding atoms, and demonstrated why we should try hard to stop them spanning multiple lines.

So did I do it? Describe Kolmogorov Style, a profoundly new way of formatting programming languages, in a way that everyone can objectively agree on?  Well of course not! Such a thing can't be done in just one tiny blogpost.

Instead, I managed to show that it is most likely possible to construct such a formatting style, and gave some hints in the direction I think that such a style might look like.

Keep in mind, even if such a style were to be fully enumerated and justified, there would still be many excellent reasons not to follow it. Maintaining consistency for one. Retraining for another. The existence of an objective standard does not require us to ignore subjective factors.

However, if this is you:

  • Starting a new project.
  • Preparing your team's coding standard for the first time.
  • Developing the presentation layer of a text editor.
  • Creating the syntax for a new programming language. (Especially this one!)
Then I strongly suggest you take the time to understand the principles embodied in this post :

  • Reduce redundant encoding.
  • Don't split atoms across multiple lines.
  • Convert optional syntax to mandatory.


Appendix A: Aesthetics? They matter right?

Absolutely! Whitespace is integral to the appearance of code when displayed on a computer monitor. It's fundamental to readability to have adequate white space around your code. I think it's a great thing to be able to isolate individual elements and examine them!

We need familiar whitespace because we're all pattern matching machines.

But the thing is, code doesn't live on computer monitors.

As I write this in 2015, the vast bulk of code lives in source code repositories. It's a collection of disembodied diffs and their associated merge histories.

We take a snapshot of that repository at an arbitrary moment in time and use that to construct a bytestream on our hard disk.

No one is asking you to interpret that raw bytestream directly. All of us use integrated programming environments, built on top of a graphical windowing system, which reformats that bytestream into an aesthetically pleasing onscreen presentation.

Take this bytestream : “””for(int i=0;i<10;i++){\n\tprintf("%d",i);\n}\n”””

Here's just a sampling of 3 different ways you might convert that byte stream into pixels:







Look, I don't care if your IDE uses proportional fonts, or monospace fonts, or wingdings or braille.  I don't care about your color scheme, your choice of syntax highlighting, your latest emacs macro, oh-my-zsh github, which scrolling plugins you've got installed, or which keys on your DVORAK keyboard binds to autocomplete. #YOLO.

But here's the rub. All of those choices are local to your machine. None of your local choices can leak on to our team's shared perforce depot. None of your local configurations choices alter the bytestream that exists on my computer in any way.

Of course aesthetics matter to programmers.  But don't let that blind you to the reality that any aesthetic quality of your whitespace doesn't matter in the slightest to your compiler, continuous integration server, merge utility, source code repository, build server...

...and especially not to the users of your software.

My problem is that your subjective notions about aesthetics open the possibility for software defects to creep in.

To use a bad analogy:

You're trying to make the blueprints more pretty by drilling holes in the life raft.
(The users won't notice the water coming in until it's too late.)


Appendix B: Some common objections (FAQ style)


Q) Why don't we get a bunch of programmers, randomize formatting styles, and measure how long it takes them to identify software defects?
A) Great idea! That's an excellent procedure for choosing presentation defaults for an IDE! On the other hand, when choosing a formatting style for transporting code between programmers, it makes more sense to use a style which is friendlier to automated tools and reduces the incidence of hidden software defects.

Q) Our ten million line code base already uses ABC style. Should we convert everything to QRS style?
A) Are you crazy? Every change has a cost, consistency has value! Stick with ABC style!!!
However, for any new code, and for any incremental changes you make to existing code, I strongly advise you to adopt the following two guidelines:

  • Where braces are optional in your language ( if, while, do, ), make the usage of braces be mandatory.
  • int raii = 7; // Declare a single variable, initialize it once, (add an optional comment), all self-contained on one line.


3 comments:

  1. > If there exists even one objective scientific test against (i.e. 100% of competent C++ programmers would complete the test and choose 4xSPACE over TAB) then please post such a test in the comments below.

    How about:

    'Which formatting choice offers the higher likelihood of a document being readable when opened in a random text editor'?

    I'm a tabs man myself, but I thought that was the point of spaces - and it's quite a good one. In some contexts, I can see it being very important, perhaps more important than storage space.

    > “There exists at least one objective test which can be used to make a decision between alternative formatting options.”

    This IMO is problematic - it'd be OK if there existed *exactly* one objective test, but there is usually (almost always?) more than one such test - and they can give conflicting results, ala tabs vs spaces.


    ReplyDelete
  2. Hey Mark!

    'Which formatting choice offers the higher likelihood of a document being readable when opened in a random text editor'?

    Setting aside for a minute that 'being readable' is still a subjective test (Is Russian more readable than English? Yes if you're Russian, No if you're English...), there's still the problem that the authors of those random text editors presumably made choices to make them more readable. The ones who chose to represent tabs as 8 spaces (e.g. vim) would presumably prefer the tabbed version of your document. That means there's a non-zero number of programmers (including all the ones who are the authors of the 8xSPACE editors) who would come up with a 'TABS' as an answer on your test.

    Compare that with 100% of programmers who would agree that the size on disk (on average) is never larger for TABS compared with 4xSPACE. i.e. not just most of them, or 99.999% of them, but literally all of them, every single one, without exception. Even the programmers who prefer 4xSPACE.

    Just one other point, we must be careful about polling programmers about what other programmers think, it's the equivalent "Do you think most astronomers would agree Pluto is a planet?" - great for building consensus, terrible for being objective. We actually have a definition for what constitutes a planet. To be objective means to apply that definition. To be subjective is to ask opinions.

    >This IMO is problematic - it'd be OK if there existed *exactly* one objective test...
    Yeah, you raise a good point. I've tried to address it a bit later on, but keep in mind this is only used in the 'existence proof' section, as in the 'base case' for a http://en.wikipedia.org/wiki/Mathematical_induction . Actually, I think it unlikely the 'disk space' metric would be used in a formal definition of the Kolmogorov Style, (i.e. a formatting style based on minimizing Kolmogorov complexity) I just use it here because it is convenient and mush more familiar to a casual audience.

    ReplyDelete
  3. Your "byte stream to pixels" example is a bit broken - the middle rendering isn't actually from the bytestream presented, unless the editor is seriously screwing with you. It both has additional spaces, and worse pre-increments i instead of post-incrementing it.

    ReplyDelete