If statements considered harmful or Goto’s evil twin or How to achieve coding happiness using null objects.

When Edsger Dijkstra wrote his famous letter to the ACM in 1968[1], he created two things which we now take for granted: he condemned, in entirely theoretical argument and for all time, the goto operation; and he introduced the “considered harmful” tag.  Two out of three ain’t bad.

Dijkstra’s argument relied on two key observations. To paraphrase the first very simply: it is possible to describe every instant in a dynamic process in terms of a kind of temporal co-ordinate, a formal, mathematical sense of “how did I get here?”  In structured code (i.e. code made of procedures, loops, and branches) it is possible (at least to a first approximation) to associate each line in the code with a specific temporal co-ordinate.  That is, if you point to any particular statement, you’ll know an awful lot about what must have happened in order to get there.  In contrast, code which is controlled by gotos does not have that association: each line of code may be associated with more than one – perhaps very many – different temporal co-ordinates.  When you ask of spaghetti code “How did I get here?”, you’ll probably have no idea.

Why does that matter?  That’s Dijkstra’s other observation: that people are not very good at understanding dynamic processes, and are actually much better at understanding static maps.  If the map and the temporal co-ordinates marry up, if each physical co-ordinate has one temporal co-ordinate, then we can understand something complicated in terms of something easy.  And that’s a good thing.  In fact, with the benefit of nearly fifty years of hindsight, it’s so self-evidently a good thing that now there’s no argument.  Not only is goto a four-letter word, but anything that smacks of a goto is heresy.  Purists sniff superciliously at structured gotos and even early returns, things which only ever exit from blocks, and so are perfectly compatible with Dijkstra’s spatial and temporal correspondence.

But, goto has a mirror-image twin brother.  It’s just as dangerous, and for exactly complementary reasons.  Where goto puts multiple temporal co-ordinates into one spatial location, this structure places a single temporal coordinate into multiple spatial locations, a sort of comeFrom.  I’m talking about if.

Let me explain what the problem is.  Take a trivial example:

if (!isnull (logFile)) logFile.write (message);

This seems pretty straightforward: if the file is open, send a message to it.  If not, not.  But this simple line hides something known by every programmer with more than one week of experience: that this same construct is going to appear time after time, all over the program, like the refrain of a bad pop song.  That is, the temporal co-ordinate of the write call is tied to the temporal co-ordinate of every other corresponding write call, and they all comeFrom the state of logFile.

You don’t need me to tell you the drawbacks of this repeating refrain: the sins of copy/paste, the repeated tests, the resistance to any change.

But there’s another problem here: the ifs are not merely repeating a temporal co-ordinate in several different places, they’re actually generating a co-ordinate that isn’t even part of the problem at all.  What the programmer actually meant (and I know this, because I am he!) is:

logFile.write (message); // if we have enabled logging, otherwise forget it.

Now, let’s make the problem a little harder.  Suppose we write:

// .. do something
if (!isnull (logFile)) message = "Log message";
// .. do something more
if (!isnull (message)) logFile.write (message);  // is this a good test?
if (!isnull (logFile)) logFile.write (message);  // or is this one better??

Here, the problem is slightly more subtle.  What we have is a single temporal co-ordinate, but we’re not actually explicating what it is.  We have a state that we can’t even see.  Probably, what we really meant was:

#ifdef loggingEnabled
   logFile = new stream();
#endif
// do something
#ifdef loggingEnabled
   message = "Log message";
#endif
// do something more
#ifdef loggingEnabled
   logFile.write (message);
#endif

But of course, that’s hopelessly old-fashioned, and pretty ugly to boot.  We will do better in just a moment.

Now, let me be perfectly clear about this:  there is nothing wrong with if, per se.  Just as one goto will not unleash the dogs of war inside your program, neither will one if.  Neither am I talking about iterators (nearly all of which involve a decision at some point) or special-purpose code paths like exceptions (though I have a different beef with exceptions[2]).  The problem arises with ifs, just as with goto, when you have lots of them, and they all have to interoperate.  You have a problem when you have a single variable (or worse, some single implicit state) which is tested at multiple different points inside the code.  That shared state is a wormhole running through your program, covertly joining two or more separate and otherwise unrelated bits of code.  If something changes on one end of the wormhole, you may (or may not) need a corresponding change on all the others, otherwise the program will malfunction in all sorts of subtle and interesting ways.  The problem here is not one of taking a decision: the problem is of taking multiple decisions at multiple unrelated locations in a synchronised manner.
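To make the wormhole concrete, here is a small Python sketch. Everything in it (the Settings class, the three functions) is invented for illustration. Three otherwise unrelated functions each repeat the same test; the guard has been forgotten in one of them, and nothing goes wrong until logging happens to be switched off.

```python
import io


class Settings:
    """Hypothetical shared state: log_file is None when logging is disabled."""
    log_file = None


def load(items):
    if Settings.log_file is not None:          # the test, first copy
        Settings.log_file.write("loading\n")
    return list(items)


def transform(items):
    if Settings.log_file is not None:          # the test, second copy
        Settings.log_file.write("transforming\n")
    return [item * 2 for item in items]


def save(items):
    # The guard was forgotten here, so this only works while logging is on.
    Settings.log_file.write("saving\n")
    return len(items)


def run(items):
    return save(transform(load(items)))


Settings.log_file = io.StringIO()
run([1, 2, 3])           # fine while logging is enabled
Settings.log_file = None
# run([1, 2, 3])         # would raise AttributeError inside save()
```

The three functions share nothing except the state of Settings.log_file, yet a change to how that state is tested has to be made in all three at once.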

When Dijkstra wrote his letter, structured programming was in its infancy, and even if he’d identified the danger of ifs, there was little that he could have done about it – if was pretty much the only decision-making mechanism there was.  In the 1980s, for a brief time, there was an attempt to produce “skeletonising” editors to address inter alia the if problem – you’d point to a variable and only the code that could impact that variable was shown.  (They turned out not to be very helpful because, in practice, real-life code was so intertwined that not much code was actually elided.)

But today, fifty years later, we’ve moved on, and we do have a choice.  Aspect-oriented programming[3] addresses precisely this issue head on (and is worth investigating just for its own merits).  But we already have an almost complete solution, mature and fully-implemented in nearly every modern language: classes.

Within a class, every method, every field, comes with an implied condition:

if (typeof (this) == class) {…}

It’s precisely the synchronised condition that we’ve been explicating until now, but it’s not a corrosive wormhole, because the limit of the condition is precisely the boundary of the class – it joins everything inside the class, and excludes everything outside it.  Instead of spreading repetitions of the condition all over a program, classes can concentrate the condition into a single location – the constructor of the implementing object.

Let’s rewrite the example above in this style.

interface Logger {
   Logger () -> NullLogger ();
   Logger (string filename) -> FileLogger (FileHandle (filename));
   Logger (FileHandle handle) -> FileLogger (handle);
   // This is a shorthand notation – when constructing, instantiate subclasses or implementers of this class.
   // Probably would do it with a factory in reality.

   write (string msg);
}

class NullLogger implements Logger {
   NullLogger () {} // ctor
   override write (string msg) {} // silently discards the message
}

class FileLogger implements Logger {
   private readonly FileHandle _fileHandle;
   FileLogger (FileHandle handle) {_fileHandle = handle;} // ctor
   override write (string msg) {_fileHandle.write (msg);}
   ~FileLogger () {_fileHandle.flush (); _fileHandle.close ();} // dtor
}

And then, in the real code…

{  Logger logger ("logfile.log");
   // do stuff
   logger.write (message);
   // do more stuff
} // logger flushes automatically.

Now, you can see, the main part of the code expresses exactly what is wanted (and no more), there’s no sense of a conditional temporal co-ordinate at all, and the entire test whether to log or not is given by the Logger logger () initialisation – either the constructor instantiates a FileLogger, or else it instantiates a NullLogger (which just silently eats the messages).  And not an if in sight!
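For readers who want something they can run, here is a minimal Python sketch of the same arrangement. It makes a few assumptions that are not in the pseudocode above: a StreamLogger that writes to any text stream stands in for FileLogger, the shorthand constructors become a make_logger factory, and do_work is an invented stand-in for "the real code".

```python
import io
from abc import ABC, abstractmethod


class Logger(ABC):
    """The interface every logger implements."""

    @abstractmethod
    def write(self, msg: str) -> None:
        ...


class NullLogger(Logger):
    def write(self, msg: str) -> None:
        pass  # silently eat the message


class StreamLogger(Logger):
    """Stand-in for FileLogger: writes each message to a text stream."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, msg: str) -> None:
        self._stream.write(msg + "\n")


def make_logger(stream=None) -> Logger:
    # The one and only decision about whether to log.
    return NullLogger() if stream is None else StreamLogger(stream)


def do_work(logger: Logger) -> int:
    # No test for "is logging enabled?" anywhere in here.
    logger.write("starting")
    total = sum(range(5))
    logger.write("finished")
    return total


buffer = io.StringIO()
do_work(make_logger(buffer))   # messages end up in buffer
do_work(make_logger())         # same code path, messages discarded
```

The body of do_work is identical in both calls; only the object handed to it differs.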

Just using interchangeable objects isn’t very innovative (and, at one level, that’s the point: we do it all the time for other reasons).  But there’s something else, less obvious, that I want to point out.  In this pattern, no matter how it’s constructed, a logger is a Logger.  It can be a FileLogger or a NullLogger or a DebugOutLogger or a  PigeonPostLogger, but what it can’t be is a null.  That is, logger will always implement the interface.

There are, hidden in this very small extract, other examples of null objects.  Look at the logger (string) constructor: it turns a string into a FileHandle.  What if that fails (as well it might)?  Yes, it might throw an exception, but then again (and possibly, as part of recovering from the exception) it might return a null FileHandle.  What would that do?  It would silently consume everything written to it, it would assert EOF all the time, and it would return a null string in response to every read.  And what’s a null string?  Probably just ‘’.
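A null file handle of that kind might look something like the Python sketch below. NullFileHandle and open_or_null are hypothetical names, and falling back to the null object on any OSError is just one possible policy. Python signals end-of-file by returning an empty string from read, so "asserting EOF all the time" and "returning a null string" turn out to be the same behaviour here.

```python
class NullFileHandle:
    """Hypothetical null object for a file that could not be opened."""

    def write(self, text: str) -> int:
        return len(text)   # report success, then discard the text

    def read(self, size: int = -1) -> str:
        return ""          # the null string, which also means EOF

    def flush(self) -> None:
        pass

    def close(self) -> None:
        pass


def open_or_null(path: str):
    """Return a real file handle, or a NullFileHandle if opening fails."""
    try:
        return open(path, "a+", encoding="utf-8")
    except OSError:
        return NullFileHandle()
```

Callers can write to, read from, flush and close whatever open_or_null returns without ever asking whether the open succeeded.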

A word of warning: some languages express this pattern naturally, but some, like C#, which uses pointer semantics[4] for every variable of class type, require some discipline – you have to make sure that every such variable is initialised not to null but to class(null).
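Python needs the same discipline, since any name can hold None. One way to apply it, sketched here with invented Worker and ListLogger classes, is to make the null object the default value of the field, so that an uninitialised logger is still a logger.

```python
from dataclasses import dataclass, field


class NullLogger:
    def write(self, msg: str) -> None:
        pass


class ListLogger:
    """Hypothetical logger that keeps messages in memory."""

    def __init__(self):
        self.lines = []

    def write(self, msg: str) -> None:
        self.lines.append(msg)


@dataclass
class Worker:
    # Default to a null object rather than None, so run() never has to test it.
    logger: object = field(default_factory=NullLogger)

    def run(self) -> int:
        self.logger.write("running")
        return 42


Worker().run()                      # no logger supplied, nothing breaks
Worker(logger=ListLogger()).run()   # same code, messages are kept
```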

The core idea of this pattern is that instead of using ifs, you can use object instantiation both to select between different actions (say, between a FileLogger and a DebugOutLogger), and also to select between go/nogo options (say, between a FileLogger and a NullLogger).  But where, in conventional programming, choices like these would require if statements spread all over the code, using this pattern, they require just one – in a constructor.  (And frequently, as in the example above, the condition can be implied using overloading).  What you end up with is smaller, simpler, more reliable, and hugely more maintainable code.
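As a rough illustration of a condition implied by overloading, Python's functools.singledispatch can choose the logger from the type of the argument. This is only a sketch: make_logger and StreamLogger are assumed names, and the str variant opens a file and never closes it, which real code would need to deal with.

```python
import io
from functools import singledispatch


class NullLogger:
    def write(self, msg: str) -> None:
        pass


class StreamLogger:
    def __init__(self, stream):
        self._stream = stream

    def write(self, msg: str) -> None:
        self._stream.write(msg + "\n")


@singledispatch
def make_logger(destination):
    # Fallback for argument types with no registered variant.
    raise TypeError(f"cannot log to {type(destination).__name__}")


@make_logger.register(type(None))
def _(destination):
    return NullLogger()              # no destination: discard messages


@make_logger.register(io.TextIOBase)
def _(destination):
    return StreamLogger(destination)  # an already-open text stream


@make_logger.register(str)
def _(destination):
    # A filename: open it for appending (closing is omitted in this sketch).
    return StreamLogger(open(destination, "a", encoding="utf-8"))
```

The choice between the three variants is made by the dispatcher, so the code contains no explicit test at all.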

Dijkstra, in his opening paragraph, opined that “… the goto statement should be abolished from all ‘higher-level’ programming languages…”.  Though complete abolition seems harsh (every now and then a restricted form of goto – or one of its synonyms, like return – comes in very handy) history has shown that consistently treating it as toxic and dangerous has led to significant improvements in the programming arts.  I believe that a similar fate should befall if: we can’t eliminate it completely, and we probably shouldn’t try, but if we regard it as a last resort, rather than a central pillar of programming, nothing but good can come of that.


[1] Edsger W. Dijkstra, “Go To Statement Considered Harmful”, Commun. ACM 11 (1968), 3: 147–148, reproduced at http://www.cs.utexas.edu/~EWD/ewd02xx/EWD215.PDF

[4] That is, variables start out null, assignments and parameters are passed by reference, and copies are shallow.

One response to “If statements considered harmful or Goto’s evil twin or How to achieve coding happiness using null objects.”

  1. Absolutely. In the olden days when aging C programmers used to complain about virtuals being slower than static functions, I’d tell them that was the wrong comparison because a virtual gives you all the functionality of a switch but so much faster. I started using null objects because it seemed like suicide ever to set a pointer to 0, and of course the passive cases came up.
