Multi-threading made ridiculously simple

Modern computer systems couldn’t function without it, and it’s at the heart of nearly every modern software development. And yet, the very mention of its name is enough to strike terror into the hearts of even hardened developers. Why is multi-threading so difficult?

Actually, multi-threading is hard for many of the same reasons that ordinary programming is hard. In ordinary programming, we are constantly trying to understand a dynamic, evolving process by looking at snapshots and static maps. “How did this variable get this value?” was ask, or “How come this block isn’t executing?” By the time we’re asking these questions, the causes of the behaviour are long-gone, and we’re trying to reconstruct what has been happening using hints and half-truths.

At least, though, in conventional programming, whatever happens keeps happening. If it happened in a certain way one time, we can run it again, give it the same input, and it should happen just the same way again. We can stop the process, peer into every nook and cranny to understand the state, and then advance a bit further and look again. We might not be able to reverse time, but we can at least replay it (and where things happen differently, then we’ve got clear evidence of something pathological like dangling pointers).

But in a real-time program, or worse, in a multi-threaded program, we don’t have this luxury. The whole point of such programs is that they behave differently each time they run. Its not clear even what it means to suspend time, because different parts of the overall system are marching to the beats of different drummers. So, multi-threaded programs are harder partly because they’re inherently more confusing, partly because our normal debugging tools don’t help much, but mostly because the questions we’re used to asking are no longer sensible questions. “How did this variable get this value” now no longer means “What happened previously in the chain of execution to place this value into this variable”, it is now just as likely to mean “Who has reached into my memory and deposited this value into this variable while my back was turned, and when did they do it?”

That reframing of the question is actually quite useful, because it points towards the solution. In the bad old days, before structured programming, we would wonder “How did the execution reach this bit of code”, and the chains of transfer-of-control criss-crossing the program became known as “spaghetti code”. In (structured) multi-threading terms, it’s not transfers of execution, but transfers of state – now we’re faced with spaghetti variables. Unfortunately, while we could place the entire blame for spaghetti code on the goto statement, there’s no one construct on which we can blame spaghetti variables[1], so as a result nearly every writer of multi-threading code constructs complex patterns of semaphores and mutexes to try to limit the damage. How’s that working out? No, I don’t think so either.

Spaghetti code was defeated not by the introduction of lots of different spaghetti patterns, but by the introduction of a single anti-spaghetti concept: structured code. What we need is an equivalent of structure for multi-threading. The good news is: there is such a thing, and the better news is: it’s not complicated.

Back in 1978, Tony Hoare came up with a language called CSP – Communicating Sequential Processes[2]. It was a language specifically designed to bring order to highly-multi-threaded systems – he saw it being used in distributed multiprocessing. After that (and this is, I think unique in language design) CSP split into two. One the one hand, Hoare and others developed it into a purely-theoretical abstract algebra, which could be used to understand the workings of existing systems, and flush out their bugs and deadlocks (and, in that capacity, it has solved many many problems in real-world systems). On the other hand, the ideas were adopted into other, more-or-less mainstream languages which were intended for real-life programming. Languages like Squeak (a language for communicating with mice), Occam (David May’s official Transputer language), and more recently Erlang (which was developed specifically to manage highly-distributed telecoms systems). With all this pedigree, CSP clearly is not some mere theoretical ideal: it represents a significant contribution to systems engineering which has proved itself in multiple application domains over a period of forty years. If its ideas aren’t yet mainstream, it’s because the domains it has been applied to thus far have been somewhat specialist.

So, how does it it work? You may be familiar with the idea of the unix pipeline: you set up a series of independent programs, each of which takes one input stream and one output stream, and you connect them together into a pipleline where the output of each one feeds the input of the next. Like this[3]:

linecount .c* | sort field=3 | drawpie

These properties are almost too obvious to be worth pointing out, but, you’ll notice:

  • The individual programs know nothing about each others’ existence. They live in separate address spaces, potentially in separate machines. Each is conventionally deterministic, and can therefore be tested (and debugged) conventionally in isolation of all the others.
  • The only connection between these process are the queues of data joining one to another. Only the data is shared. But, data having been sent by one process, it can never be recalled. The communication is strictly one-way.
  • Even using the queues, the synchronisation between the processes is all but non-existent. If there’s no data on an input queue, the process cannot consume it. Otherwise, the processes run independently – it’s up to the task switcher (which, if it exists at all, is part of the operating system, outside any of the processes or the queues) to manage the execution to keep the whole system running efficiently (which, generally, means keeping the queues short).

CSP takes this unix model and generalises it. It still assumes a collection of completely independent sequential processes connected by queues of data, but adds:

  • Each process can take any number of inputs, and can generate any number of outputs
  • Queues can contain data of any type – it doesn’t need to be unix-style serialised data. More specifically, they can be queues of objects.
  • In general, a process receiving data from multiple queues can run if any queue contains data, or only if all queues contain data.

A CSP-based program, then, describes the collection of independent processes, and how those processes are wired together. Depending on the specific language, dealocks and livelocks are either obvious, or can be eliminated entirely. Here’s an example in an idealised CSP-based language:

Program demo (in stdin, out stdout) {
    channel a, b, c;

    process split (in stdin, out a, out b) {
        c ← stdin;
        if (some property of c) c → a;
        else c → b;
    process filter (in s, out t) {
        q ← s;
        if (some property of q) q → t;
    process zip (in a, in b, out stdout) {
        either {// parallel decider – whichever unbaulks first gets to run.
            on r ← a; r → stdout;
            on r ← b; r->stdout;

    first (stdin, a, b);
    filter (b, c);
    zip (a, c, stdout);

The execution lifecycle depends on the precise language and domain, but would typically proceed in three phases:

  • Setup: Create a message queue for each channel. Create a worker thread for each process, initialise the task in each one, and start them.
  • Runtime: Feed data from the outside world into the input channels, and then allow the data to propagate through the system until it emerges at the output consumers.
  • Teardown: Shut down the worker threads, and reclaim the message queues[4]. Generally there are two ways to do this:
    • either shut down the tasks using some supervisory task manager as a result of a signal inside the system, or
    • add a special “end of data” object to the end of a queue, which instructs the process which receives it to send the same signal to all its output queues and then destroy itself.

By constructing multi-threaded systems in this way, we have trivialised almost out of existence the difficulty of multi-threading.

  • The separate processes are entirely conventional sequential deterministic programs. We can leverage everything we already know about programming to build and debug them.
  • We can interrupt one process to scrutinise it, and it will have no effect on the wider system (except to cause its input queues to grow). We can also suspend the queues, allowing us to watch how a single process behaves dynamically.

Multi-threading is so much a part of modern software that I believe that it’s only matter of time before the kinds of languages we use for day-to-day programming will incorporate ideas like CSP routinely, and queues will be as pervasive a concept as function parameters. Until then, though, what’s a jobbing programmer to do?

Actually, you can get started right away. All you need is a generic queue class which exports:

  • (on the input side) an enqueue function and a shutdown function (which enqueues an end-of-stream marker)
  • (on the output side) a dequeue function (which baulks on empty) and (optionally) a readahead function

Worker objects are threads, constructed such that the thread function takes (as reference parameters) only queues[5].

Just to show how easy it is, here  is a C++ library which implements just such a queue class, designed for pointers. Things to note are:

  • End of stream marker is to enqueue a null. A null reaching the output will not baulk, but will generate an infinite stream of nulls.
  • Memory disposal is easy: if you alloc a pointer, either you dispose it or you enqueue it. If you dequeue a pointer, either you dispose it or you enqueue it somewhere else. Eventually it will come to rest.
  • Deadlocks are eliminated thus: running worker threads can have queues attached to them, but only of one gender. That is, if output queues are attachable, then all the input queues must be created by the object at construction time. Alternatively, if input queues are attachable, then the output queues must be created by the constructor. This doesn’t eliminate cycles, but it does eliminate cycles with any data in them.

Multi-threading isn’t a new paradigm, but it is unusual and unfamiliar to most programmers. That’s a shame, because it has within it the possibility to write not just more complex and performant systems than can be made using conventional sequential-only programming, but also to dramatically simplify many of those systems we can write conventionally. With careful design, multi-threading allows functionality available no other way (including, for example, runtime-reconfiguration of systems – like attaching a debugger or a logger, or live-system recovery and self-healing). CSP delivers all that benefit in a clean, simple, and reliable wrapper.

1 Unless, of course, we’re going to throw away the whole concept of a shared variable. That’s not reasonable. Is it?

3 These are not real unix commands. If you don’t already know about pipes, then unix’s cryptic and elliptical command style would probably be terrifying!

4 Of course, the teardown may be omitted entirely. The beauty of this approach is that an external monitor can be used to stop and start tasks independent of the rest of the system – this permits patching and restarting without rebooting an entire installation. It may be that the only comprehensive teardown is a power cycle!

5 And if you insist on using global variables or singletons, you’ve only got yourself to blame!

One response to “Multi-threading made ridiculously simple

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s