Can Data Flow Help Us Escape the von Neumann Machine?

Untangling code with flow-based programming

About a year ago I was struck by George Dyson’s plea in his Strata London keynote:

That’s why we live in this world where we follow this one particular [von Neumann] architecture and all the alternatives were squashed… Turing gave us this very powerful one-dimensional model, von Neumann made it into this two-dimensional address matrix, and why are we still stuck in that world? We’re fully capable of moving on to the next generation… that becomes fully three-dimensional. Why stay in this von Neumann matrix?

Dyson suggested a more biological, template-based approach, but I wasn’t sure at the time that we were as far from three dimensions as Dyson thought. Distributed computing with separate memory spaces can already offer an additional dimension, though most of us are not normally great at using it. (I suspect Dyson would disagree with my interpretation.)

Companies that specialize in scaling horizontally—Google, Facebook, and many others—already seem to have multiple dimensions running more or less smoothly. While we tend to think of that work as only applying to specialized cases involving many thousands of simultaneous users, that extra dimension can help make computing more efficient at practically any scale above a single processor core.

Unfortunately, we’ve trained ourselves very well in the von Neumann model—a flow of instructions through a processor working on a shared address space. There are many variations in pipelines, protections for memory, and so on, but we’ve centered our programming models on creating processes that communicate with each other. The program is the center of our computing universe because it must handle all of these manipulations directly.

Last night, however, as I was exploring J. Paul Morrison’s Flow-Based Programming, I ran into this:

Today we see that the problem is actually built right into the fundamental design principles of our basic computing engine….

The von Neumann machine is perfectly adapted to the kind of mathematical or algorithmic needs for which it was developed: tide tables, ballistics calculations, etc., but business applications are rather different in nature….

Business programming works with data and concentrates on how this data is transformed, combined, and separated…. Broadly speaking, whereas the conventional approaches to programming (referred to as “control flow”) start with process and view data as secondary, business applications are usually designed starting with data and viewing processes as secondary—processes are just the way data is created, manipulated, and destroyed. We often call this approach “data flow.” (21)

Of course, Morrison is cheering on the data flow approach, so he talks about the tangles flow-based programming can avoid:

In any factory, many processes are going on at the same time, and synchronization is only necessary at the level of an individual work item. In conventional [control flow] programming, we have to know exactly when events take place, otherwise things are not going to work right. This is largely because of the way the storage of today’s computers works—if data is not processed in exactly the right sequence, we will get wrong results, and may not even be aware that it has happened! There is no flexibility or adaptability.

In our [data flow] factory image, on the other hand, we don’t really care if one machine runs before or after another, as long as processes are applied to a given work item in the right order. For instance, a bottle must be filled before it is capped, but this does not mean that all the bottles must be filled before any of them can be capped….

In programming, it means that code steps have to be forced into a single sequence which is extremely difficult for humans to visualize correctly, because of a mistaken belief that the machine requires it. It doesn’t! (24)
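
Morrison’s bottling analogy maps naturally onto stream-style code. Here is a minimal sketch in Python (my illustration, not Morrison’s): two “machines” connected by a bounded queue, so each bottle is filled before it is capped, but the two stages overlap across bottles rather than running as global phases.

```python
import queue
import threading

# Two "machines" connected by a bounded buffer: per-bottle order is
# fill -> cap, but filling and capping overlap across bottles.
filled = queue.Queue(maxsize=2)   # bounded buffer provides back-pressure
capped = []

def fill(bottles):
    for b in bottles:
        filled.put(b + ":filled")
    filled.put(None)              # end-of-stream marker

def cap():
    while (b := filled.get()) is not None:
        capped.append(b + ":capped")

t1 = threading.Thread(target=fill, args=(["b1", "b2", "b3"],))
t2 = threading.Thread(target=cap)
t1.start(); t2.start()
t1.join(); t2.join()
print(capped)   # ['b1:filled:capped', 'b2:filled:capped', 'b3:filled:capped']
```

Note that nothing here says when the capper runs relative to the filler; the queue enforces only the per-item ordering, which is exactly Morrison’s point.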

To step back a bit, the von Neumann machine works very well for some kinds of tasks, but running a series of data processing steps in a single memory space piles on risk as the complexity of the task increases. Defenses against that risk, notably regimentation of tasks and encapsulation of data, can ease some of it—but they change the process in ways that may not be efficient. Morrison goes on to discuss the challenges of memory garbage collection in a world of long-lived processes that create and discard handles to data. He contrasts those with processes that operate on a particular chunk of information and then explicitly close up shop by disposing of their “information packets” at the end.
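
Morrison’s “information packet” discipline amounts to explicit ownership: a process either passes a packet downstream or disposes of it, rather than leaving handles around for a collector to chase. A rough Python sketch (the names and structure are my own illustration, not Morrison’s API):

```python
import queue

class IP:
    """An information packet: one owner at a time, explicitly disposed."""
    def __init__(self, data):
        self.data = data
        self.disposed = False

    def dispose(self):
        self.disposed = True  # the owning process gives the packet up

def uppercase_process(inport, outport):
    # A process consumes each incoming packet, emits a new one,
    # and explicitly disposes of the input before moving on.
    while not inport.empty():
        ip = inport.get()
        outport.put(IP(ip.data.upper()))
        ip.dispose()

inport, outport = queue.Queue(), queue.Queue()
inport.put(IP("order-42"))
uppercase_process(inport, outport)
result = outport.get()
print(result.data)  # ORDER-42
```

Because every packet has exactly one owner and an explicit end of life, a process can “close up shop” deterministically instead of waiting on garbage collection.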

I may be too optimistic. Between my time in Erlang, where processes that handle messages and then start over with a clean slate are normal, and my time in markup, which offers tools to define message formats independently of control flows, I see Morrison as presenting a much-needed untangling.

Whether that untangling is a complete answer to Dyson’s question, I’m not certain. I suspect, however, that the flow-based programming approach answers a lot of questions about distributed and parallel processing, even before we get to its potential for designing code through graphical interfaces.


  • Joshua R. Simmons

    Compelling ideas. As much as I’ve cultivated a healthy skepticism of the shiny syndrome that’s so common among us technologists, I can’t help but think that flow-based programming is different.

    It seems fundamentally different. Aside from neatly compartmentalizing complex application logic, I think it’ll lend itself to the sheer volume of data and array of data sources that we’re beginning to see.

    Reading about Jekyll’s wildly successful port to NoFlo set off alarms for me, and after more thought, I’m starting to see why.

    • Paul Morrison

      Yes, Mike Beckerle (currently at Tresys Technology) has been using FBP techniques for quite a few years on very large amounts of data, and has kindly contributed several sections to the 2nd ed. of my book, “Flow-Based Programming”.

      Just curious, what were the alarms set off by the Jekyll port? Just wondering – I haven’t really looked into it myself.

      • Joshua R. Simmons

        Glad to discover FBP is more mature than I had thought! As for the alarms, they were good ones — mostly, I was blown away by how much code they managed to eliminate.

        I’m kind of a cowboy developer, usually focused on GTD within cushy PHP frameworks, so I’ve never been architecturally savvy, but one correlation I’ve definitely noticed: the less code, the better. And so far, FBP delivers in spades.

        • Paul Morrison

          Yes, I think it was Dijkstra who said code should be a cost item – and yet there are still companies that measure programmer productivity in kLOC (1000s of lines of code)!

          • Simon St.Laurent

            I’m really happy to see you commenting here! I’ve been enjoying the book, and hope we can have a lot more public conversation on these techniques.

          • Paul Morrison

            Thanks for the kind words! I’m pretty excited by the momentum building for Flow-Based Programming and its relatives! The NoFlo folks deserve enormous credit for kickstarting it! :-)

          • Paul Morrison

            PS Just got reminded about Justin Bozonier’s knol on FBP – – he has some interesting insights there too!

  • saim

    Nice. Thanks for sharing this.

  • John Cowan

    FBP, far from being shiny new, has deep connections to the entire history of computing. It is what International Business Machines customers were doing before computers were invented: passing streams of punch cards through unit record equipment like keypunches, sorters, adders, multipliers, summarizers, collators, and printers. Another important application of FBP is Unix-style pipelines. Coroutines and threads are a suitable substrate for FBP, and indeed the JavaFBP runtime (written by yours truly) makes use of Java threads in a safe and composable way. Lastly, FBP networks (at least the ones without loops) are representable as compositions of pure lazy functions, because each process runs sealed-off from all others, so even if it has state internally (as is often the case), it is pure from the viewpoint of the FBP-level program.

    • Simon St.Laurent

      I agree completely that it’s not new – I probably should have made that clear, though this is the third post I’ve written on it, so I guess I’m forgetting. (First was at )

      I also hadn’t realized that you wrote the JavaFBP runtime – can we talk more about that sometime? The rest all sounds great.

      • John Cowan

        Sure, whenever you like. There have been quite a few changes since then that I haven’t followed up on. The C#FBP is as far as I know basically a translation of the Java one, though I haven’t touched the code for it.

  • Alfredo Sistema

    In FBP the visual aspect falls out of the “machine” abstraction that components provide; code reuse is a priority, and excessive granularity (think of LabVIEW as an extreme in granularity) is avoided because components can be written in a general programming language.
    Sadly the tools and frameworks available still need some work before they can compete with the current fashion, but we are getting there.
    An interesting side effect of trying to solve problems with FBP is that instead of trying to shoehorn design patterns into the program, you naturally find extension points and bottlenecks. Concurrency, the absence of global shared state, and asynchrony are also implied.
    Here’s an example of an IRC room bot:

    You can clearly see in the diagram that adding plug-ins is trivial, even if they are concurrent or one-shot functions.

  • Forrest O.

    Mentioning Google in this context made me think about the advent of timesharing and early PCs. Ted Nelson’s Computer Lib / Dream Machines and others alluded to the analogy of freeing computation from a high-priesthood of mainframe operators. Google and other big players can operate in multiple dimensions now; maybe that’s the next frontier to liberate.

    GPU programming could bring this to the PC level. Has anybody else thought about GPGPU + FBP?

    • Alfredo Sistema

      I talked about it with some GPU programming friends, and the conclusion is that GPU programs are quite hard to wrangle because the tech is quite immature (debugging and error handling are nonexistent). Things are getting better, and FBP is a perfect fit for it.

    • Jon Nordby

      Yes, I’ve given GPGPU FBP a bit of thought. Not on my radar short-term though; MicroFlo has me preoccupied.
      I did my bachelor’s thesis on GPGPU (use in particle physics), and my experience was that the optimal choice of data structures was critical for performance. I think finding and representing appropriate general-purpose or problem-specific data structures for communicating between components will be one of the main challenges.

      • Jon Nordby

        Of course many problems can be well modeled with uniformly typed dense and sparse matrices, but probably far from all.

  • geezenslaw

    Great article but…

    On most of the software dev contracts I have been on in the last 20 years or so, many if not most of my tenured programmer coworkers have never heard of von Neumann, much less the architecture that bears his name. Not to mention Turing and his undeniable contributions.

    Case in point: Mr. Cowan is correct in his assessment and example of old Unix pipelining. For about 7 years I worked in a research environment that was a Sun SPARC shop, which at the time could handle data at a rate an order of magnitude beyond the PCs of the day, and still would today.

    Maybe there is hope for

    Unfortunately, engineering never trumps business.

    In Tech: training and experience never trump youth and speed.