the zero bit stream

programmer things

The Flow of Code

Here is a very simple piece of code that could be from any number of programming languages:

X = somefunction(Expression);

An explanation of how this code is evaluated (assuming strict evaluation) might be something like this, “Expression is evaluated and then passed to somefunction. The value returned from the function is bound to X.” Right to left. This is unfortunate because we read from left to right. A few programming languages and environments treat left-to-right reading of code as an opportunity to structure the flow of data in a way that aids in comprehension and almost all have some tricks to simulate this in various ways, but none seem to fully embrace it in a practical way. I think we should.

Shell programming with pipes is the most immediately recognizable environment for programming with data flowing left to right. Here is a silly example which abuses the cat command to achieve this flow:

cat robots.txt | sort | grep "login"

The “correct” way to write this according to pedants is this:

sort < robots.txt | grep "login"

I always write my shell scripts the first way because you can follow the flow of data literally from left to right and it makes sense. You can also compose the functions (commands in this case, really, which act like functions on streams of bytes) and move them around more easily.

Forth and related concatenative programming languages actually use this left to right flow, each element being a function which takes a stack and returns a new stack. Here is some code to do some very simple math and print the result:

1 2 + 3 * .

The evaluation of that code is something like “Push 1 on the stack. Push 2 on the stack. Pop the top two values off the stack, add them and push the result. Push 3 on the stack. Pop the top two values off the stack, multiply them and push the result. Print the top value of the stack.” Evaluation is left to right and the code is extremely succinct. The problem with Forth (and similar languages) is that longer and more complicated functions are difficult to read; you have to keep track of the stack in your head and what each value in the stack might represent (a whole bunch of unlabeled intermediate values which are lexically removed from where they are consumed). The value resulting from one operation will be used by another operation later on without any visual way to know this other than carefully considering the effects of the intermediate stack operations. That said, Joy, a concatenative language in the Forth tradition, is on my list of languages to learn.

All C-like languages have the capability of building so-called fluent interfaces, where the object methods return objects such that operations can be chained from left to right using the dot (.) or arrow (->) operator. Here is a made-up example:

myObject.operation1().operation2(param).operation3()

Of course, the word “fluent” shares etymology and meaning with “flow.” When I am working with languages in this family (pretty much 100% of the time when working for money) I tend to gravitate towards libraries with fluent interfaces. Even when I should be writing SQL, I’m generally writing code in a fluent interface that generates actual SQL queries behind the scenes. jQuery is a JavaScript library with a fluent interface that is used on a majority of JavaScript enabled websites. I believe it is, in part, the fluent nature of the interface that made it so popular. It also worked nicely to hide many browser differences and incompatibilities. Remember that when jQuery appeared, plenty of other libraries dealt with browser differences, but none had as nice a programming interface.

Lisps have the power of macros to rearrange the structure of code before evaluation. Clojure creates something of a fluent interface with the threading macros (not related to execution threads). The (->) macro allows this code:

(last (second (first object) param))

to be written like this:

(-> object first (second param) last)

There are limitations with this model as the language was not designed around this concept so while the (->) macro threads the values through the first argument of the functions, the (->>) macro threads the values through the last argument of each function and therefore they are only useful if you are working with an API designed to support one or the other. Clojure also has the doto macro which allows ugly, imperative Java APIs like this:

object.foo();
object.bar(param);
object.baz();

To be consumed in Clojure like this:

(doto object (.foo) (.bar param) (.baz))

The functional language F# that came from Microsoft Research comes with the forward pipe operator (|>) which is defined thus:

let (|>) x f = f x

All it does is take the value on the left and apply it to the function on the right. F# allows partial application and operators are left associative so this code:

third (second param (first value))

Can be rewritten like this, creating a nice flow:

value |> first |> second param |> third

In Haskell the (|>) operator can be defined in exactly the same way as F# and appears to work the same way (with lazy evaluation). However, a discussion of Haskell in incomplete if you aren’t talking about monads. Consider that the second law of monads is this:

return x >>= f = f x

If we combine this with the Identity monad which doesn’t really do anything useful (like ordering a burrito without the tortilla), we should be able to use the bind operator (>>=) to get the effect of switching the order of function application and hence have our values flow through the functions from left to right. To reduce confusion, let’s give “return” a better name:

let burrito x = return x

Here is a function for adding ingredients that can chain this way:

let addIngredient i x = burrito (x ++ " " ++ i)

Now we can do this:

burrito "beans" >>= addIngredient "cheese" >>= addIngredient "rice"

Which produces the value “beans cheese rice”; more evidence that Haskell is the greatest programming language in the world and monads can do anything. That would be great if it were the whole story, but Haskell has as a strange motto, “Avoid Success at All Costs!”, and the preferred way to write monadic code is with the “do” notation which breaks the flow and looks a lot like drab, imperative code.

Prolog is interesting because a clause in Prolog has two interpretations, the logical meaning and the resolution which is left to right. Consider this Prolog predicate (a predicate is similar to a function):

happy(X) :- healthy(X),not(incarcerated(X)),wealthy(X).

The logical meaning of this predicate is something like “X is happy if X is healthy, and X is not incarcerated, and X is wealthy.” The way it is resolved is more like, “check if X is healthy, if not fail, if so check if X is incarcerated, if so fail, if not check if X is wealthy, if so succeed.” The (:-) is pronounced “if” and represents logical implication (it is supposed to resemble an arrow pointing to the left) and the comma (,) is pronounced “and” and represents logical conjunction. Conjunctions are checked left to right and the associated predicates are called much like functions. Some predicates can have side effects. For example the “write” predicate prints the value passed to it, and then succeeds deterministically.

Predicates are nondeterministic which means they may succeed or fail (no further predicates in the conjunction are called on failure) or may succeed multiple times, re-binding variables passed to it in different ways. However, if you stick to deterministic predicates (those that are guaranteed to succeed once and only once) you can write code that reads and executes left to right. Also, starting a line with (:-) is a way to call predicates in a Prolog script. The following Prolog program prints “22”:

double(In, Out) :- Out is In + In.
plus2(In, Out) :- Out is In + 2 .
:- double(10, X),plus2(X, Y),write(Y).

This works because the “is” operator evaluates the structure on the right side according to mathematical rules and unifies it with the left side. If the left side is a variable, this works a bit like binding or assignment. This can get tedious if you are stringing a lot of predicates together, since you have to come up with names for each output/input variable combination:

:- double(10, X),plus2(X, Y),plus2(Y, Z),double(Z, W),write(W).

Fortunately, there is a thing called Definite Clause Grammar (DCG) which is actually designed to help parse lists easily but it assumes that you are going to combine a bunch of predicates that take an input and output variable as the last two parameters, passing the output of one to the input of the next one, and elides all of that. It uses the (–>) operator instead of (:-). This bit of code is roughly equivalent to the last, both print “48”:

double_plus_good --> double,plus2,plus2,double.
:- double_plus_good(10,X),write(X).

I submit that most programmers end up coercing the languages they work in into a left to right flow style because it is natural. An algorithm is a series of steps to produce a calculation, like a recipe. Programming is encoding algorithms, so it is natural to want to view them as steps. While this may sound like imperative style, it is entirely orthogonal. The steps in an algorithm can (and most often should) be pure functions, even if the language does not enforce it.

I propose a programming language that is built around this left to right flow like Forth and friends (minus the stack manipulation). Here is some code in a version of my imaginary language:

"localhost:customerdb" getDatabase (get "people") -> PeopleRecords
"output.log" getFile recordParse -> LogRecords
PeopleRecords ("customer" = 1) (project "cookie") -> CustCookies
LogRecords ("cookie" in CustCookies) count -> CustomerHits
LogRecords count -> AllHits
CustomerHits (divBy AllHits) print

Each new line starts with a value, and then that value gets passed through successive functions which return new values (at least that is the semantics of it, evaluation could occur in many ways and the compiler/runtime could optimize in any number of ways). Functions start with a lower case character, locally bound values start with upper case letter and all names are camel cased, though these conventions are arbitrary. The arrow operator (->) signifies binding the value returned from previous function to a name. Functions which have parameters are enclosed with those parameters in parentheses and may be infix as in the case of (=) and “in”. Nothing is nested more than 1 level. I believe nesting of all kinds to be a source of confusion and complexity that hinders reading code, although I admit there are benefits, too, like terse code and the power of closures. There is no invisible stack to keep track of. The types are mostly obvious and easily inferred. PeopleRecords and LogRecords are of a different internal type (handle to database table with constraints and handle to parsed log file) but have the same interface (conceptual model) which is “bag of records.” The code reads from left to right, top to bottom.

Some other things that I didn’t address are function definition and code branching (conditionals, switch, etc), record types, interfaces. I think pattern matching in the function head would be sufficient for all branching. Disallowing other types of branching within a function could improve code readability dramatically. Record types could have a F#-style type provider mechanism. Go’s statically duck-typed interface system is nice. That combined with multimethod dispatch, pattern matching and generic/abstract types could solve the so-called expression problem and most use cases. Most of these details are unrelated to the main feature, and are matters of taste or practicality.

If you have any thoughts on this, tweet me.