the zero bit stream

programmer things

The Flow of Code

Here is a very simple piece of code that could be from any number of programming languages:

X = somefunction(Expression);

An explanation of how this code is evaluated (assuming strict evaluation) might be something like this, “Expression is evaluated and then passed to somefunction. The value returned from the function is bound to X.” Right to left. This is unfortunate because we read from left to right. A few programming languages and environments treat left-to-right reading of code as an opportunity to structure the flow of data in a way that aids in comprehension and almost all have some tricks to simulate this in various ways, but none seem to fully embrace it in a practical way. I think we should.

Shell programming with pipes is the most immediately recognizable environment for programming with data flowing left to right. Here is a silly example which abuses the cat command to achieve this flow:

cat robots.txt | sort | grep "login"

The “correct” way to write this according to pedants is this:

sort < robots.txt | grep "login"

I always write my shell scripts the first way because you can follow the flow of data literally from left to right and it makes sense. You can also compose the functions (commands in this case, really, which act like functions on streams of bytes) and move them around more easily.

Forth and related concatenative programming languages actually use this left to right flow, each element being a function which takes a stack and returns a new stack. Here is some code to do some very simple math and print the result:

1 2 + 3 * .

The evaluation of that code is something like “Push 1 on the stack. Push 2 on the stack. Pop the top two values off the stack, add them and push the result. Push 3 on the stack. Pop the top two values off the stack, multiply them and push the result. Print the top value of the stack.” Evaluation is left to right and the code is extremely succinct. The problem with Forth (and similar languages) is that longer and more complicated functions are difficult to read; you have to keep track of the stack in your head and what each value in the stack might represent (a whole bunch of unlabeled intermediate values which are lexically removed from where they are consumed). The value resulting from one operation will be used by another operation later on without any visual way to know this other than carefully considering the effects of the intermediate stack operations. That said, Joy, a concatenative language in the Forth tradition, is on my list of languages to learn.
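To make the stack juggling concrete, here is a toy evaluator (a Python sketch of my own, nothing like a real Forth) that walks the tokens left to right:

```python
# Toy stack-machine sketch: evaluate "1 2 + 3 * ." left to right.
def forth_eval(tokens):
    stack = []
    out = []
    for tok in tokens.split():
        if tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)          # pop two, push the sum
        elif tok == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)          # pop two, push the product
        elif tok == ".":
            out.append(stack.pop())      # "." prints the top of the stack
        else:
            stack.append(int(tok))       # numbers are pushed
    return out

print(forth_eval("1 2 + 3 * ."))  # (1 + 2) * 3 = 9
```

Reading the trace of `stack` as this runs is exactly the bookkeeping you have to do in your head with real Forth.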

All C-like languages have the capability of building so-called fluent interfaces, where the object methods return objects such that operations can be chained from left to right using the dot (.) or arrow (->) operator. Here is a made-up example:

myObject.operation1().operation2(param).operation3()

Of course, the word “fluent” shares etymology and meaning with “flow.” When I am working with languages in this family (pretty much 100% of the time when working for money) I tend to gravitate towards libraries with fluent interfaces. Even when I should be writing SQL, I’m generally writing code in a fluent interface that generates actual SQL queries behind the scenes. jQuery is a JavaScript library with a fluent interface that is used on a majority of JavaScript enabled websites. I believe it is, in part, the fluent nature of the interface that made it so popular. It also worked nicely to hide many browser differences and incompatibilities. Remember that when jQuery appeared, plenty of other libraries dealt with browser differences, but none had as nice a programming interface.
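Here is what building such a fluent interface looks like in practice; this is a made-up Python query builder, not any real library:

```python
# A hypothetical fluent SQL builder: each method returns the object itself,
# so calls chain left to right the same way the data flows.
class Query:
    def __init__(self, table):
        self.parts = ["SELECT * FROM " + table]

    def where(self, cond):
        self.parts.append("WHERE " + cond)
        return self  # returning self is what makes the interface "fluent"

    def order_by(self, col):
        self.parts.append("ORDER BY " + col)
        return self

    def sql(self):
        return " ".join(self.parts)

print(Query("people").where("customer = 1").order_by("name").sql())
# SELECT * FROM people WHERE customer = 1 ORDER BY name
```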

Lisps have the power of macros to rearrange the structure of code before evaluation. Clojure creates something of a fluent interface with the threading macros (not related to execution threads). The (->) macro allows this code:

(last (second (first object) param))

to be written like this:

(-> object first (second param) last)

There are limitations with this model, as the language was not designed around this concept. The (->) macro threads the values through the first argument of each function, while the (->>) macro threads the values through the last argument, so each is only useful if you are working with an API designed to support one or the other. Clojure also has the doto macro which allows ugly, imperative Java APIs like this:

object.foo();
object.bar(param);
object.baz();

To be consumed in Clojure like this:

(doto object (.foo) (.bar param) (.baz))

The functional language F# that came from Microsoft Research comes with the forward pipe operator (|>) which is defined thus:

let (|>) x f = f x

All it does is take the value on the left and apply it to the function on the right. F# allows partial application and operators are left associative so this code:

third (second param (first value))

Can be rewritten like this, creating a nice flow:

value |> first |> second param |> third
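For fun, the same flow can be faked in a language without a pipe operator by overloading an existing one. This is a Python sketch with invented names, not a recommendation:

```python
# Fake the F# pipe by overloading | on a wrapper class, so
# value | f | g reads left to right. All names here are made up.
class Pipe:
    def __init__(self, value):
        self.value = value

    def __or__(self, f):
        # apply the function on the right to the value on the left
        return Pipe(f(self.value))

double = lambda x: x * 2

def add(n):
    # "partial application" by hand: add(3) returns a one-argument function
    return lambda x: x + n

result = (Pipe(10) | double | add(3) | double).value
print(result)  # ((10 * 2) + 3) * 2 = 46
```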

In Haskell the (|>) operator can be defined in exactly the same way as in F# and appears to work the same way (with lazy evaluation). However, a discussion of Haskell is incomplete if you aren’t talking about monads. Consider the monad left identity law:

return x >>= f = f x

If we combine this with the Identity monad which doesn’t really do anything useful (like ordering a burrito without the tortilla), we should be able to use the bind operator (>>=) to get the effect of switching the order of function application and hence have our values flow through the functions from left to right. To reduce confusion, let’s give “return” a better name:

let burrito x = return x

Here is a function for adding ingredients that can chain this way:

let addIngredient i x = burrito (x ++ " " ++ i)

Now we can do this:

burrito "beans" >>= addIngredient "cheese" >>= addIngredient "rice"

Which produces the value “beans cheese rice”; more evidence that Haskell is the greatest programming language in the world and monads can do anything. That would be great if it were the whole story, but Haskell has a strange motto, “Avoid Success at All Costs!”, and the preferred way to write monadic code is with the “do” notation, which breaks the flow and looks a lot like drab, imperative code.
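Stripped of Haskell’s type machinery, the identity-monad trick is just function application flipped around. A Python rendering of the same burrito, purely for illustration:

```python
# The Identity monad in Python: "burrito" wraps nothing at all, and "bind"
# just applies the function to the value, flipping application left to right.
def burrito(x):            # plays the role of "return"
    return x

def bind(x, f):            # plays the role of (>>=)
    return f(x)

def add_ingredient(i):
    return lambda x: burrito(x + " " + i)

result = bind(bind(burrito("beans"), add_ingredient("cheese")),
              add_ingredient("rice"))
print(result)  # beans cheese rice
```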

Prolog is interesting because a clause in Prolog has two interpretations, the logical meaning and the resolution which is left to right. Consider this Prolog predicate (a predicate is similar to a function):

happy(X) :- healthy(X),not(incarcerated(X)),wealthy(X).

The logical meaning of this predicate is something like “X is happy if X is healthy, and X is not incarcerated, and X is wealthy.” The way it is resolved is more like, “check if X is healthy, if not fail, if so check if X is incarcerated, if so fail, if not check if X is wealthy, if so succeed.” The (:-) is pronounced “if” and represents logical implication (it is supposed to resemble an arrow pointing to the left) and the comma (,) is pronounced “and” and represents logical conjunction. Conjunctions are checked left to right and the associated predicates are called much like functions. Some predicates can have side effects. For example the “write” predicate prints the value passed to it, and then succeeds deterministically.
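The resolution order can be sketched in any language with short-circuiting. Here is a rough Python model of that left-to-right conjunction (the goal functions are invented stand-ins, and real Prolog resolution also involves unification and backtracking):

```python
# Model a Prolog conjunction: goals are tried left to right, and the first
# failure ends the whole conjunction (all() short-circuits on a generator).
def conjoin(goals, x):
    return all(goal(x) for goal in goals)

# Made-up "facts": a person is represented as a set of attribute strings.
healthy = lambda p: "sick" not in p
incarcerated = lambda p: "jailed" in p
wealthy = lambda p: "rich" in p

def happy(p):
    return conjoin([healthy, lambda q: not incarcerated(q), wealthy], p)

print(happy({"rich"}))            # healthy, free, and wealthy -> True
print(happy({"rich", "jailed"}))  # incarcerated -> False
```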

Predicates are nondeterministic, which means they may succeed or fail (no further predicates in the conjunction are called on failure) or may succeed multiple times, re-binding the variables passed to them in different ways. However, if you stick to deterministic predicates (those that are guaranteed to succeed once and only once) you can write code that reads and executes left to right. Also, starting a line with (:-) is a way to call predicates in a Prolog script. The following Prolog program prints “22”:

double(In, Out) :- Out is In + In.
plus2(In, Out) :- Out is In + 2 .
:- double(10, X),plus2(X, Y),write(Y).

This works because the “is” operator evaluates the structure on the right side according to mathematical rules and unifies it with the left side. If the left side is a variable, this works a bit like binding or assignment. This can get tedious if you are stringing a lot of predicates together, since you have to come up with names for each output/input variable combination:

:- double(10, X),plus2(X, Y),plus2(Y, Z),double(Z, W),write(W).

Fortunately, there is a thing called Definite Clause Grammar (DCG) which is actually designed to help parse lists easily, but it assumes that you are going to combine a bunch of predicates that take an input and an output variable as the last two parameters, passing the output of one to the input of the next, and it elides all of that. It uses the (-->) operator instead of (:-). This bit of code is roughly equivalent to the last; both print “48”:

double_plus_good --> double,plus2,plus2,double.
:- double_plus_good(10,X),write(X).
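What the DCG translation does for you can be approximated by a tiny threading helper. A Python sketch of the same chain (my names, not Prolog semantics; real DCGs also handle nondeterminism):

```python
# Thread a value through a chain of input/output "predicates", the way the
# DCG translation passes each Out to the next In behind the scenes.
def double(n):
    return n + n

def plus2(n):
    return n + 2

def thread(value, *steps):
    for step in steps:
        value = step(value)  # the output of one step feeds the next
    return value

print(thread(10, double, plus2, plus2, double))  # ((10+10)+2+2)*2 = 48
```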

I submit that most programmers end up coercing the languages they work in into a left to right flow style because it is natural. An algorithm is a series of steps to produce a calculation, like a recipe. Programming is encoding algorithms, so it is natural to want to view them as steps. While this may sound like imperative style, it is entirely orthogonal. The steps in an algorithm can (and most often should) be pure functions, even if the language does not enforce it.

I propose a programming language that is built around this left to right flow like Forth and friends (minus the stack manipulation). Here is some code in a version of my imaginary language:

"localhost:customerdb" getDatabase (get "people") -> PeopleRecords
"output.log" getFile recordParse -> LogRecords
PeopleRecords ("customer" = 1) (project "cookie") -> CustCookies
LogRecords ("cookie" in CustCookies) count -> CustomerHits
LogRecords count -> AllHits
CustomerHits (divBy AllHits) print

Each new line starts with a value, and then that value gets passed through successive functions which return new values (at least, those are the semantics; evaluation could occur in many ways and the compiler/runtime could optimize in any number of ways). Functions start with a lower case character, locally bound values start with an upper case letter, and all names are camel cased, though these conventions are arbitrary. The arrow operator (->) signifies binding the value returned from the previous function to a name. Functions which have parameters are enclosed with those parameters in parentheses and may be infix, as in the case of (=) and “in”. Nothing is nested more than one level deep. I believe nesting of all kinds to be a source of confusion and complexity that hinders reading code, although I admit there are benefits, too, like terse code and the power of closures. There is no invisible stack to keep track of. The types are mostly obvious and easily inferred. PeopleRecords and LogRecords are of different internal types (a handle to a database table with constraints and a handle to a parsed log file) but have the same interface (conceptual model), which is “bag of records.” The code reads from left to right, top to bottom.
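To pin down the semantics, here is the gist of that program transliterated into plain Python with made-up sample data; the point is only that each line starts from a value and flows left to right into a bound name:

```python
# Invented sample data standing in for the database and the log file.
people_records = [{"customer": 1, "cookie": "abc"},
                  {"customer": 0, "cookie": "xyz"}]
log_records = [{"cookie": "abc"}, {"cookie": "xyz"}, {"cookie": "abc"}]

# PeopleRecords ("customer" = 1) (project "cookie") -> CustCookies
cust_cookies = [r["cookie"] for r in people_records if r["customer"] == 1]

# LogRecords ("cookie" in CustCookies) count -> CustomerHits
customer_hits = sum(1 for r in log_records if r["cookie"] in cust_cookies)

# LogRecords count -> AllHits
all_hits = len(log_records)

# CustomerHits (divBy AllHits) print
print(customer_hits / all_hits)  # 2 of the 3 hits come from customers
```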

Some other things that I didn’t address are function definition, code branching (conditionals, switch, etc.), record types, and interfaces. I think pattern matching in the function head would be sufficient for all branching. Disallowing other types of branching within a function could improve code readability dramatically. Record types could have an F#-style type provider mechanism. Go’s statically duck-typed interface system is nice. That combined with multimethod dispatch, pattern matching, and generic/abstract types could solve the so-called expression problem and most use cases. Most of these details are unrelated to the main feature, and are matters of taste or practicality.

If you have any thoughts on this, tweet me.

Python 3 Is Killing Python

Python 3 is easily the worst thing to happen to the Python community. I remember when I first used Python; I had been spending a lot of time in C++ land, and Python was like a revelation. I could open up a text editor and have a working program in seconds or minutes, not hours or days. I remember when Python 2.5 came out with a lot of neat new language features. I love Python, and I acknowledge that it has weaknesses, but that’s OK; all programming languages do. Its strengths are what make it interesting. While Python 3 has some tiny, incremental improvements over Python 2, it has lost many of Python 2’s strengths.

One of the great strengths of Python 2 is the long tail of 3rd party libraries to do, well, just about anything. Python 3 doesn’t have this. Yes, a lot of libraries have been ported, but ten times as many have not, and are not trivial to port. For example, say you need to parse X, where X is something nontrivial to parse and not extremely common like YAML or JSON. There is a good chance that a parser has already been written for X in Python 2 and has not been ported to Python 3. Given the fundamental differences between Python 2’s byte string (str) and Python 3’s byte string (bytes), it will be difficult to port, and it will require quite a bit of trickery to maintain Python 2 and Python 3 compatibility. So, you have a few choices: write your app in Python 2 (a deprecated language) quickly, port the library (and all its dependencies), which will take ten times as long, or use a different programming language that also has a long tail of libraries but doesn’t suffer the Python 2 / 3 problem. Choice #2 is obviously not popular, because if it were, we’d have a lot of Python 3 apps in production, and much of the long tail of Python 2 libraries would be ported. Neither of those things is true. People either continue to write software in Python 2 or they pick another language that did not shoot itself in the face.

Another great strength of Python 2 was that programs written in it would almost always run on the next version of Python without much alteration. If your company runs on software written in Python 2 (as many do), it will cost you a great deal of money to port it to Python 3, because your requirements file is probably quite large and stuffed with all manner of libraries that have not been ported. There is no sane business reason to spend hundreds of thousands or millions of dollars’ worth of engineering time to port to Python 3. You might as well ask someone to port their entire codebase to Ruby. Except that would be cheaper. Now, if you are going to have to rewrite your software either way, would you choose Python 3? No.

Popular libraries that support Python 2 and 3 are almost all written in a subset of the language that runs on both platforms. SQLAlchemy, one of my favorite Python libraries, does this well. Django does this, too, but not so well. This subset language, which I will call Python X, is not fun to use, requires weird hacks, and generally is less powerful than either Python 2 or Python 3. How fun is it to port existing Python 2 libraries to Python X? Not fun at all, which is sad, because fun used to be what made Python so great.
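For a flavor of what Python X looks like, here is a typical compatibility shim pattern (my own sketch, not lifted from any particular library):

```python
# Typical "Python X" shims: version checks, aliased names, and decode
# helpers at every boundary, because 2's str is 3's bytes.
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode           # noqa: F821 -- only exists on Python 2
    def iteritems(d):
        return d.iteritems()
else:
    text_type = str
    def iteritems(d):
        return iter(d.items())

def ensure_text(s, encoding="utf-8"):
    # Every function touching strings needs one of these calls somewhere.
    if isinstance(s, bytes):
        return s.decode(encoding)
    return s

print(ensure_text(b"hello"), list(iteritems({"a": 1})))
```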

Python 2, sadly, has been deprecated. Python 3 languishes in disuse. The changes in Python 3 are small. Not much was gained, yet much was lost. For the past several months, I have been building applications and services in Python 3. I’m not blown away by it. It is very similar to writing software in Python 2, except that there is a much smaller pool of 3rd party libraries available. There’s really nothing amazing here. The Python community was supposed to be moving to Python 3 over the past few years, but it has become increasingly obvious that people are moving to new languages (or old languages rediscovered). Some of these languages have great features, like powerful type systems, pattern matching, better performance, better threading and concurrency, simpler FFI, nicer lambdas, and so on.

One solution is to fork Python 2.7, and continue developing the language, adding features in a backwards compatible way so large, unportable (due to financial constraints) Python 2 applications can continue to evolve and improve and bring value to the people and companies that invested so much time developing them. This is the correct thing to do (actually, it would be best if Guido and other leaders in the Python community did this officially instead of forcing a fork). Features from Python 3 that can be backported to Python 2 should be, and a Python 2.8 release should be made. Those few who have actually spent time to write new software in pure Python 3 can use a tool like 3to2 to bring Python 2.8 compatibility.

There are perhaps other solutions, but reviving Python 2 seems to me to be the correct thing to do. The Python 2 revival will not happen officially, because those in charge of such things have shown disdain for users of Python 2. If the community does not rally and revive Python 2, Python 3 will, after many years, become the standard Python, many libraries will be ported (though most certainly never will be), and many investments will be lost. The community will, by that time, have shrunk dramatically, and lost its former glory. People will move on.

Beautiful Quicksort in Common Lisp

Pre-disclaimer. Angry Haskellers: please go away, I didn’t write this for you. This isn’t an “X is better than Y” piece, just a friendly comparison and maybe a tool to help people understand some functional programming concepts better.

Second pre-disclaimer. This is Quicksort, just not one you’d use in production. You can see it does twice as many comparisons as necessary, or much worse if it encounters a pre-sorted list. The list manipulation (via ++ or APPEND) is doing more work and using more memory than necessary, even for a purely functional implementation.

Quicksort in Haskell vs. Common Lisp

Haskell apologists often cite how Haskell code is terse and expressive. I’ve used Haskell a bit, and while I do not find its syntax very friendly, it is quite powerful. (As a side note, I really like Standard ML, and I wish I could transplant its eagerness and impurity into Haskell.) Common Lisp is a master of multi-paradigm programming and it wears the functional hat pretty well, too. So, let’s see how it stands up against Haskell.

The canonical “beautiful” Haskell function is the quicksort. Here it is:

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        lesser  = filter (< p) xs
        greater = filter (>= p) xs

I gotta admit, that’s a pretty good one. Let’s see what it looks like in Common Lisp.

(defun quicksort (list)
  (when list
    (destructuring-bind (p . xs) list
      (let ((lesser (remove-if-not (lambda (x) (< x p)) xs))
            (greater (remove-if-not (lambda (x) (>= x p)) xs)))
        (append (quicksort lesser) (list p) (quicksort greater))))))

I probably wouldn’t write it this way, but this is the closest approximation of what the Haskell function is doing. Let’s break it down into pieces and show which parts correspond in the two examples.

The Haskell code is using pattern matching to deal with the empty list case, and destructuring to split the list into head and tail.

quicksort []     = []
quicksort (p:xs) = expr

The Common Lisp code is using WHEN on the input (which returns NIL for an empty list and otherwise evaluates the expression) and is using DESTRUCTURING-BIND to split the list into head and tail.

(when list
  (destructuring-bind (p . xs) list
    expr))

Haskell uses “where” to bind local variables in an expression:

... = expr
    where
        binding_a = expr_a
        binding_b = expr_b

Common Lisp uses LET to bind local variables in an expression:

(let ((binding-a expr-a)
      (binding-b expr-b))
  expr)

Haskell uses partial application of the comparison functions with “filter” to filter the lists.

lesser  = filter (< p) xs
greater = filter (>= p) xs

Common Lisp uses lambdas to construct the filtering closures and REMOVE-IF-NOT which is essentially equivalent to Haskell’s “filter”.

(lesser (remove-if-not (lambda (x) (< x p)) xs))
(greater (remove-if-not (lambda (x) (>= x p)) xs))

Even Terser

OK, but there’s an even shorter, one-line Haskell quicksort.

qsort (p:xs) = qsort [x | x<-xs, x<p] ++ [p] ++ qsort [x | x<-xs, x>=p]

Let’s ignore for now the fact that using 1 and 2 letter variables is a very silly way to write “shorter” code. What’s great about this solution, though, is the use of list comprehensions. Common Lisp doesn’t have list comprehensions, but it doesn’t matter because it has mother-flipping macros, and we can add list comprehensions.

Now, I could pull in a library that gives list comprehensions, but wouldn’t it be funner to write the macro right here?

I’m not going to build a general replacement for all Haskell list comprehensions, just the ones here. I see 4 parts, the value, a binding, a list, and a test. Here’s my first shot at a list comprehension macro:

(defmacro @ (value bind list test)
  (let ((newlist (gensym)))
    `(let ((,newlist nil))
       (dolist (,bind ,list)
         (when ,test
           (push ,value ,newlist)))
       (nreverse ,newlist))))

I selected the @ symbol because it’s short and sweet, and mainly just short. The Haskell folks seem to think 1-letter names are the bee’s knees, so there you go. A quick test:

;; Something like this in Haskell: [x*2 | x<-[1, 2, 3, 4, 5, 6, 7], x>3]
(@ (* 2 x) x (list 1 2 3 4 5 6 7) (> x 3))
;;-> (8 10 12 14)

In case you don’t know how DEFMACRO works, it takes the input forms, unevaluated, and returns a list which is evaluated in its place. You can debug macros and see the intermediate form by doing the transformation with MACROEXPAND:

(macroexpand '(@ (* 2 x) x (list 1 2 3 4 5 6 7) (> x 3)))
(LET ((#:G791 NIL))
  (DOLIST (X (LIST 1 2 3 4 5 6 7)) (WHEN (> X 3) (PUSH (* 2 X) #:G791)))
  (NREVERSE #:G791))

Alright, armed with our new list comprehension macro, let’s write our super-short quicksort:

(defun qsort (l)
  (when l (destructuring-bind (p . xs) l
            (append (qsort (@ x x xs (< x p))) (list p)
                    (qsort (@ x x xs (>= x p)))))))

That’s pretty good. I still have to use WHEN to deal with the empty-list case (which, to be fair, the one-line Haskell version doesn’t handle at all; it fails on an empty list), and I still have to use DESTRUCTURING-BIND explicitly. I think it holds up to Haskell’s qsort.

More Generic

Oh, I know, the Haskell qsort function applies to lists of any Ord type, which is to say, any type that can be ordered (or at least implements the proper interface) (disclaimer: I don’t pretend to know Haskell-speak when it comes to Classes, Types, Typeclasses, etc. I got it wrong, I don’t care.). Common Lisp ships with CLOS which provides the ability to create generic methods, but it doesn’t really ship with many implemented interfaces. That could be considered a shortcoming in the language, but more properly, it is a shortcoming in libraries. You can easily define generic methods and implement them on all the types you might want to sort, even types from other packages and libraries.

(defgeneric lt (some other))

(defmethod lt ((some number) (other number)) (< some other))

(defmethod lt ((some string) (other string)) (string< some other))

(defun qsort (l)
  (when l (destructuring-bind (p . xs) l
            (append (qsort (@ x x xs (lt x p))) (list p)
                    (qsort (@ x x xs (not (lt x p))))))))

(qsort (list "this" "is" "only" "a" "test"))
;;-> ("a" "is" "only" "test" "this")

(qsort (list 12/11 -1.09 3000))
;;-> (-1.09 12/11 3000)

So we can be significantly generic with Common Lisp, too. No, it doesn’t have Haskell’s compile time type-checking goodness (or badness, however you want to slice it); it is a fundamentally dynamically typed language. But I think it matches Haskell in expressivity, and what it lacks in terse syntax, it makes up for with flexibility through macros, generic methods, and dynamism.

In conclusion: I love Haskell, please don’t shoot me. Also, Common Lisp is pretty neat, too. Both are definitely worth learning and excellent tools for functional programming.

Concurrency in Common Lisp With ChanL

Rob Pike did a great talk at Google on concurrency in Newsqueak. The gist of it is that you spawn concurrent tasks in separate lightweight processes and synchronize by communicating through channels. That stuff worked its way into the programming language Go. After seeing that video I wanted to try it out myself in Common Lisp, of course. It turns out there is a Quicklisp-installable library called ChanL that provides just that functionality. I don’t know how lightweight the threads are, but the interface is similar.

A simple but effective example of using channels is to perform a number of IO-bound tasks in parallel, such as retrieving several web pages. It’s surprisingly simple to do. To start, I’ll define a little timing macro:

(defmacro time-it (&body body)
  (let ((start-time (gensym)))
    `(let ((,start-time (get-internal-real-time)))
       ,@body
       (format t "Runtime: ~a milliseconds.~%" (- (get-internal-real-time) ,start-time)))))

Then, let’s load the ChanL library and Drakma (for simple HTTP requests) using quicklisp:

(ql:quickload "drakma")
(ql:quickload "chanl")

Our example set of URLs:

(defparameter *urls*
  (list
    "http://blog.thezerobit.com/"
    "http://quicklisp.org/"
    "http://www.cliki.net/index"
    "http://sbcl.org/"
    ))

Now, here’s a function that performs a single HTTP request and times it.

(defun do-request (url)
  (let ((start-time (get-internal-real-time)))
    (format t "Starting request: ~a~%" url)
    (drakma:http-request url)
    (let ((elapsed (- (get-internal-real-time) start-time)))
      (format t "Completed request in ~a ms: ~a~%" elapsed url))))

Here is what we might do without concurrency to perform these HTTP requests:

(time-it (dolist (url *urls*)
           (do-request url)))
;; Starting request: http://blog.thezerobit.com/
;; Completed request in 616 ms: http://blog.thezerobit.com/
;; Starting request: http://quicklisp.org/
;; Completed request in 819 ms: http://quicklisp.org/
;; Starting request: http://www.cliki.net/index
;; Completed request in 429 ms: http://www.cliki.net/index
;; Starting request: http://sbcl.org/
;; Completed request in 291 ms: http://sbcl.org/
;; Runtime: 2155 milliseconds.

It takes over 2 seconds to make all these requests in serial fashion. There may be a way to queue up these requests in parallel with the Drakma library, but instead we’ll just use the ChanL library. Let’s define a function like DO-REQUEST that, instead of printing its progress, sends messages over a channel.

(defun do-request-chan (url chan)
  (let ((start-time (get-internal-real-time)))
    (chanl:send chan (format nil "Starting request: ~a~%" url))
    (drakma:http-request url)
    (let ((elapsed (- (get-internal-real-time) start-time)))
      (chanl:send chan
                  (format nil "Completed request in ~a ms: ~a~%" elapsed url)))))

We can use the PEXEC call to spawn a separate process for each HTTP request and then synchronize by waiting for the 8 messages (two per URL) to come in on the channel we create.

(time-it
  (let ((chan (make-instance 'chanl:channel)))
    (dolist (url *urls*)
      (chanl:pexec () (do-request-chan url chan)))
    (dotimes (x 8)
      (format t (chanl:recv chan)))))
;; Starting request: http://blog.thezerobit.com/
;; Starting request: http://quicklisp.org/
;; Starting request: http://sbcl.org/
;; Starting request: http://www.cliki.net/index
;; Completed request in 291 ms: http://blog.thezerobit.com/
;; Completed request in 302 ms: http://www.cliki.net/index
;; Completed request in 306 ms: http://sbcl.org/
;; Completed request in 703 ms: http://quicklisp.org/
;; Runtime: 703 milliseconds.

Here, the whole operation takes only as long as the longest single request and the overhead of spawning separate threads is negligible compared to the time saved by making these requests concurrently. It is important to note that, by default, the channels are not buffered so calls to SEND will block until there’s a RECV called on the same channel in a different thread. ChanL also provides buffered channels and some other goodies which I haven’t touched on here.
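The same pattern translates to other languages with threads and queues. Here is a rough Python analogue, with the HTTP requests replaced by sleeps so it is self-contained; note that Python’s Queue is buffered, unlike ChanL’s default unbuffered channels:

```python
# Spawn a thread per "request" and synchronize by receiving two messages
# per URL on a shared queue standing in for a channel.
import threading
import time
import queue

def do_request_chan(url, chan, delay):
    chan.put("Starting request: " + url)
    time.sleep(delay)  # stand-in for the actual HTTP request
    chan.put("Completed request: " + url)

chan = queue.Queue()
urls = [("http://example.com/a", 0.2), ("http://example.com/b", 0.1)]
for url, delay in urls:
    threading.Thread(target=do_request_chan, args=(url, chan, delay)).start()

# Receiving blocks until a message arrives, so this waits for all threads.
messages = [chan.get() for _ in range(2 * len(urls))]
print(len(messages), "messages received")
```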

Lazy Sequences in Common Lisp

0. Delayed Evaluation with Closures

Last week I expressed some anti-lazy sentiment, or at least it was perceived as such. Really, I am just not a fan of laziness by default. Common Lisp is decidedly not lazy by default, but it is easy to defer or avoid computation with closures. With macros, we can add it to the language in an easy-to-use form.

Common Lisp is not a lazy language, but it is easy enough to add lazy sequences. We can model them after the normal list which is constructed piecemeal with the cons operator. The following recursive function builds a list of numbers counting down to 0 using cons.

(defun countdown (n)
  (if (>= n 0)
    (cons n (countdown (1- n)))))

(countdown 10)
;;--> (10 9 8 7 6 5 4 3 2 1 0)

(reduce #'+ (countdown 100))
;;--> 5050

This is a common pattern, but there are some shortcomings. First of all, it’s not a tail recursive function, so tail call optimization (tail recursion elimination) does not apply, and every call grows the stack linearly with the argument n. Also, it constructs the entire list up front. You may want this list of numbers, but you don’t need all of them sitting in memory right away. In the example above, the call to reduce gets a full list of 101 elements, from 100 to 0, before processing. Depending on the length of the sequence, a different algorithm might work better.
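For comparison, Python’s generators give exactly this kind of laziness: each element is produced on demand, and no 101-element list ever exists:

```python
# A lazy countdown as a generator: each value is produced only when the
# consumer asks for the next one.
def lazy_countdown(n):
    while n >= 0:
        yield n        # hand back one element and pause here
        n -= 1

# Reduce with + over the lazy sequence, one element at a time.
print(sum(lazy_countdown(100)))  # 5050
```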

One way (of many) to construct lazy sequences in Common Lisp is to store the CDR (second part) of the CONS cell as a thunk, or a closure that takes no argument and computes the next CONS in the list.

(defun lazy-countdown (n)
  (if (>= n 0)
    (cons n (lambda () (lazy-countdown (1- n))))))

(lazy-countdown 10)
;;--> (10 . #<CCL:COMPILED-LEXICAL-CLOSURE (:INTERNAL LAZY-COUNTDOWN) #x30200F59400F>)

We can grab the first element of the lazy list with CAR per normal, but to get the next CONS cell, we need a version of CDR which FUNCALLs the thunk. That will call LAZY-COUNTDOWN which will return another CONS cell.

(defun lazy-cdr (lazylist) (funcall (cdr lazylist)))

(lazy-cdr (lazy-countdown 10))
;;--> (9 . #<CCL:COMPILED-LEXICAL-CLOSURE (:INTERNAL LAZY-COUNTDOWN) #x30200F55B7BF>)

So far, so good. Now we need a version of REDUCE which uses LAZY-CDR instead of CDR, and can reduce a lazy list for us.

(defun lazy-reduce (f lazylist)
  (labels ((inner-reduce (lazylist acc)
             (if lazylist
               (inner-reduce (lazy-cdr lazylist) (funcall f acc (car lazylist)))
               acc)))
    (inner-reduce (lazy-cdr lazylist) (car lazylist))))

(lazy-reduce #'+ (lazy-countdown 1000))
;;--> 500500

1. Use CLOS and Macros for a Nice Interface

In order to use this pattern easily, we will create a nice interface. We’ll package up the lazy list into its own class, which I will call LCONS and use a macro of the same name to construct lazy lists by automatically wrapping the second argument in a thunk.

(defclass lcons ()
  ((val :initarg :val)))

(defmacro lcons (head tail)
  `(make-instance 'lcons
                  :val (cons ,head (lambda () ,tail))))

For our interface, I am going to use methods HEAD, TAIL and EMPTY? to approximate CAR, CDR, and NULL and make them compatible with normal lists.

(defgeneric empty? (lcons))
(defgeneric head (lcons))
(defgeneric tail (lcons))

(defmethod empty? ((lcons lcons))
  nil)

(defmethod empty? ((lcons list))
  (null lcons))

(defmethod head ((lcons lcons))
  (car (slot-value lcons 'val)))

(defmethod head ((lcons list))
  (car lcons))

(defmethod tail ((lcons lcons))
  (funcall (cdr (slot-value lcons 'val))))

(defmethod tail ((lcons list))
  (cdr lcons))

Now we can redefine LAZY-COUNTDOWN: it is just the same as an ordinary COUNTDOWN, but uses LCONS instead of CONS.

(defun countdown (n)
  (if (>= n 0)
    (cons n (countdown (1- n)))))

(car (cdr (countdown 10)))
;;--> 9

(defun lazy-countdown (n)
  (if (>= n 0)
    (lcons n (lazy-countdown (1- n)))))

(head (tail (lazy-countdown 10)))
;;--> 9

Now, we can define functions to do things that are normally only for regular lists that work with lazy lists, also. A good candidate is MAPCAR. Here’s a version that works with lazy lists:

(defun lmapcar (f &rest lists)
  (if (notany #'empty? lists)
    (lcons (apply f (mapcar #'head lists))
           (apply #'lmapcar f (mapcar #'tail lists)))))

(head (lmapcar #'+ (lazy-countdown 10) (countdown 10)))
;;--> 20

Notice that LMAPCAR is lazy, too. It returns an LCONS (lazy list). In this way we can build up a collection of functions just like the normal functions on lists in Common Lisp, but ones that take and return lazy lists.
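As a sketch of how such functions look, here is a hypothetical LTAKE (not part of the library above, just an illustration) that forces the first N elements of a possibly-lazy list into a regular list, built only on the HEAD, TAIL, and EMPTY? interface defined earlier:

```lisp
;; LTAKE is a hypothetical helper built only on the HEAD/TAIL/EMPTY?
;; generic functions defined above; it forces the first N elements
;; of a (possibly lazy) list into a regular list.
(defun ltake (n lazylist)
  (if (or (<= n 0) (empty? lazylist))
      nil
      (cons (head lazylist)
            (ltake (1- n) (tail lazylist)))))

(ltake 3 (lazy-countdown 10))
;;--> (10 9 8)
```

Because the methods are defined on both LCONS and LIST, the same LTAKE works on eager lists too.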

Just like lazy sequences in other languages, these sequences can be infinite. Here’s a function INTEGERS which builds an infinite lazy list of integers.

(defun integers (n) (lcons n (integers (1+ n))))

(head (tail (tail (integers 0))))
;;--> 2

I have packaged this into a project which is available on github: slow-jam. It is the above code with a few extra functions that take and/or return lazy sequences. Here are a few examples of things you can do with the library:

;; infinite sequences that print nicely on the REPL
(range)
;;--> (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ETC...)

(take 10 (range))
;;--> (0 1 2 3 4 5 6 7 8 9)

;; FILTER is lazy
(take 5 (filter #'evenp (range)))
;;--> (0 2 4 6 8)

One advantage of this approach is that the tail of the lazy list is never cached, which means that, unless you use the TO-LIST function provided in the library to turn a lazy list into a normal list, all the elements of your lazy lists (except the first) can be garbage collected. These lists never take much permanent memory. The downside is that results are not cached, and you may end up recalculating the same list elements more than once. For that reason, these lists should be built using side-effect-free functions. If you do want to cache the contents of a lazy list, just call TO-LIST, which returns a real list, and keep a reference to that.
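To see the recomputation cost concretely, here is a small sketch (using the LCONS machinery above, with a side effect added purely for illustration, which is exactly what the advice above warns against) in which traversing the same list twice runs the element computation again:

```lisp
;; *COMPUTATIONS* counts how many times an element is produced.
;; The side effect exists only to make the recomputation visible;
;; real lazy lists should be built from side-effect-free functions.
(defparameter *computations* 0)

(defun noisy-countdown (n)
  (incf *computations*)
  (if (>= n 0)
      (lcons n (noisy-countdown (1- n)))))

(let ((l (noisy-countdown 2)))
  (head (tail l))  ; forces the second element
  (head (tail l))  ; forces it again -- nothing was cached
  *computations*)
;;--> 3
```

If the tail were cached, the second traversal would be free and the counter would stop at 2.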

This is just one implementation of lazy sequences in Common Lisp. It is partially inspired by Clojure, Haskell, and SICP. If you like the slow-jam library and want it to be better, report an issue or make a pull request at github.

Immutable Persistent Data Structures in Common Lisp

0. The Rationale

Clojure, Scala, and Haskell (and other languages) have recently brought the idea of immutable (and persistent) data structures into some amount of popularity. This is not a new idea, by any means, as Lisp has always supported this with its list structure, which is essentially a linked list. In Common Lisp and Scheme, lists are actually mutable but in practice, this is rarely a problem because it is generally considered bad form to go around mutating lists except under certain circumstances. Most Lisp books, tutorials and references encourage using lists in a functional style (that is, assuming and treating them as immutable structures).

In Clojure, the big three data structures are the hashmap, vector, and set, each of which comes with its own literal syntax. Clojure also has normal lispy lists, with language-enforced immutability. These immutable data structures can be approximated with normal lists in Common Lisp, with the caveat that they don’t retain the more efficient performance characteristics of Clojure’s data structures. There are a few libraries for Common Lisp which provide these structures with time and space complexity similar to Clojure’s implementations. The one that I recommend is FSet which, according to the Wayback Machine, has been around since at least 2007.

What’s the point of learning to use these data structures in Common Lisp? Isn’t Clojure better? Well, in a lot of ways Clojure is a better language and environment, but in a lot of ways Common Lisp is better, too. I enjoy using both languages. I’m not quite lucky enough to have a day job that allows me to write much code in a Lisp, so I use it for fun. Therefore, my criteria for languages that are fun tend to push me toward Common Lisp. I generally use SBCL or ClozureCL, implementations which are native (not bound to the JVM), have a much faster startup time, and interact more easily with native libraries. I don’t care much for the JVM, and I prefer native libraries to JVM libraries, which are often themselves just JNI wrappers around native code. Tracebacks in SBCL and CCL are much easier to read than what you get with Clojure. I also prefer non-lazy to lazy.

An aside on laziness: I like the idea of lazy collections, or iterators, or generators, or streams (from SICP). They can be useful for certain constructions, but they don’t really enable anything amazing that you can’t do otherwise, just with a slightly different algorithm. And laziness is slower. Every implementation of Standard ML (a strict language) I’ve tried has been anywhere from somewhat to several times faster than Haskell, where all computation is lazy. I think setting up all that delayed computation is expensive, and the only way I found to make Haskell perform in the same ballpark as Standard ML was to add strictness hints here and there to make computations go ahead and happen instead of building a stack of thunks. I don’t have to encourage Standard ML to do a computation; it just does it. Same with Common Lisp. Life is much simpler and faster when computations just happen and nothing is lazy by default.

So, Clojure has its advantages. One of those is a slight reduction in the number of parens with certain standard Lisp constructions like LET and COND. Also, the collection literals are nice. Clojure is a Lisp-1, so the shared namespace is generally considered better than that of Lisp-2s like Common Lisp. Even so, there are nice things about a split function namespace, namely that you don’t have to worry about shadowing built-in functions. This is a problem in languages with a single namespace. I don’t know how many Python programs I’ve seen that shadow the built-in “id” function. It turns out that’s a really popular name for a database column, and hence a variable name. In Common Lisp, you can name your variable “list” even though there is a standard function called “list”; none of this “lst” crap.
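A quick illustration of that Lisp-2 point: because Common Lisp keeps variables and functions in separate namespaces, a variable named LIST does not shadow the function LIST, and both can appear in the same form:

```lisp
;; The variable LIST and the function LIST live in separate
;; namespaces, so binding one does not shadow the other.
(let ((list '(1 2 3)))
  (list list list))
;;--> ((1 2 3) (1 2 3))
```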

TL;DR: It’s all a matter of taste, but for me Common Lisp is just a little more fun to hack in than Clojure.

1. The List

I’ll try to keep this short. A Common Lisp list is a linked list constructed of CONS cells. Each CONS has two parts: the CAR and the CDR. The CAR generally contains a list element, and the CDR contains the next CONS cell, or NIL, signifying the end of the list. An empty list is the same as NIL.

;; construct a list with a single element, the keyword symbol :x
(cons :x nil)
;;--> (:x)

;; alternately:
(list :x)
;;--> (:x)

;; or with quoting (bypass evaluation):
(quote (:x))
;;--> (:x)

;; shorthand for quoting is a single quote mark
'(:x)
;;--> (:x)

;; CAR and CDR extract the cells of the CONS
(car '(:x))
;;--> :x
(cdr '(:x))
;;--> NIL

;; A list of three elements:
(cons :x (cons :y (cons :z nil)))
;;--> (:x :y :z)
;; Also: (list :x :y :z) or '(:x :y :z)

So the way to use lists as an immutable, persistent data structure is that you can construct a new list with an element added to the front of the list without affecting the original, and the “tail” of the list is shared. Also, you can get the list minus the front element, just by taking the CDR of the list.

(let ((our-list (list 1 2 3)))
  (list
    (cons 0 our-list)
    (cons 1 our-list)
    (cdr our-list)
    our-list))
;;--> ((0 1 2 3) (1 1 2 3) (2 3) (1 2 3))

In that example I took a list and first constructed a new list by adding a 0 to the front, then another new list by adding a 1 to the front, then I got a list without the first element, and then the original list, unscathed. No list was mutated and no space was wasted, as all these lists share the same last 2 or 3 CONS cells.
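We can verify that structural sharing directly with EQ, which tests object identity: the CDR of the newly consed list is the very same chain of CONS cells as the original, not a copy.

```lisp
;; EQ tests identity, so this shows the new list literally shares
;; the original's CONS cells rather than copying them.
(let* ((our-list (list 1 2 3))
       (new-list (cons 0 our-list)))
  (eq (cdr new-list) our-list))
;;--> T
```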

2. Vector / SEQ

Of course, you might want to do something other than maintain an ordered sequence where elements can only be added or removed at the front. In Clojure, you’ve got the vector, an ordered sequence that lets you add or remove elements anywhere, and look up arbitrary elements in O(log n) time, all with shared structure. You can do the same things with normal lists in Common Lisp, but some of the operations will take O(n) time and some will return a brand new list with no shared structure. Let’s start with that and then show how to get Clojure-style complexity characteristics with the SEQ collection provided by the FSet library. For small enough sequences you might just want to use normal lists and avoid the FSet dependency.

In none of the following examples is any collection mutated; instead, the value of the expression must be captured and used in place of the original. So when I say “drop an element”, I really mean “return a new collection without that element”, and so on.

You can remove an element from the end of a list with BUTLAST, a funny-sounding function which basically creates a whole new list minus the last element. You can also specify a number of elements to remove from the end of the list.

(butlast (list 1 2 3))
;;--> (1 2)

(butlast (list 1 2 3 4 5) 2)
;;--> (1 2 3)

You can get the NTH element of a list:

(nth 0 (list 1 2 3 4 5))
;;--> 1

You can take a slice of a list with SUBSEQ:

(subseq (list :a :b :c :d) 1 3)
;;--> (:B :C)

Or drop the first n elements with NTHCDR:

(nthcdr 4 (list :a :b :c :d :e :f))
;;--> (:E :F)

If you want to drop, add, or change an element in the middle of a list, you may have to write or use simple functions like these:

(defun drop-nth (n list)
  (append (subseq list 0 n) (nthcdr (1+ n) list)))

(drop-nth 3 (list :a :b :c :d :e :f))
;;--> (:A :B :C :E :F)

(defun add-nth (n elem list)
  (append (subseq list 0 n) (cons elem (nthcdr n list))))

(add-nth 3 :bar (list :a :b :c :d :e :f))
;;--> (:A :B :C :BAR :D :E :F)

(defun set-nth (n elem list)
  (append (subseq list 0 n) (cons elem (nthcdr (1+ n) list))))

(set-nth 3 :foo (list :a :b :c :d :e))
;;--> (:A :B :C :FOO :E)

Now if you want the tree-structured, persistent version of that, use FSet’s SEQ. If you are using Quicklisp (which I hope you are), you can load up FSet in the REPL like this:

(ql:quickload :fset)
(in-package :fset-user)

There is a function, EMPTY-SEQ that creates an empty SEQ, and there’s a macro SEQ which can be used to construct a SEQ with multiple elements:

(empty-seq)
;;--> #[ ]

(seq :a :b :c)
;;--> #[ :A :B :C ] ;; this is how FSet pretty-prints sequences in the REPL

You can get the first or last or nth element with FIRST, LAST and @:

(first (seq :a :b :c))
;;--> :A

(last (seq :a :b :c))
;;--> :C

(@ (seq :a :b :c) 1)
;;--> :B

You can add or remove elements at the beginning or end of sequence with WITH-FIRST, WITH-LAST, LESS-FIRST, and LESS-LAST:

(with-first (seq :a :b :c) :foo)
;;--> #[ :FOO :A :B :C ]

(with-last (seq :a :b :c) :foo)
;;--> #[ :A :B :C :FOO ]

(less-first (seq :a :b :c))
;;--> #[ :B :C ]

(less-last (seq :a :b :c))
;;--> #[ :A :B ]

You can replace an element, drop an element, or insert an element at a particular index with WITH, INSERT and LESS:

(with (seq :a :b :c) 1 :foo)
;;--> #[ :A :FOO :C ]

(insert (seq :a :b :c) 1 :foo)
;;--> #[ :A :FOO :B :C ]

(less (seq :a :b :c) 1)
;;--> #[ :A :C ]

3. Set

In Common Lisp lists are also used to emulate sets. Not all operations will have the optimum time complexity, but it’s generally adequate for most purposes.

Common Lisp has various equality functions: EQ, EQL, EQUAL, and EQUALP. I will not summarize them here, but for the sake of consistency with Clojure, we’ll probably want to consider two elements equal under EQUALP. Many Common Lisp functions take an optional :TEST parameter to specify which equality function to use; all of the set-theory functions do.
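For a quick feel of the difference, EQUALP is the most permissive of the four; for instance, it ignores character case when comparing strings, where EQUAL does not:

```lisp
;; EQUAL compares strings case-sensitively...
(equal "Steve" "steve")
;;--> NIL

;; ...while EQUALP compares them case-insensitively.
(equalp "Steve" "steve")
;;--> T
```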

To test if a value is present in a set (really, just a list that we’re treating as a set), there is the MEMBER function. It will return NIL if the value is not present (which is the only “false” value in Common Lisp) or, if it is present, the tail of the list starting with that value (which will be treated as “true” in conditional statements, etc.).

(member :a (list :a :b :c) :test #'equalp)
;;--> (:A :B :C)
(member :d (list :a :b :c) :test #'equalp)
;;--> NIL
(member '("asdf") (list '("asdf") :b :c) :test #'equalp)
;;--> (("asdf") :B :C)

You can get the union or intersection of two sets with UNION and INTERSECTION:

(union (list :a :b :c) (list :d :e :a) :test #'equalp)
;;--> (:C :B :D :E :A)
(intersection (list :a :b :c) (list :d :e :a :b) :test #'equalp)
;;--> (:B :A)

You can get all the elements from one list that do not exist in another list with SET-DIFFERENCE.

(set-difference (list :a :b :c) (list :d :e :a :b) :test #'equalp)
;;--> (:C)

You can even check if a set is a subset of another set with SUBSETP.

(subsetp (list :a :b :c) (list :d :e :a :b) :test #'equalp)
;;--> NIL

(subsetp (list :a :b :c) (list :d :e :c :a :b) :test #'equalp)
;;--> T

The set of elements that exist in only one of two sets is constructed with SET-EXCLUSIVE-OR.

(set-exclusive-or (list :a :b :c) (list :d :e :a :b) :test #'equalp)
;;--> (:E :D :C)

Adding and removing elements from a list is accomplished with CONS and REMOVE:

(cons :foo (list :a :b))
;;--> (:foo :a :b)

(remove "elem" (list "foo" "bar" "elem") :test #'equalp)
;;--> ("foo" "bar")

It’s clear that Common Lisp was designed to use lists as sets when necessary, but under certain conditions you might want a more efficient implementation (say, if you do a lot of membership tests on a larger set and would benefit from an O(log n) membership test instead of O(n)). For that, you can use FSet’s SET. You can construct an empty set with EMPTY-SET or pre-populate one with the SET macro (which shadows the archaic and generally unused SET function built into Common Lisp).

(empty-set)
;;--> #{ }

(set :a :b :c)
;;--> #{ :A :B :C }

The FSet library comes with an equality function that is used automatically: EQUAL?. It is slightly better than EQUALP in that it considers two equivalent sets equal (and likewise for the other FSet collections):

(equalp (list :a :b :c) (list :b :c :a))
;;--> NIL

(equal? (set :a :b :c) (set :b :c :a))
;;--> T

Membership is tested with CONTAINS?:

(contains? (set :a :b :c) :a)
;;--> T

(contains? (set :a :b :c) :d)
;;--> NIL

UNION, INTERSECTION, and SET-DIFFERENCE work as expected:

(union (set :a :b) (set :b :c))
;;--> #{ :A :B :C }

(intersection (set :a :b) (set :b :c))
;;--> #{ :B }

(set-difference (set :a :b) (set :b :c))
;;--> #{ :A }

Two tests exist: SUBSET? and DISJOINT?.

(subset? (set :a :b) (set :b :c :a))
;;--> T

(subset? (set :d :b) (set :b :c :a))
;;--> NIL

(disjoint? (set :a :b) (set :c :d))
;;--> T

(disjoint? (set :a :b :c) (set :c :d))
;;--> NIL

Adding and removing elements from a SET is accomplished with WITH and LESS:

(with (set :b :c) :a)
;;--> #{ :A :B :C }

(less (set :a :b :c) :a)
;;--> #{ :B :C }

4. Map

There are two ways lists are used in Common Lisp to create mapping collections.

The plist is just a list in which every two elements represent a key and a value. It has one useful lookup function, GETF, which uses EQ for key comparison with no way to specify any other test, making it useful only when the keys are symbols:

(getf (list :name "Steve" :age 33) :name)
;;--> "Steve"

(getf (list :name "Steve" :age 33) :weight)
;;--> NIL

(getf (list :name "Steve" :age 33) :weight "some default")
;;--> "some default"

Mappings can be changed or added by appending a key/value pair to the front of the list since GETF searches from the front:

(getf (append (list :name "Stephen") (list :name "Steve" :age 33)) :name)
;;--> "Stephen"

You could easily build a repertoire of functions to do all the common operations you might want with plists, but there is another form, the alist, which comes with a better set of operations built in. An alist is a list of CONS cells with the key in the CAR and the value in the CDR. A CONS cell with a non-list in the CDR is called a dotted pair and prints like this: (car . cdr). Alists can be constructed in a variety of ways:

(list (cons :key "val") (cons :other-key "other-val"))
;;--> ((:KEY . "val") (:OTHER-KEY . "other-val"))

'((:key . "val") (:other-key . "other-val"))
;;--> ((:KEY . "val") (:OTHER-KEY . "other-val"))

(pairlis (list :key1 :key2 :key3) (list "val1" "val2" "val3"))
;;--> ((:KEY3 . "val3") (:KEY2 . "val2") (:KEY1 . "val1"))

The lookup function for alists is ASSOC, which takes the :TEST parameter for key equality, though it returns the entire matching CONS cell, not just the value. You can use CDR to extract the value.

(assoc :key '((:key . "val") (:other-key . "other-val")) :test #'equalp)
;;--> (:KEY . "val")

(cdr (assoc :other-key '((:key . "val") (:other-key . "other-val")) :test #'equalp))
;;--> "other-val"

(assoc :missing-key '((:key . "val") (:other-key . "other-val")) :test #'equalp)
;;--> NIL

Like plists, key/val pairs can be added to alists by adding them to the front of the list. There is a helper function for this called ACONS:

(acons "foo" "bar" '(("baz" . "val")))
;;--> (("foo" . "bar") ("baz" . "val"))

There’s no builtin function to remove keys from an alist, but we can write one easily enough:

(defun alist-remove (alist key)
  (remove-if #'(lambda (cell) (equalp key (car cell))) alist))

(alist-remove '((:a . 100) (:b . 200)) :a)
;;--> ((:B . 200))

Alists are probably an 80% solution, which is plenty for most situations, but FSet’s maps are quite nice and give you that O(log n) lookup and update time complexity with the structural sharing that we’ve grown to love. I’ve found them as useful as Clojure’s hashmaps. Like the other collections, there’s a self-explanatory function EMPTY-MAP and a macro MAP for constructing larger maps:

(empty-map)
;;--> #{| |}

(map (:a 100) (:b 200))
;;--> #{| (:A 100) (:B 200) |}

Like SETs, elements can be added or removed or accessed from FSet MAP collections with WITH, LESS and @.

(with (map (:a 100)) :b 200)
;;--> #{| (:A 100) (:B 200) |}

(less (map (:a 100) (:c 200)) :c)
;;--> #{| (:A 100) |}

(@ (map (:a 100) (:c 200)) :c)
;;--> 200

5. Conclusion w/bonus Feature

I have only scratched the surface of the FSet library. There are more functions than the ones I listed, and some of the ones I describe actually return multiple values, for example to signify whether a key is present in case you are storing NIL as a value. I recommend the FSet Tutorial for more details.

Lists in Common Lisp are really quite versatile and can approximate most other structures for limited sizes. The nice thing about the FSet library is that it always puts the collection as the first argument, so it is amenable to Clojure’s threading macro, ->, which I will translate here into Common Lisp for your benefit:

(defmacro -> (x &optional (form nil form-supplied-p) &rest more)
  (if form-supplied-p
    (if more
      `(-> (-> ,x ,form) ,@more)
      (if (listp form)
        `(,(car form) ,x ,@(cdr form))
        (list form x)))
    x))

(-> (empty-map)
  (with :a 100)
  (with :b 200)
  (less :a))
;;--> #{| (:B 200) |}