the zero bit stream

programmer things

Immutable Persistent Data Structures in Common Lisp

0. The Rationale

Clojure, Scala, and Haskell (and other languages) have recently brought the idea of immutable (and persistent) data structures into some amount of popularity. This is not a new idea, by any means, as Lisp has always supported this with its list structure, which is essentially a linked list. In Common Lisp and Scheme, lists are actually mutable but in practice, this is rarely a problem because it is generally considered bad form to go around mutating lists except under certain circumstances. Most Lisp books, tutorials and references encourage using lists in a functional style (that is, assuming and treating them as immutable structures).

In Clojure, the big three data structures are the hashmap, vector, and set, which each come with their own literal syntax. Clojure also has normal lispy lists, too with language enforced immutability. These immutable data structures can be approximated with normal lists in Common Lisp with the caveat that they don’t retain the more efficient performance characteristics of Clojure’s data structures. There are a few libraries for Common Lisp which provide these structures with similar time and space complexity as Clojure’s implementations. The one that I recommend is FSet which, according to the Wayback Machine, has been around since at latest 2007.

What’s the point of learning to use these data structures in Common Lisp? Isn’t Clojure better? Well, in a lot of ways, Clojure is a better language and environment, but in a lot of ways, Common Lisp is better, too. I enjoy using both languages. I’m not quite lucky enough to have a day-job that allows me write much code in a Lisp, so I use it for fun. Therefore, my criteria for languages that are fun tend to push me toward Common Lisp. I generally use SBCL or ClozureCL as implementations which are native (not bound the the JVM) and have a much faster startup time and interact easier with native libraries. I don’t care much for the JVM, and I prefer native libraries to JVM libraries, which are themselves, often just JNI wrappers on native code. Tracebacks in SBCL and CCL are much easier to read than what you get with Clojure. I also prefer non-lazy to lazy.

An aside on laziness: I like the idea of lazy collections, or iterators, or generators, or streams (from SICP). They can be useful for certain constructions, but they don’t really enable anything amazing that you can’t do otherwise, just with a slightly different algorithm. It’s slower. Every implementation of Standard ML (not lazy) I’ve tried has varied from just being somewhat to several times faster than Haskell where all computation is lazy. I think setting up all that delayed computation is expensive, and the only way I found to make Haskell perform within the same ball park as Standard ML, was to add hints here and there to get computations to go ahead and happen instead of building a stack of thunks. I don’t have to encourage Standard ML to do a computation, it just does it. Same with Common Lisp. Life is much simpler and faster when computations just happen and nothing is lazy by default.

So, Clojure has its advantages. One of those is a slight reduction in the number of parens with certain standard Lisp constructions like LET and COND. Also, the collection literals are nice. Clojure is a Lisp-1, so the shared namespace is generally better than Lisp-2’s like Common Lisp. Even so, there are nice things about split function namespace, that is that you don’t have to worry about shadowing built in functions. This is a problem in languages with a single namespece. I don’t know how many programs in Python I’ve seen that shadow the built in “id” function. It turns out that’s a really popular name for a database column, and hence a variable name. In Common Lisp, you can name your variable “list” even through there is a standard function called “list”, none of this “lst” crap.

TLDR; It’s all a matter of taste, but for me Common Lisp is just a little more fun to hack in than Clojure.

1. The List

I’ll try to keep this short. A Common Lisp list is a linked list constructed of CONS cells. Each CONS has two parts the CAR and the CDR. The CAR generally contains a list element and the CDR contains the next CONS cell, or NIL, signifying the end of the list. An empty list is the same as NIL.

;; contruct a list with a single element, the keyword symbol :x
(cons :x nil)
;;--> (:x)

;; alternately:
(list :x)
;;--> (:x)

;; or with quoting (bypass evaluation):
(quote (:x))
;;--> (:x)

;; shorthand for quoting is a single quote mark
'(:x)
;;--> (:x)

;; CAR and CDR extract the cells of the CONS
(car '(:x))
;;--> :x
(cdr '(:x))
;;--> NIL

;; A list of three elements:
(cons :x (cons :y (cons :z nil)))
;;--> (:x :y :z)
;; Also: (list :x :y :z) or '(:x :y :z)

So the way to use lists as an immutable, persistent data structure is that you can construct a new list with an element added to the front of the list without affecting the original, and the “tail” of the list is shared. Also, you can get the list minus the front element, just by taking the CDR of the list.

(let ((our-list (list 1 2 3)))
  (list
    (cons 0 our-list)
    (cons 1 our-list)
    (cdr our-list)
    our-list))
;;--> ((0 1 2 3) (1 1 2 3) (2 3) (1 2 3))

In that example I took a list and first constructed a new list by adding a 0 to the front, then another new list by adding a 1 to the front, then I got a list without the first element, and then the original list, unscathed. No list was mutated and no space was wasted, as all these lists share the same last 2 or 3 CONS cells.

2. Vector / SEQ

Of course, you might want to do something other than have an ordered sequence with the ability to add or remove elements to the front. In Clojure, you’ve got the vector, which is an ordered sequence that allows you to add or remove elements anywhere in the list, and lookup arbitrary elements in O(log n) time and with shared structure. You can do the same thing in Common Lisp with normal lists but some of the operations will take O(n) time complexity and some operations will return a brand new list with no shared structure. Let’s start with that and then show how to get Clojure-style complexity characteristics with the SEQ collection provided by the FSet library. For small enough sequences you might just want to use normal lists, and avoid the FSet dependency.

In none of the following examples is any collection mutated, instead the value of the expression must be captured and used instead of the original, so when I say “drop an element”, I really mean “return a new collection without an element”, etc.

You can remove an element from the end of a list with BUTLAST, a funny sounding function which basically creates a whole new list minus the last element. You can also specify a number of elements to remove from the end of the list.

(butlast (list 1 2 3))
;;--> (1 2)

(butlast (list 1 2 3 4 5) 2)
;;--> (1 2 3)

You can get the NTH element of a list:

(nth 0 (list 1 2 3 4 5))
;;--> 1

You can take a slice of a list with SUBSEQ:

(subseq (list :a :b :c :d) 1 3)
;;--> (:B :C)

Or drop the first n elements with NTHCDR:

(nthcdr 4 (list :a :b :c :d :e :f))
;;--> (:E :F)

If you want to drop or add or change an element from the middle of the list you might have to write or use one these simple functions to do that:

(defun drop-nth (n list)
  (append (subseq list 0 n) (nthcdr (1+ n) list)))

(drop-nth 3 (list :a :b :c :d :e :f))
;;--> (:A :B :C :E :F)

(defun add-nth (n elem list)
  (append (subseq list 0 n) (cons elem (nthcdr n list))))

(add-nth 3 :bar (list :a :b :c :d :e :f))
;;--> (:A :B :C :BAR :D :E :F)

(defun set-nth (n elem list)
  (append (subseq list 0 n) (cons elem (nthcdr (1+ n) list))))

(set-nth 3 :foo (list :a :b :c :d :e))
;;--> (:A :B :C :FOO :E)

Now if you want the tree-structured, persistent version of that, use FSet’s SEQ. If you are using Quicklisp (which I hope you are), you can load up FSet in the REPL like this:

(ql:quickload :fset)
(in-package :fset-user)

There is a function, EMPTY-SEQ that creates an empty SEQ, and there’s a macro SEQ which can be used to construct a SEQ with multiple elements:

(empty-seq)
;;--> #[ ]

(seq :a :b :c)
;;--> #[ :A :B :C ] ;; this is how FSet pretty-prints sequences in the REPL

You can get the first or last or nth element with FIRST, LAST and @:

(first (seq :a :b :c))
;;--> :A

(last (seq :a :b :c))
;;--> :C

(@ (seq :a :b :c) 1)
;;--> :B

You can add or remove elements at the beginning or end of sequence with WITH-FIRST, WITH-LAST, LESS-FIRST, and LESS-LAST:

(with-first (seq :a :b :c) :foo)
;;--> #[ :FOO :A :B :C ]

(with-last (seq :a :b :c) :foo)
;;--> #[ :A :B :C :FOO ]

(less-first (seq :a :b :c))
;;--> #[ :B :C ]

(less-last (seq :a :b :c))
;;--> #[ :A :B ]

You can replace an element, drop an element, or insert an element at a particular index with WITH, INSERT and LESS:

(with (seq :a :b :c) 1 :foo)
;;--> #[ :A :FOO :C ]

(insert (seq :a :b :c) 1 :foo)
;;--> #[ :A :FOO :B :C ]

(less (seq :a :b :c) 1)
;;--> #[ :A :C ]

2. Set

In Common Lisp lists are also used to emulate sets. Not all operations will have the optimum time complexity, but it’s generally adequate for most purposes.

Common Lisp has various equality functions: EQ, EQL, EQUAL, and EQUALP. I will not summarize them here, but for the sake of consistency with Clojure, we’ll probably want to consider two elements to be equal with EQUALP. Many Common Lisp functions take an optional :TEST parameter to specify which equality test function to use. All of the set theory related functions do.

To test if a value is present in a set (really, just a list that we’re treating as a set), there is the MEMBER function. It will return NIL if the value is not present (which is the only “false” value in Common Lisp) or the subset of the list starting with that value (which will be treated as “true” in conditional statements, etc) if it is present.

(member :a (list :a :b :c) :test #'equalp)
;;--> (:A :B :C)
(member :d (list :a :b :c) :test #'equalp)
;;--> NIL
(member '("asdf") (list '("asdf") :b :c) :test #'equalp)
;;--> (("asdf") :B :C)

You can get the union or intersection of two sets with UNION and INTERSECTION:

(union (list :a :b :c) (list :d :e :a) :test #'equalp)
;;--> (:C :B :D :E :A)
(intersection (list :a :b :c) (list :d :e :a :b) :test #'equalp)
;;--> (:B :A)

You can get all the elements from one list that do not exist in another list with SET-DIFFERENCE.

(set-difference (list :a :b :c) (list :d :e :a :b) :test #'equalp)
;;--> (:C)

You can even check if a set is a subset of another set with SUBSETP.

(subsetp (list :a :b :c) (list :d :e :a :b) :test #'equalp)
;;--> NIL

(subsetp (list :a :b :c) (list :d :e :c :a :b) :test #'equalp)
;;--> T

The set of elements that exist in only one of two sets is constructed with SET-EXCLUSIVE-OR.

(set-exclusive-or (list :a :b :c) (list :d :e :a :b) :test #'equalp)
;;--> (:E :D :C)

Adding and removing elements from a list is accomplished with CONS and REMOVE:

(cons :foo (list :a :b))
;;--> (:foo :a :b)

(remove "elem" (list "foo" "bar" "elem") :test #'equalp)
;;--> ("foo" "bar")

It’s clear that Common Lisp was designed to use lists as sets when necessary, but under certain conditions you might want to use a more efficient implementation (say if you wanted to do a lot of membership tests in a larger set, and would benefit from O(log n) membership test instead of O(n)) you can use FSet’s SET. You can construct an empty set with EMPTY-SET or pre-populate with the SET macro (which shadows the archaic and generally unused SET function built into Common Lisp).

(empty-set)
;;--> #{ }

(set :a :b :c)
;;--> #{ :A :B :C }

The FSet library comes with an equality function that is used automatically: EQUAL?. It is slightly better than EQUALP in that it will find two sets equal that are equivalent sets (same for other FSet collections).:

(equalp (list :a :b :c) (list :b :c :a))
NIL

(equal? (set :a :b :c) (set :b :c :a))
T

Membership is tested with CONTAINS?:

(contains? (set :a :b :c) :a)
;;--> T

(contains? (set :a :b :c) :d)
;;--> NIL

UNION, INTERSECTION, and SET-DIFFERENCE work as expected:

(union (set :a :b) (set :b :c))
;;--> #{ :A :B :C }

(intersection (set :a :b) (set :b :c))
;;--> #{ :B }

(set-difference (set :a :b) (set :b :c))
;;--> #{ :A }

Two tests exist: SUBSET? and DISJOINT?.

(subset? (set :a :b) (set :b :c :a))
;;--> T

(subset? (set :d :b) (set :b :c :a))
;;--> NIL

(disjoint? (set :a :b) (set :c :d))
;;--> T

(disjoint? (set :a :b :c) (set :c :d))
;;--> NIL

Adding and removing elements from a SET is accomplished with WITH and LESS:

(with (set :b :c) :a)
;;--> #{ :A :B :C }

(less (set :a :b :c) :a)
;;--> #{ :B :C }

3. Map

There are two ways lists are used in Common Lisp to create mapping collections.

The plist is just a list with every two elements representing a key and a value. It has one useful function, GETF which uses EQ for key equality and no way to use any other comparison, making it only useful for using symbols for keys:

(getf (list :name "Steve" :age 33) :name)
;;--> "Steve"

(getf (list :name "Steve" :age 33) :weight)
;;--> NIL

(getf (list :name "Steve" :age 33) :weight "some default")
;;--> "some default"

Mappings can be changed or added by appending a key/value pair to the front of the list since GETF searches from the front:

(getf (append (list :name "Stephen") (list :name "Steve" :age 33)) :name)
;;--> "Stephen"

You could easily build a repertoire of functions to do all the common operations you might want with plists, but there is another form called the alist which comes with a better set of operations built in. An alist is a list of cons cells with the key in the CAR and value in the CDR. When you have a CONS cell with a non-list in the CDR, it is called a dotted list and looks like this: (car . cdr). Alists can be constructed in a variety of ways:

(list (cons :key "val") (cons :other-key "other-val"))
;;--> ((:KEY . "val") (:OTHER-KEY . "other-val"))

'((:key . "val") (:other-key . "other-val"))
;;--> ((:KEY . "val") (:OTHER-KEY . "other-val"))

(pairlis (list :key1 :key2 :key3) (list "val1" "val2" "val3"))
;;--> ((:KEY3 . "val3") (:KEY2 . "val2") (:KEY1 . "val1"))

The lookup function for alists is ASSOC which takes the :TEST parameter for key equality though it returns the entire matching CONS cell, not just the value. You can use CDR to extract the value.

(assoc :key '((:key . "val") (:other-key . "other-val")) :test #'equalp)
;;--> (:KEY . "val")

(cdr (assoc :other-key '((:key . "val") (:other-key . "other-val")) :test #'equalp))
;;--> "other-val"

(assoc :missing-key '((:key . "val") (:other-key . "other-val")) :test #'equalp)
;;--> NIL

Like plists, key/val pairs can be added to alists by adding them to the front of the list. There is a helper function for this called ACONS:

(acons "foo" "bar" '(("baz" . "val")))
;;--> (("foo" . "bar") ("baz" . "val"))

There’s no builtin function to remove keys from an alist, but we can write one easily enough:

(defun alist-remove (alist key)
  (remove-if #'(lambda (cell) (equalp key (car cell))) alist))

(alist-remove '((:a . 100) (:b . 200)) :a)
;;--> ((:B . 200))

Alists are probably an 80% solution, which is plenty for most situations, but FSet’s maps are quite nice and give you that O(log n) lookup and update time complexity with the structural sharing that we’ve grown to love. I’ve found them as useful as Clojure’s hashmaps. Like the other collections, there’s a function EMPTY-MAP that’s self explanatory and a macro MAP for constructing larger maps:

(empty-map)
;;--> #{| |}

(map (:a 100) (:b 200))
;;--> #{| (:A 100) (:B 200) |}

Like SETs, elements can be added or removed or accessed from FSet MAP collections with WITH, LESS and @.

(with (map (:a 100)) :b 200)
;;--> #{| (:A 100) (:B 200) |}

(less (map (:a 100) (:c 200)) :c)
;;--> #{| (:A 100) |}

(@ (map (:a 100) (:c 200)) :c)
;;--> 200

4. Conclusion w/bonus Feature

I only scratched the surface with the FSet library. There are more functions than the ones I listed and some of the ones I describe actually return multiple values, for example to signify if a key is present in case you are storing NIL as values, etc. I recommend the FSet Tutorial for more details.

Lists in Common Lisp are really quite versatile and can approximate most other structures for limited sizes. The nice thing about the FSet library is that it always puts the collection as the first argument, so it is amenable to Clojure’s threading macro: -> which I will translate here for your benefit into Common Lisp:

(defmacro -> (x &optional (form nil form-supplied-p) &rest more)
  (if form-supplied-p
    (if more
      `(-> (-> ,x ,form) ,@more)
      (if (listp form)
        `(,(car form) ,x ,@(cdr form))
        (list form x)))
    x))

(-> (empty-map)
  (with :a 100)
  (with :b 200)
  (less :a))
;;--> #{| (:B 200) |}