Purely Functional Data Structures

Author: Chris Okasaki


by davidcuddeback   2017-12-12
> A data structure cannot be functional.

I think the author is using the term "purely functional data structure" to mean a data structure that lends itself to an implementation in a purely functional language [1,2].

[1] https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...

by axiomabsolute   2017-08-20
The canonical text discussing how this is achieved refers to data structures optimized for immutability as "purely functional data structures". [1] These types of data structures are the default for Clojure as well, and there are a number of blog posts and presentations discussing their implementation. [2] [3]

It's a reasonably complicated topic, but the basic idea is that since the data structures are immutable, much of their structure can be shared between versions with few changes. Most of these data structures end up using a tree of some sort. Performance characteristics can be influenced to an extent by using bucketing to control the width/height of the tree.
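To make the sharing idea concrete, here is a minimal sketch (my own code, not from any of the linked posts): inserting into an immutable binary search tree rebuilds only the nodes on the path from the root, and every untouched subtree is shared between the old and new versions.

```haskell
-- A persistent binary search tree. Insertion copies O(depth) nodes;
-- everything off the insertion path is shared with the old tree.
data Tree a = Leaf | Node (Tree a) a (Tree a)

insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l v r)
  | x < v     = Node (insert x l) v r  -- rebuild left path, share r
  | x > v     = Node l v (insert x r)  -- rebuild right path, share l
  | otherwise = t                      -- already present: share everything

-- In-order traversal, for inspecting a tree's contents.
toList :: Tree a -> [a]
toList Leaf = []
toList (Node l v r) = toList l ++ [v] ++ toList r
```

With a balanced tree of n elements, an insert copies only O(log n) nodes; both the old and the new version remain fully usable, which is exactly the "persistence" the linked posts describe.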

[1] : Purely Functional Data Structures (Amazon: https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...) [2] http://hypirion.com/musings/understanding-persistent-vector-... [3] https://www.youtube.com/watch?v=7BFF50BHPPo

by anonymous   2017-08-20

This article, Haskell: Queues without pointers, describes a purely functional queue with O(1) amortized cost (edit: for adding and removing elements). I think the data structure comes from Chris Okasaki and more details are in his book.

The basic idea is to decompose the queue into two lists, one for the front and one for the back. New elements are enqueued onto the "back" list, which is stored in reverse order so that adding is a cheap cons. Elements are dequeued from the "front" list; when "front" runs out, "back" is reversed and becomes the new "front". This data structure has O(1) amortized cost for these operations, and with some more work it can be made O(1) worst-case.

Edit: Okasaki's paper describes an elegant, purely functional implementation of queues and double-ended queues (deques). Deques allow adding or removing elements from either end. All such operations are O(1), worst case.
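A minimal sketch of the two-list queue described above (the structure is attributed to Okasaki in the answer; the names here are my own). Enqueueing conses onto the reversed back list in O(1); dequeueing takes from the front list, reversing the back list only when the front is empty:

```haskell
-- A purely functional FIFO queue built from two lists.
data Queue a = Queue [a] [a]  -- front (in order), back (reversed)

emptyQ :: Queue a
emptyQ = Queue [] []

-- O(1): cons onto the reversed back list.
enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f b) = Queue f (x : b)

-- Amortized O(1): the occasional reverse is paid for by the
-- enqueues that built the back list (each element is reversed once).
dequeue :: Queue a -> Maybe (a, Queue a)
dequeue (Queue [] [])      = Nothing
dequeue (Queue [] b)       = dequeue (Queue (reverse b) [])
dequeue (Queue (x : f) b)  = Just (x, Queue f b)
```

The book then shows how laziness and scheduling turn this amortized bound into a worst-case one.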

by anonymous   2017-08-20

First of all, don't optimise prematurely. Have you measured your code? Maybe there are some concrete bottlenecks?

Since most of the objects consist of smaller objects joined via data structures, I think you can get rid of this problem by using persistent data structures.

A persistent data structure is a data structure which always preserves the previous version of itself when it is modified; such data structures are effectively immutable, as their operations do not (visibly) update the structure in place, but instead always yield a new updated structure.

Here is a wonderful talk about some of them by Daniel Spiewak. If you want more, take a look at Purely Functional Data Structures by Chris Okasaki.

by anonymous   2017-08-20

You can use a purely functional map implementation, where you just ignore the values.

See http://hackage.haskell.org/packages/archive/containers/ (linked to from https://cstheory.stackexchange.com/questions/1539/whats-new-in-purely-functional-data-structures-since-okasaki ).

(sidenote: For more information on functional datastructures, see http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504 )
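A minimal sketch of the map-as-set idea (my own naming), using Data.Map from the containers package with () as every value; note that containers also ships Data.Set, which does exactly this for you:

```haskell
import qualified Data.Map as Map

-- A "set" is just a map whose values carry no information.
type Set' a = Map.Map a ()

insertS :: Ord a => a -> Set' a -> Set' a
insertS x = Map.insert x ()  -- the () value is ignored

memberS :: Ord a => a -> Set' a -> Bool
memberS = Map.member
```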

by anonymous   2017-08-20

I don't think there's a shortcut here.

Just as you imply, Clojure's persistent data structures are quite a different thing from the immutable collection classes that Cocoa provides.

If you want to use Clojure's persistent data structures from Obj-C, the only way to do so is to re-implement them in Objective-C. My understanding is that many of these are described in Okasaki's book, Purely Functional Data Structures, and in the papers of Phil Bagwell.

This other answer has some links: What is the data structure behind Clojure's sets?.

by anonymous   2017-08-20

A related question was asked before: "Does functional programming replace GoF design patterns", with great responses.

The equivalent of "design patterns" is very vague in FP. In general, every time you see a "pattern" in your code you should create something to cover all of its uses in a uniform way. Often it will be a higher-order function.

For example, the following C code

#include <stdbool.h>

bool contains_42(const int *a, int n) {
    for (int i = 0; i < n; i++)
        if (a[i] == 42)
            return true;
    return false;
}

can be thought of as a basic "design pattern" - checking whether some special element is in the list. This snippet could appear many times in code with different conditions. In FP, you simply use a higher-order function several times. It's not a "pattern" anymore.
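For instance, in Haskell the whole loop collapses into one call to the Prelude's higher-order function any (the wrapper name here is mine):

```haskell
-- `any p xs` is True iff some element of xs satisfies p;
-- it subsumes the entire C loop, including early exit.
containsAnswer :: [Int] -> Bool
containsAnswer xs = any (== 42) xs
```

Swap the predicate and the same combinator covers every variant of the "search the list" pattern.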

Functional programming has its own practices, but they are much different from "design patterns" in OOP. They include use of polymorphism, lists, higher-order functions, immutability/purity, laziness [not all are essential or specific to FP]... See also "what are core concepts of FP". Also, type classes (Haskell), modules and functors (OCaml), continuations, monads, zippers, finger trees, monoids, arrows, applicative functors, monad transformers, many purely functional data structures (book) etc. Functional pearls, already mentioned by Randall Schulz, form a very rich resource of FP at its best.

To learn how to write idiomatic code, any book/resource on a functional programming language will suffice IMHO (for example, RWH and LYAH); differences between thinking imperatively and functionally are always explained there.

In dynamic languages, Jeff Foster's link is a good collection; here is a very clever use of memoization in JavaScript that could be considered a "design pattern".

by anonymous   2017-08-20

You could try references to his book by Haskell or Clojure folk rather than just the CMU PDF.

There was a question here on SO at:

What is the benefit of purely functional data structure?

There were also related discussions in the Clojure community, and on SE.

Hope something there provides a basis for a search that bears results :-)

You may have to use an academic or business reference search engine, and you may want to look at poster sessions at a conference, because searching is not obvious here. E.g., Mercury can generate Erlang code, so try searching for caching and locality with respect to performance in functional programming, in some hardware area dealing with latency.

Canada's National Research Council (NRC) had some work going on; you could try a search of their publications/notices/reports.

But note: a search with

bigdata latency locality NRC 2012

gives rather different results from

bigdata functional latency locality NSF 2012

(and I would next drop the 2012 and try using the Google search tool's date-range option for recent results)

by anonymous   2017-08-20

This would be a long answer, so please let me apologize in advance for pointing you towards a book and not directly answering. I highly recommend looking at Purely Functional Data Structures which is (legally) available as a PDF from the author. Though it's a good book to have in print/ebook anyway.

The super-short answer: use Clojure's built-in sorted-map if you want this in practice (though writing your own will, of course, earn nerd street cred), because sorted maps use a red-black tree under the hood.
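The same advice ports to Haskell, for what it's worth: Data.Map in the containers package is also a persistent balanced binary search tree (weight-balanced rather than red-black), so it keeps its keys sorted for free:

```haskell
import qualified Data.Map as Map

-- Keys come back in ascending order because the map is a sorted-key tree.
inOrder :: Ord k => Map.Map k v -> [(k, v)]
inOrder = Map.toAscList
```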

by anonymous   2017-08-20

It's because of immutable state. A list is an object + a pointer, so if we imagined a list as a Tuple it might look like this:

let tupleList = ("a", ("b", ("c", [])))

Now let's get the first item in this "list" with a "head" function. This head function takes O(1) time because we can use fst:

> fst tupleList

If we want to swap out the first item in the list with a different one we could do this:

let tupleList2 = ("x",snd tupleList)

Which can also be done in O(1). Why? Because absolutely no other element in the list stores a reference to the first entry. Because of immutable state, we now have two lists, tupleList and tupleList2. When we made tupleList2 we didn't copy the whole list. Because the original pointers are immutable we can continue to reference them but use something else at the start of our list.

Now let's try to get the last element of our 3 item list:

> fst . snd . snd $ tupleList

That took three steps, equal to the length of our list, i.e. O(n).

But couldn't we store a pointer to the last element in the list and access that in O(1)? To do that we would need an array, not a list. An array allows O(1) lookup of any element because its elements sit in contiguous memory.

(ASIDE: If you're unsure of why we would use a Linked List instead of an Array then you should do some more reading about data structures, algorithms on data structures and Big-O time complexity of various operations like get, poll, insert, delete, sort, etc).

Now that we've established that, let's look at concatenation. Let's concat tupleList with a new list, ("e", ("f", [])). To do this we have to traverse the whole list just like getting the last element:

tupleList3 = (fst tupleList, (fst . snd $ tupleList, (fst . snd . snd $ tupleList, ("e", ("f", [])))))

The above operation is actually worse than O(n), because for each element in the list we re-read the list up to that index. But if we ignore that for a moment and focus on the key point: in order to get to the last element of the list, we must traverse the entire structure.

You may be asking, why don't we just store in memory what the last list item is? That way appending to the end of the list would be done in O(1). But not so fast, we can't change the last list item without changing the entire list. Why?

Let's take a stab at how that might look:

data Queue a = Queue { last :: Queue a, head :: a, next :: Queue a} | Empty
appendEnd :: a -> Queue a -> Queue a
appendEnd a2 (Queue l h n) = ????

If I modify "last", which is an immutable value, I won't actually be modifying the pointer to the last item in the queue. I will be creating a copy of the last item. Everything else that referenced the original item will continue referencing the original item.

So in order to update the last item in the queue, I have to update everything that holds a reference to it, which takes at best O(n) time.

So in our traditional list, we have our final item:

List a []

But if we want to change it, we make a copy of it. Now the second last item has a reference to an old version. So we need to update that item.

List a (List a [])

But if we update the second last item we make a copy of it. Now the third last item has an old reference. So we need to update that. Repeat until we get to the head of the list. And we come full circle. Nothing keeps a reference to the head of the list so editing that takes O(1).
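The rebuild described above can be sketched in a few lines (my own function name): replacing the last element of an immutable list forces a copy of every cell on the way there, which is exactly the O(n) cost the answer walks through.

```haskell
-- Replace the last element of a list. Each (:) on the path is
-- copied; only when we reach the final cell do we swap the value.
setLast :: a -> [a] -> [a]
setLast _ []       = error "setLast: empty list"
setLast x [_]      = [x]
setLast x (y : ys) = y : setLast x ys  -- copy this cell, recurse
```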

This is the reason that Haskell doesn't have Doubly Linked Lists. It's also why a "Queue" (or at least a FIFO queue) can't be implemented in a traditional way. Making a Queue in Haskell involves some serious re-thinking of traditional data structures.

If you become even more curious about how all of this works, consider getting the book Purely Functional Data Structures.

EDIT: If you've ever seen this: http://visualgo.net/list.html you might notice that in the visualization "Insert Tail" happens in O(1). But in order to do that we need to modify the final entry in the list to give it a new pointer. Updating a pointer mutates state which is not allowed in a purely functional language. Hopefully that was made clear with the rest of my post.