This post is sort of a continuation of the Comonad Tutorial, and we can call this “part 3”. I’m going to assume the reader has a basic familiarity with comonads.
At work, we develop and use a Scala library called Quiver for working with graphs. In this library, a graph is a recursively defined immutable data structure. A graph, with node IDs of type V, node labels N, and edge labels E, is constructed in one of two ways. It can be empty:
Or it can be of the form c & g, where c is the context of one node of the graph and g is the rest of the graph with that node removed:
By the same token, we can decompose a graph on a particular node:
where a GDecomp is a Context for the node v (if it exists in the graph), together with the rest of the graph:
Let’s say we start with a graph g, like this:
I’m using an undirected graph here for simplicity. An undirected graph is one in which the edges don’t have a direction. In Quiver, this is represented as a graph where the “in” edges of each node are the same as its “out” edges.
If we decompose on the node a, we get a view of the graph from the perspective of a. That is, we’ll have a Context letting us look at the label, vertex ID, and edges to and from a, and we’ll also have the remainder of the graph, with the node a “broken off”:
Quiver can arbitrarily choose a node for us, so we can look at the context of some “first” node:
We can keep decomposing the remainder recursively, to perform an arbitrary calculation over the entire graph:
The implementation of fold will be something like:
For instance, if we wanted to count the edges in the graph g, we could do:
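As a sketch of the idea (using simplified stand-in types of my own invention, not Quiver's actual API), we can represent an undirected graph as an adjacency map and fold over successive decompositions:

```scala
// Hypothetical stand-in types, not Quiver's actual API.
type G[V] = Map[V, Set[V]] // an undirected graph as an adjacency map

case class Ctx[V](vertex: V, neighbors: Set[V])

// Decompose on a node: its context together with the rest of the graph.
def decomp[V](g: G[V], v: V): Option[(Ctx[V], G[V])] =
  g.get(v).map { ns =>
    val rest = (g - v).map { case (k, adj) => k -> (adj - v) }
    (Ctx(v, ns), rest)
  }

// Recursive fold over successive decompositions of the graph.
def fold[V, B](g: G[V])(u: B)(f: (Ctx[V], B) => B): B =
  g.keys.headOption.flatMap(v => decomp(g, v)) match {
    case None            => u
    case Some((c, rest)) => f(c, fold(rest)(u)(f))
  }

// Each edge is counted exactly once, because a node's edges are
// removed along with it at each decomposition step.
def countEdges[V](g: G[V]): Int =
  fold(g)(0)((c, n) => c.neighbors.size + n)
```

On a triangle graph with three nodes, countEdges returns 3 no matter which node is decomposed first.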
The recursive decomposition guarantees that our function doesn’t see any given edge more than once. For the graph g above, (g fold b)(f) would look something like this:
Let’s now say that we wanted to find the maximum degree of a graph. That is, find the highest number of edges to or from any node.
A first stab might be:
But that would get the incorrect result. In our graph g above, the nodes b, d, and f have a degree of 3, but this fold would find the highest degree to be 2. The reason is that once our function gets to look at b, its edge to a has already been removed, and once it sees f, it has no edges left to look at.
This was the issue that came up at work. This behaviour of fold is both correct and useful, but it can be surprising. What we might expect is that instead of receiving successive decompositions, our function sees “all rotations” of the graph through the decomp operator. That is, we often want to consider each node in the context of the entire graph we started with. In order to express that with fold, we have to decompose the original graph at each step:
But what if we could have a combinator that labels each node with its context?
Visually, that looks something like this:
If we now fold over contextGraph(g) rather than g, we get to see the whole graph from the perspective of each node in turn. We can then write the maxDegree function like this:
This all sounds suspiciously like a comonad! Of course, Graph itself is not a comonad, but GDecomp definitely is. The counit just gets the label of the node that’s been decomped out:
The cobind can be implemented in one of two ways. There’s the “successive decompositions” version:
Visually, it looks like this:
It exposes the substructure of the graph by storing it in the labels of the nodes. It’s very much like the familiar NonEmptyList comonad, which replaces each element in the list with the whole sublist from that element on.
So this is the comonad of recursive folds over a graph. Really, its action is the same as just fold. It takes a computation on one decomposition of the graph and extends it to all subdecompositions. But there’s another comonad that’s much more useful as a comonad. It works like contextGraph from before, except that instead of copying the context of a node into its label, we copy the whole decomposition: both the context and the remainder of the graph.
That one looks visually more like this:
Its cobind takes a computation focused on one node of the graph (that is, on a GDecomp), repeats it for every other decomposition of the original graph in turn, and stores the results in the respective node labels:
This is useful for algorithms where we want to label every node with some information computed from its neighborhood. For example, some clustering algorithms start by assigning each node its own cluster, then repeatedly joining nodes to the most popular cluster in their immediate neighborhood, until a fixed point is reached.
As a simpler example, we could take the average value of the labels of neighboring nodes, to apply something like a low-pass filter to the whole graph:
The difference between these two comonad instances is essentially the same as the difference between NonEmptyList and the nonempty list Zipper.
It’s this latter “decomp zipper” comonad that I ultimately decided to include as the Comonad instance for quiver.GDecomp.
Let’s take an example. There is a category Mon with monoids as objects and monoid homomorphisms as arrows between them. Then there is a functor from Set to Mon that takes any ordinary type A to the free monoid generated by A. This is just the List[A] type, with concatenation as the multiplication and the empty list as the identity element.
This free functor has a right adjoint that takes any monoid M in Mon to its underlying set M. That is, this right adjoint “forgets” that M is a monoid, leaving us with just an ordinary type.
If we compose these two functors, we get a monad. If we start with a type A, get its free monoid (the List[A] monoid), and then go from there to the underlying type of the free monoid, we end up with the type List[A]. The unit of our adjunction is then a function from any given type A to the type List[A]:
But then what is the counit? Remember that for any adjunction, we can compose the functors one way to get a monad, and compose them the other way to get a comonad. In that case we have to start with a monoid M, then “forget”, giving us the plain type M. Then we take the free monoid of that to end up with the List[M] monoid.
But notice that we are now in the monoid category. In that category, List is a comonad. And since we’re in the category of monoids, the counit has to be a monoid homomorphism. It goes from the free monoid List[A] to the monoid A:
If we apply the counit for this comonad to the free monoid, we get the join for our monad:
And to get the duplicate or extend operation in the comonad, we just turn the crank on the adjunction:
The duplicate just puts each element into its own sublist. With regard to extend, this just means that given any catamorphism on List, we can turn it into a homomorphism on free monoids.
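Here's a hedged sketch of these operations, with a minimal Monoid trait of my own standing in for the Scalaz one:

```scala
// A minimal Monoid, standing in for the Scalaz type class.
trait Monoid[M] {
  def zero: M
  def append(a: M, b: M): M
}

// unit of the adjunction: embed a value as a singleton free monoid.
def unit[A](a: A): List[A] = List(a)

// counit: a monoid homomorphism from the free monoid on M down to M.
def counit[M](ms: List[M])(implicit M: Monoid[M]): M =
  ms.foldRight(M.zero)(M.append)

// The list monoid itself, so join is just counit at List[A].
implicit def listMonoid[A]: Monoid[List[A]] = new Monoid[List[A]] {
  def zero = Nil
  def append(a: List[A], b: List[A]) = a ++ b
}
def join[A](xss: List[List[A]]): List[A] = counit(xss)

// duplicate for the induced comonad: each element in its own sublist.
def duplicate[A](ms: List[A]): List[List[A]] = ms.map(unit)
```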
All the interesting parts of List are the parts that make it a monoid, and our comonad here is already in a category full of monoids. Therefore, the coKleisli composition in this comonad is kind of uninteresting. All it’s saying is that if we can fold a List[A] to a B, and a List[B] to a C, then we can fold a List[A] to a C, by considering each element as a singleton list.
Let’s now consider another category, call it End(Set), which is the category of endofunctors in Set.
The arrows in this category are natural transformations:
There’s another category, Com, which is the category of comonads on Set. The arrows here are comonad homomorphisms. A comonad homomorphism from F to G is a natural transformation f: F ~> G satisfying the homomorphism law:
There is a forgetful functor Forget: Com -> End(Set) that takes a comonad to its underlying endofunctor (forgetting that it’s a comonad). And this functor has a right adjoint Cofree: End(Set) -> Com which generates a cofree comonad on a given endofunctor F. This is the following data type:
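In Scala, the cofree comonad might be sketched like this (with a minimal Functor trait of my own; the field names are illustrative):

```scala
// A minimal Functor type class.
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

// The cofree comonad on F: a value at the head, and an F of
// further Cofree branches.
case class Cofree[F[_], A](head: A, tail: F[Cofree[F, A]])

// duplicate for the Cofree[F,?] comonad in Set: decorate every
// node with the whole subtree rooted there.
def duplicate[F[_], A](c: Cofree[F, A])(implicit F: Functor[F]): Cofree[F, Cofree[F, A]] =
  Cofree(c, F.map(c.tail)(duplicate(_)))
```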
Note that not only is the endofunctor Cofree[F,?] a comonad (in Set) for any functor F, but the higher-order type constructor Cofree is itself a comonad in the endofunctor category. It’s this latter comonad that is induced by the Forget ⊣ Cofree adjunction. That is, we start at an endofunctor F, then go to comonads via Cofree[F,?], then back to endofunctors via Forget.
The unit for this adjunction is then a comonad homomorphism. Remember, this is the unit for a monad in the category Com of comonads:
This will start with a value of type F[A] in the comonad F, and then unfold an F-branching stream from it. Note that the first level of this will have the same structure as x.
If we take unit across to the End(Set) category, we get the duplicate for our comonad:
Note that this is not the duplicate for the Cofree[F,?] comonad. It’s the duplicate for Cofree itself, which is a comonad in an endofunctor category.
Turning the crank on the adjunction, the counit for this comonad now has to be the inverse of our unit. It takes the heads of all the branches of the given F-branching stream:
Sending that over to the comonad category, we get the join for our monad:
Since a comonad has to have a counit, it must be “pointed” or nonempty in some sense. That is, given a value of type W[A] for some comonad W, we must be able to get a value of type A out.
The identity comonad is a simple example of this. We can always get a value of type A out of Id[A]. A slightly more interesting example is that of nonempty lists:
So a nonempty list is a value of type A together with either another list or None to mark that the list has terminated. Unlike the traditional List data structure, we can always safely get the head.
But what is the comonadic duplicate operation here? It should allow us to go from NEL[A] to NEL[NEL[A]] in such a way that the comonad laws hold. For nonempty lists, an implementation that satisfies those laws turns out to be:
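Here's a self-contained sketch of that implementation (the method names are my guesses at the original):

```scala
// A nonempty list: a head plus an optional tail.
case class NEL[A](head: A, tail: Option[NEL[A]]) {
  def map[B](f: A => B): NEL[B] = NEL(f(head), tail.map(_.map(f)))

  // All suffixes of this list; nonempty, because the first suffix
  // is the list itself.
  def tails: NEL[NEL[A]] = NEL(this, tail.map(_.tails))

  def extract: A = head
  def duplicate: NEL[NEL[A]] = tails
  def extend[B](f: NEL[A] => B): NEL[B] = tails.map(f)
}
```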
The tails operation returns a list of all the suffixes of the given list. This list of lists is always nonempty, because the first suffix is the list itself. For example, if we have the nonempty list [1,2,3] (to use a more succinct notation), the tails of that will be [[1,2,3], [2,3], [3]]. To get an idea of what this means in the context of a comonadic program, think of this in terms of coKleisli composition, or extend in the comonad:
When we map over tails, the function f is going to receive each suffix of the list in turn. We apply f to each of those suffixes and collect the results in a (nonempty) list. So [1,2,3].extend(f) will be [f([1,2,3]), f([2,3]), f([3])].
The name extend refers to the fact that it takes a “local” computation (here a computation that operates on a list) and extends it to a “global” computation (here over all suffixes of the list).
Or consider this class of nonempty trees (often called Rose Trees):
A tree of this sort has a value of type A at the tip, and a (possibly empty) list of subtrees underneath. One obvious use case is something like a directory structure, where each tip is a directory and the corresponding sub is its subdirectories.
This is also a comonad. The counit is obvious: we just get the tip. And here’s a duplicate for this structure:
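A sketch of the structure and its duplicate, using the tip and sub names from the text:

```scala
// A rose tree: a value at the tip plus a list of subtrees.
case class Tree[A](tip: A, sub: List[Tree[A]]) {
  def map[B](f: A => B): Tree[B] = Tree(f(tip), sub.map(_.map(f)))

  // counit: just get the tip.
  def extract: A = tip

  // duplicate: the tree of all subtrees of this tree.
  def duplicate: Tree[Tree[A]] = Tree(this, sub.map(_.duplicate))

  def extend[B](f: Tree[A] => B): Tree[B] = duplicate.map(f)
}
```

With a size function that totals a subtree, d extend size labels every node with the total size of the subtree rooted there.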
Now, this obviously gives us a tree of trees, but what is the structure of that tree? It will be a tree of all the subtrees. The tip will be this tree, and the tip of each proper subtree under it will be the entire subtree at the corresponding point in the original tree.
That is, when we say t.duplicate.map(f) (or equivalently t extend f), our f will receive each subtree of t in turn and perform some calculation over that entire subtree. The result of the whole expression t extend f will be a tree mirroring the structure of t, except each node will contain f applied to the corresponding subtree of t.
To carry on with our directory example, we can imagine wanting a detailed space usage summary of a directory structure, with the size of the whole tree at the tip and the size of each subdirectory underneath as the tips of the subtrees, and so on. Then d extend size creates the tree of sizes of recursive subdirectories of d.
You may have noticed that the implementations of duplicate for rose trees and tails for nonempty lists were basically identical. The only difference is that one is mapping over a List and the other is mapping over an Option. We can actually abstract that out and get a comonad for any functor F:
A really common kind of structure is something like the type Cofree[Map[K,?],A] of trees where the counit is some kind of summary and each key of type K in the Map of subtrees corresponds to some drill-down for more detail. This kind of thing appears in portfolio management applications, for example.
Compare this structure with the free monad:
While the free monad is either an A or a recursive step suspended in an F, the cofree comonad is both an A and a recursive step suspended in an F. They really are duals of each other in the sense that the monad is a coproduct and the comonad is a product.
Given this difference, we can make some statements about what it means:
Free[F,A] is a type of “leafy tree” that branches according to F, with values of type A at the leaves, while Cofree[F,A] is a type of “node-valued tree” that branches according to F, with values of type A at the nodes.

If Exp defines the structure of some expression language, then Free[Exp,A] is the type of abstract syntax trees for that language, with free variables of type A, and monadic bind literally binds expressions to those variables. Dually, Cofree[Exp,A] is the type of closed expressions whose subexpressions are annotated with values of type A, and comonadic extend reannotates the tree. For example, if you have a type inferencer infer, then e extend infer will annotate each subexpression of e with its inferred type.

This comparison of Free and Cofree actually says something about monads and comonads in general:
In a monad M, if f: A => M[B], then xs map f allows us to take the values at the leaves (a: A) of a monadic structure xs and substitute an entire structure (f(a)) for each value. A subsequent join then renormalizes the structure, eliminating the “seams” around our newly added substructures. In a comonad W, xs.duplicate denormalizes, or exposes the substructure of xs: W[A], to yield W[W[A]]. Then we can map a function f: W[A] => B over that to get a B for each part of the substructure and redecorate the original structure with those values. (See Uustalu and Vene’s excellent paper The Dual of Substitution is Redecoration for more on this connection.)

If we look at a Kleisli arrow in the Reader[R,?] monad, it looks like A => Reader[R,B], or expanded out: A => R => B. If we uncurry that, we get (A, R) => B, and we can go back to the original by currying again. But notice that a value of type (A, R) => B is a coKleisli arrow in the Coreader comonad! Remember that Coreader[R,A] is really a pair (A, R).
So the answer to the question of how Reader and Coreader are related is that there is a one-to-one correspondence between Kleisli arrows in the Reader monad and coKleisli arrows in the Coreader comonad. More precisely, the Kleisli category for Reader[R,?] is isomorphic to the coKleisli category for Coreader[R,?]. This isomorphism is witnessed by currying and uncurrying.
In general, if we have an isomorphism between arrows like this, we have what’s called an adjunction:
In an Adjunction[F,G], we say that F is left adjoint to G, often expressed with the notation F ⊣ G.
We can clearly make an Adjunction for Coreader[R,?] and Reader[R,?] by using curry and uncurry:
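A plugin-free sketch of the two directions of this adjunction, written as plain functions so it stays self-contained:

```scala
// Coreader is just a value paired with a context.
case class Coreader[R, A](extract: A, ask: R)

// One direction: turn a coKleisli arrow in Coreader into a Kleisli
// arrow in Reader, by currying.
def leftAdjunct[R, A, B](f: Coreader[R, A] => B): A => R => B =
  a => r => f(Coreader(a, r))

// The inverse direction, by uncurrying.
def rightAdjunct[R, A, B](f: A => R => B): Coreader[R, A] => B =
  c => f(c.extract)(c.ask)
```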
The additional tupled and untupled come from the unfortunate fact that I’ve chosen Scala notation here, and Scala differentiates between functions of two arguments and functions of one argument that happens to be a pair.
So a more succinct description of this relationship is that Coreader is left adjoint to Reader.
Generally the left adjoint functor adds structure, or is some kind of “producer”, while the right adjoint functor removes (or “forgets”) structure, or is some kind of “consumer”.
An interesting thing about adjunctions is that if you have an adjoint pair of functors F ⊣ G, then F[G[?]] always forms a comonad, and G[F[?]] always forms a monad, in a completely canonical and amazing way:
Note that this says something about monads and comonads. Since the left adjoint F is a producer and the right adjoint G is a consumer, a monad always consumes and then produces, while a comonad always produces and then consumes.
Now, if we compose Reader and Coreader, which monad do we get?
That’s the State[S,?] monad!
Now if we compose it the other way, we should get a comonad:
What is that? It’s the Store[S,?] comonad:
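A sketch of Store with the operations described in the text:

```scala
// A store of A values indexed by S, with a distinguished cursor.
case class Store[S, A](peek: S => A, cursor: S) {
  def map[B](f: A => B): Store[S, B] = Store(s => f(peek(s)), cursor)

  // extract reads the value under the cursor.
  def extract: A = peek(cursor)

  // duplicate: a store full of stores; peeking at s yields a store
  // whose cursor is set to s.
  def duplicate: Store[S, Store[S, A]] = Store(s => Store(peek, s), cursor)

  // seek moves the cursor, by way of duplicate.
  def seek(s: S): Store[S, A] = duplicate.peek(s)

  def extend[B](f: Store[S, A] => B): Store[S, B] = duplicate.map(f)
}
```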
This models a “store” of values of type A indexed by the type S. We have the ability to directly access the A value under a given S using peek, and there is a distinguished cursor or current position. The comonadic extract just reads the value under the cursor, and duplicate gives us a whole store full of stores, such that if we peek at any one of them, we get a Store whose cursor is set to the given s. We’re defining a seek(s) operation that moves the cursor to a given position s by taking advantage of duplicate.
A use case for this kind of structure might be something like image processing or cellular automata, where S might be coordinates into some kind of space (like a two-dimensional image). Then extend takes a local computation at the cursor and extends it to every point in the space. For example, if we have an operation average that peeks at the cursor’s immediate neighbors and averages them, then we can apply a low-pass filter to the whole image with image.extend(average).
The type A => Store[S,B] is also one possible representation of a lens. I might talk about lenses and zippers in a future post.
If all monads are given by composing adjoint pairs of functors, what adjoint pair of functors forms the `Reader` monad? And if we compose those functors the other way, which comonad do we get?
Shachaf Ben-Kiki pointed out on IRC that there are at least two ways of doing this. One is via the Kleisli construction and the other is via the Eilenberg-Moore construction. Dr Eugenia Cheng has a fantastic set of videos explaining these constructions. She talks about how for any monad T there is a whole category Adj(T) of adjunctions that give rise to T (with categories as objects and adjoint pairs of functors as the arrows), and the Kleisli category is the initial object in this category while the Eilenberg-Moore category is the terminal object.
So then, searching around for an answer to what exactly the Eilenberg-Moore category for the R => ? monad looks like (I think it’s just values of type R and functions between them), I came across a Mathematics Stack Exchange question whose answer more or less directly addresses my original question above. The adjunction is a little more difficult to see than the initial/terminal ones, but it’s somewhat interesting, and what follows is an outline of how I convinced myself that it works.
Let’s consider the reader monad R => ?, which allows us to read a context of type R.
The first category involved is Set (or Hask, or Scala). This is just the familiar category where the objects are types (A, B, C, etc.) and the arrows are functions.
The other category is Set/R, which is the slice category of Set over the type R. This is a category whose objects are functions to R. So an object x in this category is given by a type A together with a function of type A => R. An arrow from x: A => R to y: B => R is given by a function f: A => B such that y(f(a)) = x(a) for all a: A.
The left adjoint is R*, a functor from Set to Set/R. This functor sends each type A to the function (p: (R,A)) => p._1, having type (R,A) => R.
The right adjoint is Π_R, a functor from Set/R to Set. This functor sends each object q: A => R in Set/R to the set of functions R => A for which q is an inverse. This is actually a dependent type, inhabited by functions p: R => A that satisfy the identity q(p(r)) = r for all r: R.
The monad is not exactly easy to see, but if everything has gone right, we should get the R => ? reader monad by composing Π_R with R*.
We start with a type A. Then we apply R*, which gives us the object rStar[A] in the slice category, which you will recall is just _._1 of type (R,A) => R. Then we go back to types via Π_R(rStar[A]), which gives us a dependent type P inhabited by functions p: R => (R,A). Now, this looks a lot like an action in the State monad. But it’s not. These p must satisfy the property that _._1 is their inverse, which means that the R they return must be exactly the R they were given. So it’s like a State action that is read-only. We can therefore simplify this to the ordinary (non-dependent) type R => A. And now we have our Reader monad.
But what about the other way around? What is the comonad constructed by composing R* with Π_R? Well, since we end up in the slice category, our comonad is actually in that category rather than in Set.
We start with an object q: A => R in the slice category. Then we go to types by taking Π_R(q). This gives us a dependent type P_A which is inhabited by all p: R => A such that q is their inverse. Then we take rStar[Π_R(q)] to go back to the slice category, and we find ourselves at an object f: (R, Π_R(q)) => R, which you’ll recall is implemented as _._1. As an endofunctor in Set/R, λq. rStar[Π_R(q)] takes each q: A => R to p: (R, R => A) => R = _._1, such that p is only defined on R => A arguments whose inverse is q.
That is, the counit for this comonad at an object y: A => R must be a function counit: (R, Π_R(y)) => A such that for _._1: (R, Π_R(y)) => R, the property y compose counit = _._1 holds. Note that this means that the R returned by _._1 and the R returned by y must be the same. Recall that _._1 always returns the first element of its argument, and also recall that the functions in Π_R(y) must have y as their inverse, so they’re only defined at the first element of the argument to _._1. That is, p._2(x) is only defined when x = p._1.
If we try to encode that in Scala (ignoring all the “such that”), we get something like:
This looks a lot like a counit for the Store comonad! Except what we constructed is not quite that. Because of the additional requirements imposed by our functors and by the slice category, the second element of p can only take an argument that is exactly the first element of p. So we can simplify that to (R, () => A), or just (R, A). And we now have the familiar Coreader comonad.
Today I want to talk about comonads, which are the dual of monads. The utility of comonads in everyday life is not quite as immediately obvious as that of monads, but they definitely come in handy sometimes. Particularly in applications like image processing and scientific computation.
Let’s remind ourselves of what a monad is. A monad is a functor, which just means it has a map method:
This has to satisfy the law that map(x)(a => a) == x, i.e. that mapping the identity function over our functor is a no-op.
A monad is a functor M equipped with two additional polymorphic functions: one from A to M[A] and one from M[M[A]] to M[A].
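Sketched as Scala traits, that might look like:

```scala
trait Functor[M[_]] {
  def map[A, B](x: M[A])(f: A => B): M[B]
}

// A monad: a functor with unit and join.
trait Monad[M[_]] extends Functor[M] {
  def unit[A](a: A): M[A]
  def join[A](mma: M[M[A]]): M[A]
}
```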
Recall that join has to satisfy associativity, and unit has to be an identity for join.
In Scala, a monad is often stated in terms of flatMap, which is map followed by join. But I find this formulation easier to explain.
Every monad has the above operations, the so-called proper morphisms of a monad, and may also bring to the table some non-proper morphisms that give the specific monad some additional capabilities.
For example, the Reader monad brings the ability to ask for a value:
The meaning of join in the reader monad is to pass the same context of type R to both the outer scope and the inner scope:
The Writer monad has the ability to write a value on the side:
The meaning of join in the writer monad is to concatenate the “log” of written values using the monoid for W (this is using the Monoid class from Scalaz):
And the meaning of unit is to write the “empty” log:
The State monad can both get and set the state:
The meaning of join in the state monad is to give the outer action an opportunity to get and put the state, then do the same for the inner action, making sure any subsequent actions see the changes made by previous ones:
The Option monad can terminate without an answer:
That’s enough examples of monads. Let’s now turn to comonads.
A comonad is the same thing as a monad, only backwards:
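Sketched as a Scala trait (with map included so the sketch stands alone):

```scala
// A comonad, dual to the monad above: counit gets a value out,
// duplicate adds a layer.
trait Comonad[W[_]] {
  def map[A, B](x: W[A])(f: A => B): W[B]
  def counit[A](w: W[A]): A
  def duplicate[A](w: W[A]): W[W[A]]
}
```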
Note that counit is pronounced “co-unit”, not “cow-knit”. It’s also sometimes called extract, because it allows you to get a value of type A out of a W[A]. While with monads you can generally only put values in and not get them out, with comonads you can generally only get them out and not put them in.
And instead of being able to join two levels of a monad into one, we can duplicate one level of a comonad into two.
Kind of weird, right? This also has to obey some laws. We’ll get to those later on, but let’s first look at some actual comonads.
A simple and obvious comonad is the dumb wrapper (the identity comonad):
This one is also the identity monad. Id doesn’t have any functionality other than the proper morphisms of the (co)monad and is therefore not terribly interesting. We can get the value out with counit, and we can vacuously duplicate by decorating our existing Id with another layer.
There’s a comonad with the same capabilities as the reader monad, namely that it allows us to ask for a value:
It should be obvious how we can give a Comonad instance for this (I’m using the Kind Projector compiler plugin to make the syntax look a little nicer than vanilla Scala):
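A self-contained sketch of Coreader, with the comonad operations written as methods rather than as a type class instance:

```scala
// Coreader: a value of type A together with a context of type R.
case class Coreader[R, A](extract: A, ask: R) {
  def map[B](f: A => B): Coreader[R, B] = Coreader(f(extract), ask)

  // duplicate puts the whole Coreader in the value slot, so later
  // operations can observe both the value and the context.
  def duplicate: Coreader[R, Coreader[R, A]] = Coreader(this, ask)

  def extend[B](f: Coreader[R, A] => B): Coreader[R, B] =
    duplicate.map(f)
}
```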
Arguably, this is much more straightforward in Scala than the reader monad. In the reader monad, the ask function is the identity function. That’s saying “once the R value is available, return it to me”, making it available to subsequent map and flatMap operations. But in Coreader, we don’t have to pretend to have an R value. It’s just right there, and we can look at it.
So Coreader just wraps up some value of type A together with some additional context of type R. Why is it important that this is a comonad? What is the meaning of duplicate here?
To see the meaning of duplicate, notice that it puts the whole Coreader in the value slot (in the extract portion). So any subsequent extract or map operation will be able to observe both the value of type A and the context of type R. We can think of this as passing the context along to those subsequent operations, which is analogous to what the reader monad does.
In fact, just like map followed by join is usually expressed as flatMap, by the same token duplicate followed by map is usually expressed as a single operation, extend:
Notice that the type signature of extend looks like flatMap with the direction of f reversed. And just like we can chain operations in a monad using flatMap, we can chain operations in a comonad using extend. In Coreader, extend is making sure that f can use the context of type R to produce its B.
Chaining operations this way using flatMap in a monad is sometimes called Kleisli composition, and chaining operations using extend in a comonad is called coKleisli composition (or just Kleisli composition in a comonad).
The name extend refers to the fact that it takes a “local” computation that operates on some structure and “extends” it to a “global” computation that operates on all substructures of the larger structure.
Just like the writer monad, the writer comonad can append to a log or running tally using a monoid. But instead of keeping the log always available to be appended to, it uses the same trick as the reader monad by building up an operation that gets executed once a log becomes available:
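A self-contained sketch of this, with a minimal Monoid trait of my own standing in for the Scalaz one:

```scala
trait Monoid[W] {
  def zero: W
  def append(a: W, b: W): W
}

// The writer comonad: a computation waiting for a log to be available.
case class Cowriter[W, A](tell: W => A)(implicit W: Monoid[W]) {
  def map[B](f: A => B): Cowriter[W, B] = Cowriter(tell andThen f)

  // extract runs with the empty log.
  def extract: A = tell(W.zero)

  // duplicate builds a run function that appends the two logs.
  def duplicate: Cowriter[W, Cowriter[W, A]] =
    Cowriter(w1 => Cowriter(w2 => tell(W.append(w1, w2))))

  def extend[B](f: Cowriter[W, A] => B): Cowriter[W, B] =
    duplicate.map(f)
}
```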
Note that duplicate returns a whole Cowriter from its constructed run function, so the meaning is that subsequent operations (composed via map or extend) have access to exactly one tell function, which appends to the existing log or tally. For example, foo.extend(_.tell("hi")) will append "hi" to the log of foo.
The comonad laws are analogous to the monad laws:
wa.duplicate.extract == wa
wa.extend(extract) == wa
wa.duplicate.duplicate == wa.extend(duplicate)
It can be hard to get an intuition for what these laws mean, but in short they mean that (co)Kleisli composition in a comonad should be associative and that extract (a.k.a. counit) should be an identity for it.
Very informally, both the monad and comonad laws mean that we should be able to compose our programs topdown or bottomup, or any combination thereof, and have that mean the same thing regardless.
In part 2 we’ll look at some more examples of comonads and follow some of the deeper connections. Like what’s the relationship between the reader monad and the reader comonad, or the writer monad and the writer comonad? They’re not identical, but they seem to do all the same things. Are they equivalent? Isomorphic? Something else?
Whenever I use scala.concurrent.Future in my code, I can get some really easy performance gains by just switching to scalaz.concurrent.Task instead, particularly if I’m chaining them with map or flatMap calls, or with for comprehensions.
Every Future is basically some work that needs to be submitted to a thread pool. When you call futureA.flatMap(a => futureB), both Future[A] and Future[B] need to be submitted to the thread pool, even though they are not running concurrently and could theoretically run on the same thread. This context switching takes a bit of time.
With scalaz.concurrent.Task you have a bit more control over when you submit work to a thread pool and when you actually want to continue on the thread that is already executing a Task. When you say taskA.flatMap(a => taskB), the taskB will by default just continue running on the same thread that was already executing taskA. If you explicitly want to dip into the thread pool, you have to say so with Task.fork.
This works because a Task is not a concurrently running computation. It’s a description of a computation: a sequential list of instructions that may include instructions to submit some of the work to thread pools. The work is actually executed by a tight loop in Task’s run method. This loop is called a trampoline, since every step in the Task (that is, every subtask) returns control to this loop.
Jumping on a trampoline is a lot faster than jumping into a thread pool, so whenever we’re composing Futures with map and flatMap, we can just switch to Task and make our code faster.
But sometimes we know that we want to continue on the same thread, and we don’t want to spend the time jumping on a trampoline at every step. To demonstrate this, I’ll use the Ackermann function. This is not necessarily a good use case for Future, but it shows the difference well.
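The naive definition is presumably along these lines:

```scala
// Naive recursive Ackermann: correct, but overflows the stack for
// modestly large arguments.
def ackermann(m: Int, n: Int): Int = (m, n) match {
  case (0, _) => n + 1
  case (m, 0) => ackermann(m - 1, 1)
  case (m, n) => ackermann(m - 1, ackermann(m, n - 1))
}
```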
This function is supposed to terminate for all positive m and n, but if they are modestly large, this recursive definition overflows the stack. We could use futures to alleviate this, jumping into a thread pool instead of making a stack frame at each step:
Since there’s no actual concurrency going on here, we can make this instantly faster by switching to Task instead, using a trampoline instead of a thread pool:
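I can't show scalaz.concurrent.Task self-contained here, so this sketch uses a minimal hand-rolled trampoline in its place; something like Task's suspend plays the role of Suspend below:

```scala
// A minimal trampoline, standing in for scalaz.concurrent.Task.
sealed trait Trampoline[A] {
  def flatMap[B](f: A => Trampoline[B]): Trampoline[B] = this match {
    case Done(a)    => Suspend(() => f(a))
    case Suspend(r) => Suspend(() => r().flatMap(f))
  }
  def map[B](f: A => B): Trampoline[B] = flatMap(a => Done(f(a)))
}
case class Done[A](a: A) extends Trampoline[A]
case class Suspend[A](resume: () => Trampoline[A]) extends Trampoline[A]

// The run loop: every step returns control here instead of growing
// the call stack.
@annotation.tailrec
def run[A](t: Trampoline[A]): A = t match {
  case Done(a)    => a
  case Suspend(r) => run(r())
}

// Ackermann, suspending at each recursive call.
def ackermann(m: Int, n: Int): Trampoline[Int] = (m, n) match {
  case (0, _) => Done(n + 1)
  case (m, 0) => Suspend(() => ackermann(m - 1, 1))
  case (m, n) =>
    Suspend(() => ackermann(m, n - 1)).flatMap(a =>
      Suspend(() => ackermann(m - 1, a)))
}
```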
But even here, we’re making too many jumps back to the trampoline with suspend. We don’t actually need to suspend and return control to the trampoline at each step. We only need to do it often enough to avoid overflowing the stack. Let’s say we know how large our stack can grow:


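Something like this, where 512 is just an assumed figure:

```scala
// Rough number of stack frames we allow ourselves between
// jumps to the trampoline.
val maxStack = 512
```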
We can then keep track of how many recursive calls we’ve made, and jump on the trampoline only when we need to:

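A sketch of the optimized version, again assuming scalaz's `Task` (the names here are my own, and `maxStack` is repeated to keep the snippet self-contained):

```scala
import scalaz.concurrent.Task

val maxStack = 512  // assumed stack budget

// Only suspend to the trampoline when the tracked call depth hits
// maxStack; otherwise recurse directly on the stack.
def ackermannO(m: Int, n: Int): Task[Int] = {
  def step(m: Int, n: Int, stack: Int): Task[Int] =
    if (stack >= maxStack) Task.suspend(go(m, n, 0))
    else go(m, n, stack + 1)

  def go(m: Int, n: Int, stack: Int): Task[Int] = (m, n) match {
    case (0, _) => Task.now(n + 1)
    case (_, 0) => step(m - 1, 1, stack)
    case (_, _) => for {
      x <- step(m, n - 1, stack)
      r <- step(m - 1, x, stack)
    } yield r
  }

  go(m, n, 0)
}
```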
I did some comparisons using Caliper and made this pretty graph for you:
The horizontal axis is the number of steps, and the vertical axis is the mean time that number of steps took over a few thousand runs.
This graph shows that Task
is slightly faster than Future
for submitting to thread pools (blue and yellow lines, marked Future and Task respectively), but only for very small tasks; at around 50 steps (on my MacBook), both futures and tasks cross the 30 μs threshold. This difference is probably due to the fact that a Future
is a running computation while a Task
is partially constructed up front and explicitly run
later. So with the Future
the threads might just be waiting for more work. The overhead of Task.run
seems to catch up with us at around 50 steps.
But honestly the difference between these two lines is not something I would care about in a real application, because if we jump on the trampoline instead of submitting to a thread pool (green line marked Trampoline), things are between one and two orders of magnitude faster.
If we only jump on the trampoline when we really need it (red line marked Optimized), we can gain another order of magnitude. Compared to the original naïve version that always goes to the thread pool, this is now the difference between running your program on a 10 MHz machine and running it on a 1 GHz machine.
If we measure without using any Task
/Future
at all, the line tracks the Optimized red line pretty closely then shoots to infinity around 1000 (or however many frames fit in your stack space) because the program crashes at that point.
In summary, if we’re smart about trampolines vs thread pools, Future
vs Task
, and optimize for our stack size, we can go from milliseconds to microseconds with not very much effort. Or seconds to milliseconds, or weeks to hours, as the case may be.
More depressingly, the thought of spending a year or more writing another book makes me anxious. I know from experience that making a book (at least a good one) is really hard and takes up a lot of mental energy. Maybe one day there will be a book that I will want to forego a year of evenings and weekends for, but today is not that day.
Originally, the content of FPiJ was going to be based on “Functional Programming in Scala”, but after some discussion with the publisher I think we were all beginning to see that this book deserved its own original content specifically on an FP style in Java.
I really do think such a thing deserves its own original book. Since Java is strictly less suitable for functional programming than Scala is, a book on FP in Java will have to lay a lot of groundwork that we didn’t have to do with FPiS, and it will have to forego a lot of the more advanced topics.
I wish the author of that book, and the publisher, all the best and I hope they do well. I’m sorry to let you all down, but I’m sure this is for the best.
Naturally, readers get the most out of this book by downloading the source code from GitHub and doing the exercises as they read. But a number of readers have made the comment that they wish they could have the hints and answers with them when they read the book on the train to and from work, on a long flight, or wherever there is no internet connection or it’s not convenient to use a computer.
It is of course entirely possible to print out the chapter notes, hints, and exercises, and take them with you either as a hardcopy or as a PDF to use on a phone or tablet. Well, I’ve taken the liberty of doing that work for you. I wrote a little script to concatenate all the chapter notes, errata, hints, and answers into Markdown files and then just printed them all to a single document, tweaking a few things here and there. I’m calling this A companion booklet to “Functional Programming in Scala”. It is released under the same MIT license as the content it aggregates. This means you’re free to copy it, distribute or sell it, or basically do whatever you want with it. The Markdown source of the manuscript is available on my GitHub.
I have made an electronic version of this booklet available on Leanpub as a PDF, ePub, and Kindle file on a pay-what-you-want basis (minimum of $0.99). It has full color syntax highlighting throughout and a few little tweaks to make it format nicely. The paper size is standard US Letter, which makes it easy to print on most color printers. If you choose to buy the booklet from Leanpub, they get a small fee, a small portion of the proceeds goes to support Liberty in North Korea, and the rest goes to yours truly. You’ll also get updates when those inevitably happen.
If you don’t care about any of that, you can grab the PDF from here with my compliments.
The booklet is also available from CreateSpace or Amazon as a full color printed paperback. This comes in a nicely bound glossy cover for just a little more than the price of printing (they print it on demand for you). I’ve ordered one and I’m really happy with the quality of this print:
The print version is of course under the same permissive license, so you can make copies of it, make derivative works, or do whatever you want. It’s important to note that with this booklet I’ve not done anything other than design a little cover and then literally print out this freely available content and upload it to Amazon, which anybody could have done (and you still can if you want).
I hope this makes Functional Programming in Scala more useful and more enjoyable for more people.
In the olden days of brick-and-mortar bookstores and libraries, I would discover books to read by browsing shelves and picking up what looked interesting at the time. I might even find something that I knew was on my list. “Oh, I’ve been meaning to read that!”
The Internet changes this dynamic dramatically. It makes it much easier for me to discover books that interest me, and also to access any book that I might want to read, instantly, anywhere. At any given time, I have a couple of books that I’m “currently reading”, and when I finish one I can start another immediately. I use Goodreads to manage my to-read list, and it’s easy for me to scroll through the list and pick out my next book.
But again, this list is very long. So I wanted a good way to filter out books I will really never read, and sort it such that the most “important” books in some sense show up first. Then every time I need a new book I could take the first one from the list and make a binary decision: either “I will read this right now”, or “I am never reading this”. In the latter case, if a book interests me enough at a later time, I’m sure it will find its way back onto my list.
The problem then is to find a good metric by which to rank books. Goodreads lets users rank books with a star rating from 1 to 5, and presents you with an average rating by which you can sort the list. The problem is that a lot of books that interest me have only one rating and it’s 5 stars, giving the book an “average” of 5.0. So if I go with that method I will be perpetually reading obscure books that one other person has read and loved. This is not necessarily a bad thing, but I do want to branch out a bit.
Another possibility is to use the number of ratings to calculate a confidence interval for the average rating. For example, using the Wilson score I could find a lower and upper bound s1 and s2 (lower and higher than the average rating, respectively) that will let me say “I am 95% sure that any random sample of readers of an equal size would give an average rating between s1 and s2.” I could then sort the list by the lower bound s1.
But this method is dissatisfactory for a number of reasons. First, it’s not clear how to fit star ratings to such a measure. If we do the naive thing and count a 1-star rating as 1/5 and a 5-star rating as 5/5, that counts a 1-star rating as a “partial success” in some sense. We could discard 1-star ratings as 0, and count 2, 3, 4, and 5 stars as 25%, 50%, 75%, and 100%, respectively.
But even if we did make it fit somehow, it turns out that if you take any moderately popular book on Goodreads at random, it will have an average rating somewhere close to 4. I could manufacture a prior based on this knowledge and use that instead of the normal distribution or the Jeffreys prior in the confidence interval, but that would still not be a very good ranking because reader review metascores are meaningless.
In the article “Reader review metascores are meaningless”, Stephanie Shun suggests using the percentage of 5-star ratings as the relevant metric rather than the average rating. This is a good suggestion, since even a single 5-star rating carries a lot of actionable information whereas an average rating close to 4.0 carries very little.
I can then use the Wilson score directly, counting a 5-star rating as a successful trial and any other rating as a failed one. I can then just use the normal distribution instead of working with an artisanally curated prior.
Mathematica makes it easy to generate the Wilson score. Here, pos
is the number of positive trials (number of 5-star ratings), n
is the number of total ratings, and confidence
is the desired confidence percentage. I’m taking the lower bound of the confidence interval to get my score.

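For reference, here is the same computation sketched in Scala rather than Mathematica (my translation, not the original; the normal quantile z for the desired confidence is passed in directly, since the standard library has no inverse normal CDF. Use z ≈ 1.96 for 95% and z ≈ 1.28 for 80%):

```scala
// Lower bound of the Wilson score interval.
// pos = number of positive trials (5-star ratings),
// n   = total number of ratings,
// z   = normal quantile for the desired confidence level.
def wilsonLower(pos: Int, n: Int, z: Double): Double = {
  val p  = pos.toDouble / n
  val z2 = z * z
  (p + z2 / (2 * n) - z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)) /
    (1 + z2 / n)
}
```

For a single 5-star rating at 95% confidence, `wilsonLower(1, 1, 1.96)` gives roughly 0.2065, which matches the figure discussed below.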
Now I just need to get the book data from Goodreads. Fortunately, it has a pretty rich API. I just need a developer key, which anyone can get for free.
For example, to get the ratings for a given book id
, we can use their XML API for books and pattern match on the result to get the ratings by score:

Here, key
is my Goodreads developer API key, defined elsewhere. I put a Pause[1]
in the call since Goodreads throttles API calls so you can’t make more than one call per second to each API endpoint. I’m also memoizing the result, by assigning to Ratings[id]
in the global environment.
Ratings
will give us an association list with the number of ratings for each score from 1 to 5, together with the total. For example, for the first book in their catalogue, Harry Potter and the Half-Blood Prince, here are the scores:

Sweet. Let’s see how Harry Potter #6 would score with our rating:

So Wilson is 95% confident that in any random sample of about 1.2 million Harry Potter readers, at least 61.572% of them would give The Half-Blood Prince a 5-star rating. That turns out to be a pretty high score, so if this book were on my list (which it isn’t), it would feature pretty close to the very top.
But now the score for a relatively obscure title is too low. For example, the lower bound of the 95% confidence interval for a single-rating 5-star book will be 0.206549, which will be towards the bottom of any list. This means I would never get to any of the obscure books on my reading list, since they would be edged out by moderately popular books with an average rating close to 4.0.
See, if I’ve picked a book that I want to read, I’d consider five ratings that are all five stars a much stronger signal than the fact that people who like Harry Potter enough to read 5 previous books loved the 6th one. Currently the 5*5 book will score 57%, a bit weaker than the Potter book’s 62%.
I can fix this by lowering the confidence level. Because honestly, I don’t need a high confidence in the ranking. I’d rather err on the side of picking up a deservedly obscure book than to miss out on a rare gem. Experimenting with this a bit, I find that a confidence around 80% raises the obscure books enough to give me an interesting mix. For example, a 5*5 book gets a 75% rank, while the Harry Potter one stays at 62%.
I’m going to call that the Rúnar rank of a given book. The Rúnar rank is defined as the lower bound of the (1 - 1/q) Wilson confidence interval for scoring in the qth q-quantile. In the special case of Goodreads ratings, it’s the lower bound of the 80% confidence interval for a 5-star rating.

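In Scala terms the rank might be rendered like this (my sketch, with the 80% normal quantile hard-coded; the original was Mathematica):

```scala
// Rúnar rank: Wilson lower bound at 80% confidence (z ≈ 1.2816)
// for the proportion of 5-star ratings.
def runarRank(fiveStars: Int, total: Int): Double = {
  val z  = 1.2816
  val p  = fiveStars.toDouble / total
  val z2 = z * z
  (p + z2 / (2 * total) - z * math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)) /
    (1 + z2 / total)
}
```

A book with five ratings, all 5-star, comes out around 75%, as in the list below.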
Unfortunately, there’s no way to get the rank of all the books in my reading list in one call to the Goodreads API. And when I asked them about it they basically said “you can’t do that”, so I’m assuming that feature will not be added any time soon. So I’ll have to get the reading list first, then call RunarRank
for each book’s id
. In Goodreads, books are managed by “shelves”, and the API allows getting the contents of a given shelf, 200 books at a time:

I’m doing a bunch of XML pattern matching here to get the id
, title
, average_rating
, and first author
of each book. Then I put that in an association list. I’m getting only the top 200 books on the list by average rating (which currently is about half my list).
With that in hand, I can get the contents of my “toread” shelf with GetShelf[runar, "toread"]
, where runar
is my Goodreads user id. And given that, I can call RunarRank
on each book on the shelf, then sort the result by that rank:

To get the ranked reading list of any user:


And to print it out nicely:

Now I can get, say, the first 10 books on my improved reading list:


id | title | author | avg rating | Rúnar rank
9934419 | Kvæðasafn | Snorri Hjartarson | 5.00 | 75.2743%
17278 | The Feynman Lectures on Physics Vol 1 | Richard P. Feynman | 4.58 | 67.2231%
640909 | The Knowing Animal: A Philosophical Inquiry Into Knowledge and Truth | Raymond Tallis | 5.00 | 64.6221%
640913 | The Hand: A Philosophical Inquiry Into Human Being | Raymond Tallis | 5.00 | 64.6221%
4050770 | Volition As Cognitive Self Regulation | Harry Binswanger | 4.86 | 62.231%
8664353 | Unbroken: A World War II Story of Survival, Resilience, and Redemption | Laura Hillenbrand | 4.45 | 60.9849%
13413455 | Software Foundations | Benjamin C. Pierce | 4.80 | 60.1596%
77523 | Harry Potter and the Sorcerer’s Stone (Harry Potter #1) | J.K. Rowling | 4.39 | 59.1459%
13539024 | Free Market Revolution: How Ayn Rand’s Ideas Can End Big Government | Yaron Brook | 4.48 | 59.1102%
1609224 | The Law | Frédéric Bastiat | 4.40 | 58.767%
I’m quite happy with that. Some very popular and wellloved books interspersed with obscure ones with exclusively (or almost exclusively) positive reviews. The most satisfying thing is that the rating carries a real meaning. It’s basically the relative likelihood that I will enjoy the book enough to rate it five stars.
I can test this ranking against books I’ve already read. Here’s the top of my “read” shelf, according to their Rúnar Rank:
id | title | author | avg rating | Rúnar rank
17930467 | The Fourth Phase of Water | Gerald H. Pollack | 4.85 | 68.0406%
7687279 | Nothing Less Than Victory: Decisive Wars and the Lessons of History | John David Lewis | 4.67 | 64.9297%
43713 | Structure and Interpretation of Computer Programs | Harold Abelson | 4.47 | 62.0211%
7543507 | Capitalism Unbound: The Incontestable Moral Case for Individual Rights | Andrew Bernstein | 4.67 | 57.6085%
13542387 | The DIM Hypothesis: Why the Lights of the West Are Going Out | Leonard Peikoff | 4.37 | 55.3296%
5932 | Twenty Love Poems and a Song of Despair | Pablo Neruda | 4.36 | 54.7205%
18007564 | The Martian | Andy Weir | 4.36 | 53.9136%
24113 | Gödel, Escher, Bach: An Eternal Golden Braid | Douglas R. Hofstadter | 4.29 | 53.5588%
19312 | The Brothers Lionheart | Astrid Lindgren | 4.33 | 53.0952%
13541678 | Functional Programming in Scala | Rúnar Bjarnason | 4.54 | 52.6902%
That’s perfect. Those are definitely books I thoroughly enjoyed and would heartily recommend. Especially that last one.
I’ve published this function as a Wolfram Cloud API, and you can call it at https://www.wolframcloud.com/app/objects/4f4a7b3c38a54bf381b67ca8e05ea100. It takes two URL query parameters, key
and user
, which are your Goodreads API key and the Goodreads user ID whose reading list you want to generate, respectively. Enjoy!
This principle is very widely applicable, and it’s a useful thing to keep in mind when designing languages and libraries. A practical implication of being aware of this principle is that we always make components exactly as expressive as necessary, but no more. This maximizes the ability of any downstream systems to reason about our components. And dually, for things that we receive or consume, we should require exactly as much analytic power as necessary, and no more. That maximizes the expressive freedom of the upstream components.
I find myself thinking about this principle a lot lately, and seeing it more or less everywhere I look. So I’m seeking a more general statement of it, if such a thing is possible. It seems that more generally than issues of expressivity/analyzability, a restriction at one semantic level translates to freedom and power at another semantic level.
What I want to do here is give a whole bunch of examples. Then we’ll see if we can come up with an integration for them all. This is all written as an exercise in thinking out loud and is not to be taken very seriously.
In formal language theory, context-free grammars are more expressive than regular grammars. The former can describe strictly more sets of strings than the latter. On the other hand, it’s harder to reason about context-free grammars than regular ones. For example, we can decide whether two regular expressions are equal (they describe the same set of strings), but this is undecidable in general for context-free grammars.
If we know that an applicative functor is a monad, we gain some expressive power that we don’t get with just an applicative functor. Namely, a monad is an applicative functor with an additional capability: monadic join (or “bind”, or “flatMap”). That is, context-sensitivity, or the ability to bind variables in monadic expressions.
This power comes at a cost. Whereas we can always compose any two applicatives to form a composite applicative, two monads do not in general compose to form a monad. It may be the case that a given monad composes with any other monad, but we need some additional information about it in order to be able to conclude that it does.
Futures have an algebraic theory, so we can reason about them algebraically. Namely, they form an applicative functor which means that two futures x
and y
make a composite future that does x
and y
in parallel. They also compose sequentially since they form a monad.
Actors on the other hand have no algebraic theory and afford no algebraic reasoning of this sort. They are “fire and forget”, so they could potentially do anything at all. This means that actor systems can do strictly more things in more ways than systems composed of futures, but our ability to reason about such systems is drastically diminished.
When we have an untyped function, it could receive any type of argument and produce any type of output. The implementation is totally unrestricted, so that gives us a great deal of expressive freedom. Such a function can potentially participate in a lot of different expressions that use the function in different ways.
A function of type Bool -> Bool
however is highly restricted. Its argument can only be one of two things, and the result can only be one of two things as well. So there are 4 different implementations such a function could possibly have. Therefore this restriction gives us a great deal of analyzability.
For example, since the argument is of type Bool
and not Any
, the implementation mostly writes itself. We need to consider only two possibilities. Bool
(a type of size 2) is fundamentally easier to reason about than Any
(a type of potentially infinite size). Similarly, any usage of the function is easy to reason about. A caller can be sure not to call it with arguments other than True
or False
, and enlist the help of a type system to guarantee that expressions involving the function are meaningful.
Programming in non-total languages affords us the power of general recursion and “fast and loose reasoning” where we can transition between valid states through potentially invalid ones. The cost is, of course, the halting problem. But more than that, we can no longer be certain that our programs are meaningful, and we lose some algebraic reasoning. For example, consider the following:


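The law in question can be stated as a Scala property (my formulation; with strict Int values it always holds, and the interesting case is precisely when n stands for an arbitrary computation):

```scala
// Adding n to every element and then subtracting n again
// should be the identity on the list.
def addSubIdentity(xs: List[Int], n: Int): Boolean =
  xs.map(_ + n).map(_ - n) == xs
```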
This states that adding n
to every number in a list and then subtracting n
again should be the identity. But what if n
actually throws an exception or never halts? In a non-total language, we need some additional information. Namely, we need to know that n
is total.
The example above also serves to illustrate the tradeoff between purely functional and impure programming. If n
could have arbitrary side effects, algebraic reasoning of this sort involving n
is totally annihilated. But if we know that n
is referentially transparent, algebraic reasoning is preserved. The power of side effects comes at the cost of algebraic reasoning. This price includes loss of compositionality, modularity, parallelizability, and parametricity. Our programs can do strictly more things, but we can conclude strictly fewer things about our programs.
There is a principle in computer security called the Principle of Least Privilege. It says that a user or program should have exactly as much authority as necessary but no more. This constrains the power of the entity, but greatly enhances the power of others to predict and reason about what the entity is going to do.
Some might notice an analogy between the Principle of Least Privilege and the idea of a constitutionally limited government. An absolute dictatorship or pure democracy will have absolute power to enact whatever whim strikes the ruler or majority at the moment. But the overall stability, security, and freedom of the people is greatly enhanced by the presence of legal limits on the power of the government. A limited constitutional republic also makes for a better neighbor to other states.
More generally, a ban on the initiation of physical force by one citizen against another, or by the government against citizens, or against other states, makes for a peaceful and prosperous society. The “cost” of such a system is the inability of one person (or even a great number of people) to impose their preferences on others by force.
The framework of two-dimensional Euclidean geometry is simply an empty page on which we can construct lines and curves using tools like a compass and straightedge. When we go from that framework to a Cartesian one, we constrain ourselves to reasoning on a grid of pairs of numbers. This is a tradeoff between expressivity and analyzability. When we move from Euclidean to Cartesian geometry, we lose the ability to assume isotropy of space, intersection of curves, and compatibility between dimensions. But we gain much more powerful things through the restriction: the ability to precisely define geometric objects, to do arithmetic with them, to generalize to higher dimensions, and to reason with higher abstractions like linear algebra and category theory.
Roads constrain the routes we can take when we drive or walk. We give up moving in a straight line to wherever we want to go. But the benefit is huge. Roads let us get to where we’re going much faster and more safely than we would otherwise.
Let’s say you make a decision to have only one kind of outfit that you wear on a daily basis. You just go out and buy multiple identical outfits. Whereas you have lost the ability to express yourself by the things you wear, you have gained a certain ability to reason about your clothing. The system is also faulttolerant and compositional!
What is this principle? The most general statement of it I can make is this: a restriction at one semantic level translates to freedom and power at another semantic level.
What do you think? Can you think of a way to integrate these examples into a general principle? Do you have other favorite examples of this principle in action? Is this something everyone already knows about and I’m just late to the party?
Hopefully this means that I will post more here when I have something interesting to share. If you need to reach me, I’m available by email. My address is runar
at this blog’s domain.
(UPDATE 2014-12-21): I’ve recreated my Twitter account, but I’m now using the service in a different way. The biggest change is that I don’t follow anyone. It’s strictly a broadcasting device for me, and not an information consumption device.
I just want to share my personal story of how this book came to exist. A much shorter version of this story became the preface for the finished book, but here is the long version.
Around 2006 I was working in Austin and coming up on my 8th anniversary as an enterprise Java programmer. I had started to notice that I was making a lot of the same mistakes over and over again. I had a copy of the Gang of Four’s Design Patterns on my desk that I referred to frequently, and I built what I thought were elegant object-oriented designs. Every new project started out well enough, but became a big ball of mud after a while. My once-elegant class hierarchies gathered bugs, technical debt, and unimplemented features. Making changes started to feel like trudging through a swamp. I was never confident that I wasn’t introducing defects as I went. My code was difficult to test or reuse, and impossible to reason about. My productivity plummeted, and a complete rewrite became inevitable. It was a vicious cycle.
In looking for a more disciplined approach, I came across Haskell and functional programming. Here was a community of people with a sound methodology for reasoning about their programs. In other words, they actually knew what they were doing. I found a lot of good ideas and proceeded to import them to Java. A little later I met Tony Morris, who had been doing the same, on IRC. He told me about this new JVM language, Scala. Tony had a library called Scalaz (scalazed) that made FP in Scala more pleasant, and I started contributing to that library. One of the other people contributing to Scalaz was Paul Chiusano, who was working for a company in Boston. In 2008 he invited me to come work with him, doing Scala full time. I sold my house and everything in it, and moved to Boston.
Paul co-organized the Boston Area Scala Enthusiasts, a group that met monthly at Google’s office in Cambridge. It was a popular group, mainly among Java programmers who were looking for something better. But there was a clear gap between those who had come to Scala from an FP perspective and those who saw Scala as just a better way to write Java. In April 2010 another of the co-organizers, Nermin Serifovic, said he thought there was “tremendous demand” for a book that would bridge that gap, on the topic of functional programming in Scala. He suggested that Paul and I write that book. We had a very clear idea of the kind of book we wanted to write, and we thought it would be quick and easy. More than four years later, I think we have made a good book.
Paul and I hope to convey in this book some of the excitement that we felt when we were first discovering FP. It’s encouraging and empowering to finally feel like we’re writing comprehensible software that we can reuse and confidently build upon. We want to invite you to the world of programming as it could be and ought to be.
– Rúnar Óli Bjarnason
Boston, August 2014
This post kind of fell together when writing notes on chapter 10, “Monoids”, of Functional Programming in Scala. I am putting it here so I can reference it from the chapter notes at the end of the book.
Let’s take String
concatenation and Int
addition as example monoids that have a relationship. Note that if we take the length of two strings and add them up, this is the same as concatenating those two strings and taking the length of the combined string:

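Concretely:

```scala
// Adding the lengths equals taking the length of the concatenation.
assert("foo".length + "bar".length == ("foo" + "bar").length)
```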

So every String
maps to a corresponding Int
(its length), and every concatenation of strings maps to the addition of corresponding lengths.
The length
function maps from String
to Int
while preserving the monoid structure. Such a function, that maps from one monoid to another in such a preserving way, is called a monoid homomorphism. In general, for monoids M
and N
, a homomorphism f: M => N
, and all values x:M
, y:M
, the following equations hold:

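Written with Scalaz syntax, the laws can be packaged as a checkable property (a sketch; the original listing may have stated them differently):

```scala
import scalaz._, Scalaz._

// f is a monoid homomorphism from M to N when it preserves
// both append and the identity element.
def homomorphismLaw[M: Monoid, N: Monoid](f: M => N)(x: M, y: M): Boolean =
  (f(x |+| y) == (f(x) |+| f(y))) && (f(mzero[M]) == mzero[N])
```

For example, `homomorphismLaw[String, Int](_.length)("foo", "bar")` holds, since Scalaz's Int monoid is addition.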
The |+|
syntax is from Scalaz and is obtained by importing scalaz.syntax.monoid._
. It just references the append
method on the Monoid[T]
instance, where T
is the type of the arguments. The mzero[T]
function is also from that same import and references zero
in Monoid[T]
.
This homomorphism law can have real practical benefits. Imagine for example a “result set” monoid that tracks the locations of a particular set of records in a database or file. This could be as simple as a Set
of locations. Concatenating several thousand files and then proceeding to search through them is going to be much slower than searching through the files individually and then concatenating the result sets. Particularly since we can potentially search the files in parallel. A good automated test for our result set monoid would be that it admits a homomorphism from the data file monoid.
Sometimes there will be a homomorphism in both directions between two monoids. If these are inverses of one another, then this kind of relationship is called a monoid isomorphism and we say that the two monoids are isomorphic. More precisely, we will have two monoids A
and B
, and homomorphisms f: A => B
and g: B => A
. If f(g(b)) == b
and g(f(a)) == a
, for all a:A
and b:B
then f
and g
form an isomorphism.
For example, the String
and List[Char]
monoids with concatenation are isomorphic. We can convert a String
to a List[Char]
, preserving the monoid structure, and go back again to the exact same String
we started with. This is also true in the inverse direction, so the isomorphism holds.
Other examples include (Boolean, &&) and (Boolean, ||), which are isomorphic via not.
Note that there are monoids with homomorphisms in both directions between them that nevertheless are not isomorphic. For example, (Int
, *
) and (Int
, +
). These are homomorphic to one another, but not isomorphic (thanks, Robbie Gates).
If A
and B
are monoids, then (A,B)
is certainly a monoid, called their product:

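A sketch of productMonoid, written against a minimal Monoid trait for illustration (the Scalaz version differs, e.g. in using implicits):

```scala
trait Monoid[A] {
  def zero: A
  def append(x: A, y: A): A
}

val stringMonoid = new Monoid[String] {
  def zero = ""
  def append(x: String, y: String) = x + y
}

val intAddition = new Monoid[Int] {
  def zero = 0
  def append(x: Int, y: Int) = x + y
}

// The product of two monoids combines componentwise.
def productMonoid[A, B](A: Monoid[A], B: Monoid[B]): Monoid[(A, B)] =
  new Monoid[(A, B)] {
    def zero = (A.zero, B.zero)
    def append(x: (A, B), y: (A, B)) =
      (A.append(x._1, y._1), B.append(x._2, y._2))
  }
```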
But is there such a thing as a monoid coproduct? Could we just use Either[A,B]
for monoids A
and B
? What would be the zero
of such a monoid? And what would be the value of Left(a) |+| Right(b)
? We could certainly choose an arbitrary rule, and we may even be able to satisfy the monoid laws, but would that mean we have a monoid coproduct?
To answer this, we need to know the precise meaning of product and coproduct. These come straight from Wikipedia, with a little help from Cale Gibbard.
A product M
of two monoids A
and B
is a monoid such that there exist homomorphisms fst: M => A
, snd: M => B
, and for any monoid Z
and morphisms f: Z => A
and g: Z => B
there has to be a unique homomorphism h: Z => M
such that fst(h(z)) == f(z)
and snd(h(z)) == g(z)
for all z:Z
. In other words, the following diagram must commute:
A coproduct W
of two monoids A
and B
is the same except the arrows are reversed. It’s a monoid such that there exist homomorphisms left: A => W
, right: B => W
, and for any monoid Z
and morphisms f: A => Z
and g: B => Z
there has to be a unique homomorphism h: W => Z
such that h(left(a)) == f(a)
and h(right(b)) == g(b)
for all a:A
and all b:B
. In other words, the following diagram must commute:
We can easily show that our productMonoid
above really is a monoid product. The homomorphisms are the methods _1
and _2
on Tuple2
. They simply map every element of (A,B)
to a corresponding element in A
and B
. The monoid structure is preserved because:

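That is, writing the monoid operation as |+| (a sketch of the equations):

```scala
// Append in the product is componentwise, so projecting out either
// side commutes with append and preserves the identity:
//   ((a1, b1) |+| (a2, b2))._1 == (a1 |+| a2, b1 |+| b2)._1 == a1 |+| a2
//   ((a1, b1) |+| (a2, b2))._2 == (a1 |+| a2, b1 |+| b2)._2 == b1 |+| b2
//   (mzero[(A, B)])._1 == mzero[A]
//   (mzero[(A, B)])._2 == mzero[B]
```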
And for any other monoid Z
, and morphisms f: Z => A
and g: Z => B
, we can construct a unique morphism from Z
to (A,B)
:

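The unique homomorphism simply pairs the results (a sketch; the name is mine):

```scala
// Given f: Z => A and g: Z => B, the unique arrow into the product.
def pair[Z, A, B](f: Z => A, g: Z => B): Z => (A, B) =
  z => (f(z), g(z))
```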
And this really is a homomorphism because we just inherit the homomorphism law from f
and g
.
What does a coproduct then look like? Well, it’s going to be a type C[A,B]
together with an instance coproduct[A:Monoid,B:Monoid]:Monoid[C[A,B]]
. It will be equipped with two monoid homomorphisms, left: A => C[A,B]
and right: B => C[A,B]
that satisfy the following (according to the monoid homomorphism law):
```scala
// left and right must preserve identities and appends:
left(zeroA) == zero
right(zeroB) == zero
left(a1 |+| a2) == left(a1) |+| left(a2)
right(b1 |+| b2) == right(b1) |+| right(b2)
```

And additionally, for any other monoid Z
and homomorphisms f: A => Z
and g: B => Z
we must be able to construct a unique homomorphism from C[A,B]
to Z
:
```scala
def fold[Z: Monoid](f: A => Z, g: B => Z): C[A, B] => Z
```


Right off the bat, we know some things that definitely won’t work. Just using Either
is a non-starter because there’s no well-defined zero
for it, and there’s no way of appending a Left
to a Right
. But what if we just added that structure?
The underlying set of a monoid A
is just the type A
without the monoid structure. The coproduct of types A
and B
is the type Either[A,B]
. Having “forgotten” the monoid structure of both A
and B
, we can recover it by generating a free monoid on Either[A,B]
, which is just List[Either[A,B]]
. The append
operation of this monoid is list concatenation, and the identity for it is the empty list.
Clearly List[Either[A,B]]
is a monoid, but does it permit a homomorphism from both monoids A
and B
? If so, then the following properties should hold:
```scala
List(Left(a1)) ++ List(Left(a2)) == List(Left(a1 |+| a2))
List(Right(b1)) ++ List(Right(b2)) == List(Right(b1 |+| b2))
```

They clearly do not hold! The lists on the left of ==
will have two elements and the lists on the right will have one element. Can we do something about this?
Well, the fact is that List[Either[A,B]]
is not exactly the monoid coproduct of A
and B
. It’s still “too big”. The problem is that we can observe the internal structure of expressions.
What we need is not exactly the List
monoid, but a new monoid called the free monoid product:
```scala
// A sketch: a List[Either[A, B]] kept in a normal form where
// consecutive As and consecutive Bs are already collapsed.
case class Eithers[A, B](toList: List[Either[A, B]]) {
  def ++(e: Eithers[A, B])(implicit ma: Monoid[A], mb: Monoid[B]): Eithers[A, B] =
    Eithers.fromList(toList ++ e.toList)
  def fold[Z](f: A => Z, g: B => Z)(implicit mz: Monoid[Z]): Z =
    toList.foldLeft(mz.zero)((z, e) => mz.append(z, e.fold(f, g)))
}

object Eithers {
  def empty[A, B]: Eithers[A, B] = Eithers(Nil)

  // Normalize by collapsing runs of consecutive As and Bs:
  def fromList[A, B](es: List[Either[A, B]])(
      implicit ma: Monoid[A], mb: Monoid[B]): Eithers[A, B] =
    Eithers(es.foldRight(List.empty[Either[A, B]]) {
      case (Left(a1), Left(a2) :: t)   => Left(ma.append(a1, a2)) :: t
      case (Right(b1), Right(b2) :: t) => Right(mb.append(b1, b2)) :: t
      case (e, t)                      => e :: t
    })

  implicit def eithersMonoid[A: Monoid, B: Monoid]: Monoid[Eithers[A, B]] =
    new Monoid[Eithers[A, B]] {
      val zero = empty[A, B]
      def append(e1: Eithers[A, B], e2: => Eithers[A, B]) = e1 ++ e2
    }
}
```

Eithers[A,B]
is a kind of List[Either[A,B]]
that has been normalized so that consecutive A
s and consecutive B
s have been collapsed using their respective monoids. So it will contain alternating A
and B
values.
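As a self-contained illustration of that normalization (a sketch of my own, using the Int addition and String concatenation monoids):

```scala
// Collapse runs of consecutive Lefts (with +) and Rights (with ++):
def normalize(es: List[Either[Int, String]]): List[Either[Int, String]] =
  es.foldRight(List.empty[Either[Int, String]]) {
    case (Left(a1), Left(a2) :: t)   => Left(a1 + a2) :: t
    case (Right(b1), Right(b2) :: t) => Right(b1 + b2) :: t
    case (e, t)                      => e :: t
  }

normalize(List(Left(1), Left(2), Right("a"), Right("b"), Left(3)))
// List(Left(3), Right("ab"), Left(3)), strictly alternating
```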
The only remaining problem is that a list full of identities is not exactly the same as the empty list. Remember the unit part of the homomorphism law:
```scala
left(zeroA) == Eithers.empty
right(zeroB) == Eithers.empty
```

This doesn’t hold at the moment. As Cale Gibbard points out in the comments below, Eithers
is really the free monoid on the coproduct of semigroups A
and B
.
We could check each element as part of the normalization step to see if it equals(zero)
for the given monoid. But that’s a problem, as there are lots of monoids for which we can’t write an equals
method. For example, for the Int => Int
monoid (with composition), we must make use of a notion like extensional equality, which we can’t reasonably write in Scala.
So what we have to do is sort of wave our hands and say that equality on Eithers[A,B]
is defined as whatever notion of equality we have for A
and B
respectively, with the rule that es.fold[(A,B)]
defines the equality of Eithers[A,B]
. For example, for monoids that really can have Equal
instances:
```scala
implicit def eithersEqual[A, B](implicit eqA: Equal[A], eqB: Equal[B],
                                ma: Monoid[A], mb: Monoid[B]): Equal[Eithers[A, B]] =
  new Equal[Eithers[A, B]] {
    def equal(x: Eithers[A, B], y: Eithers[A, B]): Boolean = {
      // Equality is defined by folding both sides into the product monoid (A, B):
      val (xa, xb) = x.fold[(A, B)](a => (a, mb.zero), b => (ma.zero, b))
      val (ya, yb) = y.fold[(A, B)](a => (a, mb.zero), b => (ma.zero, b))
      eqA.equal(xa, ya) && eqB.equal(xb, yb)
    }
  }
```

So we have to settle with a list full of zeroes being “morally equivalent” to an empty list. The difference is observable in e.g. the time it takes to traverse the list.
Setting that issue aside, Eithers
is a monoid coproduct because it permits monoid homomorphisms from A
and B
:
```scala
def left[A: Monoid, B: Monoid](a: A): Eithers[A, B] =
  Eithers.fromList(List(Left(a)))
def right[A: Monoid, B: Monoid](b: B): Eithers[A, B] =
  Eithers.fromList(List(Right(b)))
```

And fold
really is a homomorphism, and we can prove it by case analysis. Here’s the law again:
```scala
(e1 ++ e2).fold(f, g) == e1.fold(f, g) |+| e2.fold(f, g)
```


If either of e1
or e2
is empty then the result is the fold of the other, so those cases are trivial. If they are both nonempty, then they will have one of these forms:
```scala
// e1 ends with an A and e2 begins with an A:
(xs :+ Left(a1)) ++ (Left(a2) :: ys)

// e1 ends with a B and e2 begins with a B:
(xs :+ Right(b1)) ++ (Right(b2) :: ys)

// e1 ends with an A and e2 begins with a B:
(xs :+ Left(a1)) ++ (Right(b2) :: ys)

// e1 ends with a B and e2 begins with an A:
(xs :+ Right(b1)) ++ (Left(a2) :: ys)
```

In the first two cases, on the right of the ==
sign in the law, we perform a1 + a2
and b1 + b2
respectively before concatenating. In the other two cases we simply concatenate the lists. The ++
method on Eithers
takes care of doing this correctly for us. On the left of the ==
sign we fold the lists individually and they will be alternating applications of f
and g
. So then this law amounts to the fact that f(a1 + a2) == f(a1) + f(a2)
in the first case, and the same for g
in the second case. In the latter two cases this amounts to a homomorphism on List
. So as long as f
and g
are homomorphisms, so is _.fold(f,g)
. Therefore, Eithers[A,B]
is a coproduct of A
and B
.
In a recent talk, I presented a data type IO that is supposedly a “free monad”. But the monad I presented is not exactly the same as scalaz.Free, and some people have been asking me why there is a difference and what that difference means.
The Free
monad in Scalaz is given a bit like this:
```scala
sealed trait Free[F[_], A]
case class Return[F[_], A](a: A) extends Free[F, A]
case class Suspend[F[_], A](s: F[Free[F, A]]) extends Free[F, A]
```

And throughout the methods on Free
, it is required that F
is a functor because in order to get at the recursive step inside a Suspend
, we need to map
over the F
somehow.
But the IO
monad I gave in the talk looks more like this:
```scala
sealed trait IO[F[_], A]
case class Pure[F[_], A](a: A) extends IO[F, A]
case class Req[F[_], I, A](req: F[I], k: I => IO[F, A]) extends IO[F, A]
```

And it could actually be stated as an application of Free
:
```scala
type IO[F[_], A] = Free[({type λ[α] = CoYoneda[F, α]})#λ, A]
```


So in a very superficial sense, this is how the IO
monad relates to Free
. The monad IO[F,_]
for a given F
is the free monad generated by the functor (F[I], I => _)
for some type I
. And do note that this is a functor no matter what F
is.
There is a deeper sense in which IO
and Free
are actually equivalent (more precisely, isomorphic). That is, there exists a transformation from one to the other and back again. Since the only difference between IO
and Free
is in the functors F[_]
vs ∃I. (F[I], I => _)
, we just have to show that these two are isomorphic for any F
.
There is an important result in category theory known as the Yoneda lemma. What it says is that if you have a function defined like this…
```scala
def map[B](f: A => B): F[B]
```


…then you certainly have a value of type F[A]
. All you need is to pass the identity function to map
in order to get the value of type F[A]
out of this function. In fact, a function like this is in practice probably defined as a method on a value of type F[A]
anyway. This also means that F
is definitely a functor.
The Yoneda lemma says that this goes the other way around as well. If you have a value of type F[A]
for any functor F
and any type A
, then you certainly have a map
function with the signature above.
In Scala terms, we can capture this in a type:
```scala
trait Yoneda[F[_], A] {
  def map[B](f: A => B): F[B]
}
```

And the Yoneda lemma says that there is an isomorphism between Yoneda[F,A]
and F[A]
, for any functor F
and any type A
. Here is the proof:
```scala
def toYoneda[F[_], A](fa: F[A])(implicit F: Functor[F]): Yoneda[F, A] =
  new Yoneda[F, A] {
    def map[B](f: A => B): F[B] = F.map(fa)(f)
  }

def fromYoneda[F[_], A](yo: Yoneda[F, A]): F[A] =
  yo.map(a => a)
```

Of course, this also means that if we have a type B
, a function of type (B => A)
for some type A
, and a value of type F[B]
for some functor F
, then we certainly have a value of type F[A]
. This is kind of obvious, since we can just pass the B => A
and the F[B]
to the map
function for the functor and get our F[A]
.
But the opposite is also true, and that is the really interesting part. If we have a value of type F[A]
, for any F
and A
, then we can always destructure it into a value of type F[B]
and a function of type B => A
, at least for some type B
. And it turns out that we can do this even if F
is not a functor.
This is the permutation of the Yoneda lemma that we were using in IO
above. That is, IO[F, A]
is really Free[({type λ[α] = CoYoneda[F,α]})#λ, A]
, given:
```scala
trait CoYoneda[F[_], A] {
  type I
  def f: I => A
  def fi: F[I]
}
```

And the lemma says that CoYoneda[F,A]
is isomorphic to F[A]
. Here is the proof:
```scala
def toCoYoneda[F[_], A](fa: F[A]): CoYoneda[F, A] =
  new CoYoneda[F, A] {
    type I = A
    val f = (a: A) => a
    val fi = fa
  }

def fromCoYoneda[F[_], A](yo: CoYoneda[F, A])(implicit F: Functor[F]): F[A] =
  F.map(yo.fi)(yo.f)
```

Of course, this destructuring into CoYoneda
using the identity function is the simplest and most general, but there may be others for specific F
and A
depending on what we know about them.
So there you have it. The scalaz.Free
monad with its Suspend(F[Free[F,A]])
constructor and the IO
monad with its Req(F[I], I => IO[F,A])
constructor are actually equivalent. The latter is simply making use of CoYoneda
to say the same thing.
Why bother? The useful part is that CoYoneda[F,_]
is a functor for any F
, so it’s handy to use in a free monad since we can then drop the requirement that F
is a functor. What’s more, it gives us map fusion for free, since map
over CoYoneda
is literally just function composition on its f
component. Although this latter is, in the absence of tail call elimination, not as useful as it could be in Scala.
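Here is a small self-contained sketch of that fusion (the names are my own illustration): mapping twice composes the two functions, and the underlying F is only traversed once at the end.

```scala
case class Co[F[_], I, A](fi: F[I], k: I => A) {
  def map[B](f: A => B): Co[F, I, B] = Co(fi, k andThen f) // just composition
}

val c = Co[List, Int, Int](List(1, 2, 3), identity[Int]).map(_ + 1).map(_ * 2)
// The List is untouched until we finally run the fused function:
c.fi.map(c.k) // List(4, 6, 8)
```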
I hope that sheds a little bit of light on the Yoneda lemma as well as the different embeddings of free monads.
List
is a free monoid. That is, for any given type A
, List[A]
is a monoid, with list concatenation as the operation and the empty list as the identity element.
```scala
def listMonoid[A]: Monoid[List[A]] = new Monoid[List[A]] {
  val zero: List[A] = Nil
  def append(xs: List[A], ys: => List[A]) = xs ++ ys
}
```

Being a free monoid means that it’s the minimal such structure. List[A]
has exactly enough structure so that it is a monoid for any given A
, and it has no further structure. This also means that for any given monoid B
, there must exist a transformation, a monoid homomorphism from List[A]
to B
:
```scala
def foldMap[A, B: Monoid](as: List[A])(f: A => B): B
```


Given a mapping from A
to a monoid B
, we can collapse a value in the monoid List[A]
to a value in B
.
Now, if you followed my old posts, you already know that monads are “higher-kinded monoids”: a monoid in a category where the objects are type constructors (functors, actually) and the arrows between them are natural transformations. As a reminder, a natural transformation from F
to G
can be represented this way in Scala:
```scala
trait ~>[F[_], G[_]] {
  def apply[A](a: F[A]): G[A]
}
```

And it turns out that there is a free monad for any given functor F
:
```scala
sealed trait Free[F[_], A]
case class Return[F[_], A](a: A) extends Free[F, A]
case class Suspend[F[_], A](s: F[Free[F, A]]) extends Free[F, A]
```

Analogous to how a List[A]
is either Nil
(the empty list) or a product of a head
element and tail
list, a value of type Free[F,A]
is either an A
or a product of F[_]
and Free[F,_]
. It is a recursive structure. And indeed, it has exactly enough structure to be a monad, for any given F
, and no more.
When I say “product” of two functors like F[_]
and Free[F,_]
, I mean a product like this:
```scala
// The "product" of the functors F and G here is composition: an F containing Gs.
case class Product[F[_], G[_], A](run: F[G[A]])
```

So we might expect that there is a monad homomorphism from a free monad on F
to any monad that F
can be transformed to. And indeed, it turns out that there is. The free monad catamorphism is in fact a monad homomorphism. Given a natural transformation from F
to G
, we can collapse a Free[F,A]
to G[A]
, just like with foldMap
when given a function from A
to B
we could collapse a List[A]
to B
.
```scala
def runFree[F[_]: Functor, G[_]: Monad, A](fa: Free[F, A])(t: F ~> G): G[A]
```


But what’s the equivalent of foldRight
for Free
? Remember, foldRight takes a unit element z
and a function that accumulates into B
so that B
doesn’t actually have to be a monoid. Here, f
is a lot like the monoid operation, except it takes the current A
on the left:
```scala
def foldRight[B](z: B)(f: (A, B) => B): B
```


The equivalent for Free
takes a natural transformation as its unit element, which for a monad happens to be monadic unit
. Then it takes a natural transformation as its f
argument, that looks a lot like monadic join
, except it takes the current F
on the left:
```scala
type Id[A] = A

// z plays the role of unit, and f the role of join,
// except that f takes the current F on the left:
def foldFree[F[_]: Functor, G[_], A](fa: Free[F, A])(
    z: Id ~> G, f: ({type λ[α] = F[G[α]]})#λ ~> G): G[A] = fa match {
  case Return(a)  => z(a)
  case Suspend(s) => f(implicitly[Functor[F]].map(s)(foldFree(_)(z, f)))
}
```

In this case, G
does not have to be a monad at all.
Free
as well as natural transformations and product types are available in Scalaz.
The preëmptive answer to the usual follow-up question is that the talk was not recorded.
The typical definition of purity (and the one we use in our book) goes something like this:
An expression e
is referentially transparent if for all programs p
, every occurrence of e
in p
can be replaced with the result of evaluating e
without changing the result of evaluating p
.
A function f
is pure if the expression f(x)
is referentially transparent for all referentially transparent x
.
Now, something needs to be made clear right up front. Like all definitions, this holds in a specific context. In particular, the context needs to specify what “evaluating” means. It also needs to define “program”, “occurrence”, and the semantics of “replacing” one thing with another.
In a programming language like Haskell, Java, or Scala, this context is pretty well established. The process of evaluation is a reduction to some normal form such as weak head or beta normal form.
To illustrate, let’s consider programs in an exceedingly simple language that we will call Sigma. An expression in Sigma has one of the following forms:

- A literal string, like "a", "foo", "", etc.
- A concatenation s + t, for expressions s and t.
- A special Ext expression that denotes input from an external source.

Now, without an evaluator for Sigma, it is a purely abstract algebra. So let’s define a straightforward evaluator eval for it, with the following rules:

- eval of a literal string is just that string.
- eval(s + t) first evaluates s and t and concatenates the results into one literal string.
- eval(Ext) reads a line from standard input and returns it as a literal string.

This might seem very simple, but it is still not clear whether Ext
is referentially transparent with regard to eval
. It depends. What does “reads a line” mean, and what is “standard input” exactly? This is all part of a context that needs to be established.
Here’s one implementation of an evaluator for Sigma, in Scala:
```scala
sealed trait Sigma
case class Lit(s: String) extends Sigma
case class Concat(s1: Sigma, s2: Sigma) extends Sigma
case object Ext extends Sigma

def eval1(sig: Sigma): Sigma = sig match {
  case Concat(s1, s2) =>
    val Lit(x) = eval1(s1)
    val Lit(y) = eval1(s2)
    Lit(x + y)
  case Ext => Lit(readLine())
  case lit => lit
}
```

Now, it’s easy to see that the Ext
instruction is not referentially transparent with regard to eval1
. Replacing Ext
with eval1(ext)
does not preserve meaning. Consider this:
```scala
// Two occurrences of Ext:
eval1(Concat(Ext, Ext))
```

VS this:
```scala
val x = eval1(Ext)
eval1(Concat(x, x))
```

That’s clearly not the same thing. The former will get two strings from standard input and concatenate them together. The latter will get only one string, store it as x
, and return x + x
.
Now consider a slightly different evaluator:
```scala
// The "standard input" is now just a string passed to the evaluator:
def eval2(sig: Sigma, stdin: String): Sigma = sig match {
  case Concat(s1, s2) =>
    val Lit(x) = eval2(s1, stdin)
    val Lit(y) = eval2(s2, stdin)
    Lit(x + y)
  case Ext => Lit(stdin)
  case lit => lit
}
```

In this case, the Ext
instruction clearly is referentially transparent with regard to eval2
, because our standard input is just a string, and it is always the same string. So you see, the purity of functions in the Sigma language very much depends on how that language is interpreted.
This is the reason why Haskell programs are considered “pure”, even in the presence of IO
. A value of type IO a
in Haskell is simply a function. Reducing it to normal form (evaluating it) has no effect. An IO
action is of course not referentially transparent with regard to unsafePerformIO
, but as long as your program does not use that it remains a referentially transparent expression.
In my experience there are more or less two camps into which unhelpful views on purity fall.
The first view, which we will call the empiricist view, is typically taken by people who understand “pure” as a pretentious term, meant to denigrate regular everyday programming as being somehow “impure” or “unclean”. They see purity as being “academic”, detached from reality, in an ivory tower, or the like.
This view is premised on a superficial understanding of purity. The assumption is that purity is somehow about the absence of I/O, or not mutating memory. But how could any programs be written that don’t change the state of memory? At the end of the day, you have to update the CPU’s registers, write to memory, and produce output on a display. A program has to make the computer do something, right? So aren’t we just pretending that our programs don’t run on real computers? Isn’t it all just an academic exercise in making the CPU warm?
Well, no. That’s not what purity means. Purity is not about the absence of program behaviors like I/O or mutable memory. It’s about delimiting such behavior in a specific way.
The other view, which I will call the rationalist view, is typically taken by people with overexposure to modern analytic philosophy. Expressions are to be understood by their denotation, not by reference to any evaluator. Then of course every expression is really referentially transparent, and so purity is a distinction without a difference. After all, an imperative side-effectful C program can have the same denotation as a monadic, side-effect-free Haskell program. There is nothing wrong with this viewpoint, but it’s not instructive in this context. Sure, when designing in the abstract, we can think denotationally without regard to evaluation. But when concretizing the design in terms of an actual programming language, we do need to be aware of how we expect evaluation to take place. And only then are referential transparency and purity useful concepts.
Both of these, the rationalist and empiricist views, conflate different levels of abstraction. A program written in a programming language is not the same thing as the physical machine that it may run on. Nor is it the same thing as the abstractions that capture its meaning.
I highly recommend the paper What is a Purely Functional Language? by Amr Sabry, although it deals with the idea of a purely functional language rather than purity of functions within a language that does not meet that criterion.
So I have decided to make posting really easy for myself by hosting the blog on GitHub. I am using a dead-simple markdown-based framework called Octopress. With this setup I can very easily write a new post from my command line and publish by pushing to GitHub. This is already part of my normal coding workflow, so it feels more friction-free.
The new blog is simply titled “Higher Order”, and is available at blog.higher-order.com. Check back soon for posts that I’ve been sitting on but have been too busy to post.
All of the old content and comments will still be available at the old address, and I’ll probably cross-post to both places for a little while.
A lot of people have asked me to write a tutorial on how this works, specifically on how it is implemented in Scalaz and how to be productive with it, so here we go.
The implementation in Scalaz is based on an excellent article by John W. Lato called “Iteratee: Teaching an Old Fold New Tricks”. As a consequence, this post is also based on that article, and because I am too unoriginal to come up with my own examples, the examples are directly translated from it. The article gives code examples in Haskell, but we will use Scala here throughout.
Most programmers have come across the problem of treating an I/O data source (such as a file or a socket) as a data structure. This is a common thing to want to do. To contrast, the usual means of reading, say, a file, is to open it, get a cursor into the file (such as a FileReader or an InputStream), and read the contents of the file as it is being processed. You must of course handle IO exceptions and remember to close the file when you are done. The problem with this approach is that it is not modular. Functions written in this way are performing one-off side effects. And as we know, side effects do not compose.
Treating the stream of inputs as an enumeration is therefore desirable. It at least holds the lure of modularity, since we would be able to treat a File, from which we’re reading values, in the same way that we would treat an ordinary List of values, for example.
A naive approach to this is to use iterators, or rather, Iterables. This is akin to the way that you would typically read a file in something like Ruby or Python. Basically you treat it as a collection of Strings:
```scala
import java.io.{BufferedReader, File, FileReader}

class FileIterable(f: File) extends Iterable[String] {
  def iterator: Iterator[String] = new Iterator[String] {
    val reader = new BufferedReader(new FileReader(f))
    var line = reader.readLine
    def hasNext = line != null
    def next = {
      val l = line
      line = reader.readLine
      if (line == null) reader.close()
      l
    }
  }
}

// Treat the file as a collection of Strings:
for (line <- new FileIterable(new File("someFile.txt")))
  println(line)
```

What this is doing is a kind of lazy I/O. Nothing is read from the file until it is requested, and we only hold one line in memory at a time. But there are some serious issues with this approach. It’s not clear when you should close the file handle, or whose responsibility that is. You could have the Iterator close the file when it has read the last line, but what if you only want to read part of the file? Clearly this approach is not sufficient. There are some things we can do to make this more sophisticated, but only at the expense of breaking the illusion that the file really is a collection of Strings.
Any functional programmer worth their salt should be thinking right about now: “Instead of getting Strings out of the file, just pass in a function that will serve as a handler for each new line!” Bingo. This is in fact the plot with Iteratees. Instead of implementing an interface from which we request Strings by pulling, we’re going to give an implementation of an interface that can receive Strings by pushing.
And indeed, this idea is nothing new. This is exactly what we do when we fold a list:
```scala
def foldLeft[B](z: B)(f: (B, A) => B): B
```


The second argument is exactly that, a handler for each element in the list, along with a means of combining it with the accumulated value so far.
Now, there are two issues with an ordinary fold that prevent it from being useful when enumerating file contents. Firstly, there is no way of indicating that the fold should stop early. Secondly, a list is held all in memory at the same time.
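To see the first problem concretely (a tiny sketch of my own):

```scala
// Even though the answer is known after the first element,
// foldLeft still visits every element of the list:
var visited = 0
List(1, 2, 3, 4, 5).foldLeft(false) { (found, x) =>
  visited += 1
  found || x == 1
}
// visited is now 5, not 1
```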
Scalaz defines the following two data structures (actual implementation may differ, but this serves for illustration):
```scala
sealed trait Input[+E]
case class El[E](e: E) extends Input[E]
case object Empty extends Input[Nothing]
case object EOF extends Input[Nothing]

sealed trait IterV[E, A] {
  // Get the result, first sending EOF to ourselves if we're not done:
  def run: A = this match {
    case Done(a, _) => a
    case Cont(k)    => k(EOF).run
  }
}
case class Done[E, A](a: A, e: Input[E]) extends IterV[E, A]
case class Cont[E, A](k: Input[E] => IterV[E, A]) extends IterV[E, A]
```

So an input to an iteratee is represented by Input[E], where E is the element type of the input source. It can be either an element (the next element in the file or stream), or it’s one of two signals: Empty or EOF. The Empty signal tells the iteratee that there is not an element available, but to expect more elements later. The EOF signal tells the iteratee that there are no more elements to be had.
Note that this particular set of signals is kind of arbitrary. It just facilitates a particular set of use cases. There’s no reason you couldn’t have other signals for other use cases. For example, a signal I can think of off the top of my head would be Restart, which would tell the iteratee to start its result from scratch at the current position in the input.
IterV[E,A] represents a computation that can be in one of two states. It can be Done, in which case it will hold a result (the accumulated value) of type A. Or it can be waiting for more input of type E, in which case it will hold a continuation that accepts the next input.
Let’s see how we would use this to process a List. The following function takes a list and an iteratee and feeds the list’s elements to the iteratee.
```scala
def enumerate[E, A](as: List[E], i: IterV[E, A]): IterV[E, A] =
  (as, i) match {
    case (Nil, _)           => i
    case (_, Done(_, _))    => i
    case (x :: xs, Cont(k)) => enumerate(xs, k(El(x)))
  }
```

Now let’s see some actual iteratees. As a simple example, here is an iteratee that counts the number of elements it has seen:
```scala
def counter[E]: IterV[E, Int] = {
  def step(n: Int): Input[E] => IterV[E, Int] = {
    case El(_) => Cont(step(n + 1))
    case Empty => Cont(step(n))
    case EOF   => Done(n, EOF)
  }
  Cont(step(0))
}
```

And here’s an iteratee that discards the first n elements:
```scala
def drop[E](n: Int): IterV[E, Unit] = {
  def step: Input[E] => IterV[E, Unit] = {
    case El(_) => drop(n - 1)
    case Empty => Cont(step)
    case EOF   => Done((), EOF)
  }
  if (n <= 0) Done((), Empty) else Cont(step)
}
```

And one that takes the first element from the input:
```scala
def head[E]: IterV[E, Option[E]] = {
  def step: Input[E] => IterV[E, Option[E]] = {
    case El(e) => Done(Some(e), Empty)
    case Empty => Cont(step)
    case EOF   => Done(None, EOF)
  }
  Cont(step)
}
```

Let’s go through this code. Each one defines a “step” function, which is the function that will handle the next input. Each one starts the iteratee in the Cont state, and the step function always returns a new iteratee in the next state based on the input received. Note in the last one (head), we are using the Empty signal to indicate that we want to remove the element from the input. The utility of this will be clear when we start composing iteratees.
Now, an example usage. To get the length of a list, we write:
```scala
enumerate(List(1, 2, 3), counter[Int]).run // 3
```


The run method on IterV just gets the accumulated value out of the Done iteratee. If it isn’t done, it sends the EOF signal to itself first and then gets the value.
Notice a couple of things here. With iteratees, the input source can send the signal that it has finished producing values. And on the other side, the iteratee itself can signal to the input source that it has finished consuming values. So on one hand, we can leave an iteratee “running” by not sending it the EOF signal, so we can compose two input sources and feed them into the same iteratee. On the other hand, an iteratee can signal that it’s done, at which point we can start sending any remaining elements to another iteratee. In other words, iteratees compose sequentially.
In fact, IterV[E,A] is an instance of the Monad type class for each fixed E, and composition is very similar to the way monadic parsers compose:
```scala
implicit def iterVMonad[E] = new Monad[({type λ[α] = IterV[E, α]})#λ] {
  def pure[A](a: => A): IterV[E, A] = Done(a, Empty)
  def bind[A, B](m: IterV[E, A])(f: A => IterV[E, B]): IterV[E, B] = m match {
    case Done(x, e) => f(x) match {
      case Done(y, _) => Done(y, e) // pass the leftover input along
      case Cont(k)    => k(e)
    }
    case Cont(k) => Cont(e => bind(k(e))(f))
  }
}
```

Here then is an example of composing iteratees with a forcomprehension:
```scala
def drop1Keep1[E]: IterV[E, Option[E]] = for {
  _ <- drop[E](1)
  x <- head[E]
} yield x
```

The iteratee above discards the first element it sees and returns the second one. The iteratee below does this n times, accumulating the kept elements into a list.
```scala
// (The name is illustrative.) Repeat drop1Keep1 n times:
def drop1Keep1N[E](n: Int): IterV[E, List[E]] =
  if (n <= 0) Done(Nil, Empty)
  else for {
    x  <- drop1Keep1[E]
    xs <- drop1Keep1N[E](n - 1)
  } yield x.toList ::: xs
```

Here’s an example run:

```scala
scala> enumerate(List(1, 2, 3, 4, 5, 6), drop1Keep1N[Int](3)).run
res0: List[Int] = List(2, 4, 6)
```

Using the iteratees to read from file input turns out to be incredibly easy. The only difference is in how the data source is enumerated, and in order to remain lazy (and not prematurely perform any side effects), we must return our iteratee in a monad:
```scala
def enumReader[A](r: BufferedReader, it: IterV[String, A]): IO[IterV[String, A]] = {
  def loop: IterV[String, A] => IO[IterV[String, A]] = {
    case i @ Done(_, _) => IO { i }
    case i @ Cont(k) => for {
      s <- IO { r.readLine }
      a <- if (s == null) IO { i } else loop(k(El(s)))
    } yield a
  }
  loop(it)
}
```

The monad being used here is an IO monad that I’ll explain in a second. The important thing to note is that the iteratee is completely oblivious to the fact that it’s being fed lines from a BufferedReader rather than a List.
Here is the IO monad I’m using. As you can see, it’s really just a lazy identity monad:
```scala
object io {
  sealed trait IO[A] {
    def unsafePerformIO: A
  }

  object IO {
    def apply[A](a: => A): IO[A] = new IO[A] {
      def unsafePerformIO = a
    }
  }

  implicit val IOMonad: Monad[IO] = new Monad[IO] {
    def pure[A](a: => A): IO[A] = IO(a)
    def bind[A, B](a: IO[A])(f: A => IO[B]): IO[B] =
      IO { f(a.unsafePerformIO).unsafePerformIO }
  }
}
```

To read lines from a file, we’ll do something like this:
```scala
// Bracketing ensures the "close" action runs whether or not the body succeeds:
def bracketIO[A, B, C](init: IO[A], fin: A => IO[B], body: A => IO[C]): IO[C] =
  IO {
    val a = init.unsafePerformIO
    try body(a).unsafePerformIO
    finally fin(a).unsafePerformIO
  }

def enumFile[A](f: File, i: IterV[String, A]): IO[IterV[String, A]] =
  bracketIO(IO { new BufferedReader(new FileReader(f)) },
            (r: BufferedReader) => IO { r.close() },
            (r: BufferedReader) => enumReader(r, i))
```

The enumFile method uses bracketing to ensure that the file always gets closed. It’s completely lazy though, so nothing actually happens until you call unsafePerformIO on the resulting IO action:
```scala
// (File name for illustration.)
val firstLine: IO[Option[String]] =
  enumFile(new File("blogpost.markdown"), head[String]) map (_.run)

firstLine.unsafePerformIO
```

That uses the “head” iteratee from above to get the first line of the file that I’m using to edit this blog post.
We can get the number of lines in two files combined, by composing two enumerations and using our “counter” iteratee from above:
```scala
def lengthOfTwoFiles(f1: File, f2: File): IO[Int] = for {
  l1 <- enumFile(f1, counter[String])
  l2 <- enumFile(f2, l1)
} yield l2.run
```

So what we have here is a uniform and compositional interface for enumerating both pure and effectful data sources. We can avoid holding on to the entire input in memory when we don’t want to, and we have complete control over when to stop iterating. The iteratee can decide whether to consume elements, leave them intact, or even truncate the input. The enumerator can decide whether to shut the iteratee down by sending it the EOF signal, or to leave it open for other enumerators.
There is even more to this approach, as we can use iteratees not just to read from data sources, but also to write to them. That will have to await another post.
One of the great features of modern programming languages is structural pattern matching on algebraic data types. Once you’ve used this feature, you don’t ever want to program without it. You will find this in languages like Haskell and Scala.
In Scala, algebraic types are provided by case classes. For example:
```scala
sealed trait Tree
case object Empty extends Tree
case class Leaf(n: Int) extends Tree
case class Node(l: Tree, r: Tree) extends Tree
```

To define operations over this algebraic data type, we use pattern matching on its structure:
```scala
def depth(t: Tree): Int = t match {
  case Empty      => 0
  case Leaf(_)    => 1
  case Node(l, r) => 1 + math.max(depth(l), depth(r))
}
```

When I go back to a programming language like, say, Java, I find myself wanting this feature. Unfortunately, algebraic data types aren’t provided in Java. However, a great many hacks have been invented over the years to emulate it, knowingly or not.
What I have used most throughout my career to emulate pattern matching in languages that lack it are a couple of hoary old hacks. These venerable and well-respected practices are a pair of design patterns from the GoF book: Interpreter and Visitor.
The Interpreter pattern really does describe an algebraic structure, and it provides a method of reducing (interpreting) the structure. However, there are a couple of problems with it. The interpretation is coupled to the structure, with a “context” passed from term to term, and each term must know how to mutate the context appropriately. That’s minus one point for tight coupling, and minus one for relying on mutation.
The Visitor pattern addresses the former of these concerns. Given an algebraic structure, we can define an interface with one “visit” method per type of term, and have each term accept a visitor object that implements this interface, passing it along to the subterms. This decouples the interpretation from the structure, but still relies on mutation. Minus one point for mutation, and minus one for the fact that Visitor is incredibly crufty. For example, to get the depth of our tree structure above, we have to implement a TreeDepthVisitor. A good IDE that generates boilerplate for you is definitely recommended if you take this approach.
On the plus side, both of these patterns provide some enforcement of the exhaustiveness of the pattern match. For example, if you add a new term type, the Interpreter pattern will enforce that you implement the interpretation method. For Visitor, as long as you remember to add a visitation method for the new term type to the visitor interface, you will be forced to update your implementations accordingly.
An obvious approach that’s often sneered at is runtime type discovery. A quick and dirty way to match on types is to simply check for the type at runtime and cast:
```java
public int depth(Tree t) {
  if (t instanceof Empty)
    return 0;
  if (t instanceof Leaf)
    return 1;
  if (t instanceof Node)
    return 1 + max(depth(((Node) t).left), depth(((Node) t).right));
  throw new RuntimeException("Inexhaustive pattern match on Tree.");
}
```

There are some obvious problems with this approach. For one thing, it bypasses the type system, so you lose any static guarantees that it’s correct. And there’s no enforcement of the exhaustiveness of the matching. But on the plus side, it’s both fast and terse.
There are at least two approaches that we can take to approximate pattern matching in Java more closely than the above methods. Both involve utilising parametric polymorphism and functional style. Let’s consider them in order of increasing preference, i.e. less preferred method first.
The first approach is based on the insight that algebraic data types represent a disjoint union of types. Now, if you’ve done any amount of programming in Java with generics, you will have come across (or invented) the simple pair type, which is a conjunction of two types:
```java
public abstract class P2<A, B> {
  public abstract A _1();
  public abstract B _2();
}
```

A value of this type can only be created if you have both a value of type A
and a value of type B
. So (conceptually, at least) it has a single constructor that takes two values. The disjunction of two types is a similar idea, except that a value of type Either<A, B>
can be constructed with either a value of type A
or a value of type B
:
```java
public abstract class Either<A, B> {
  public static <A, B> Either<A, B> left(A a) { /* ... */ }
  public static <A, B> Either<A, B> right(B b) { /* ... */ }
  public abstract Iterable<A> left();
  public abstract Iterable<B> right();
}
```

Encoded as a disjoint union type, then, our Tree
data type above is: Either<Empty, Either<Leaf, Node>>
Let’s see that in context. Here’s the code.
```java
// A sketch of the encoding:
public abstract class Tree {
  // Encode the sum Empty | Leaf | Node as nested Eithers:
  public abstract Either<Empty, Either<Leaf, Node>> toEither();

  public static final class Empty extends Tree {
    public Either<Empty, Either<Leaf, Node>> toEither() {
      return Either.left(this);
    }
  }

  public static final class Leaf extends Tree {
    public final int n;
    public Leaf(int n) { this.n = n; }
    public Either<Empty, Either<Leaf, Node>> toEither() {
      return Either.right(Either.<Leaf, Node>left(this));
    }
  }

  public static final class Node extends Tree {
    public final Tree left, right;
    public Node(Tree left, Tree right) { this.left = left; this.right = right; }
    public Either<Empty, Either<Leaf, Node>> toEither() {
      return Either.right(Either.<Leaf, Node>right(this));
    }
  }
}
```

The neat thing is that Either<A, B>
can be made to return both Iterable<A>
and Iterable<B>
in methods right()
and left()
, respectively. One of them will be empty and the other will have exactly one element. So our pattern matching function will look like this:
```java
public int depth() {
  for (Empty e : toEither().left())
    return 0;
  for (Either<Leaf, Node> ln : toEither().right()) {
    for (Leaf leaf : ln.left())
      return 1;
    for (Node node : ln.right())
      return 1 + max(node.left.depth(), node.right.depth());
  }
  throw new RuntimeException("Inexhaustive pattern match on Tree.");
}
```

That’s terse and readable, as well as typesafe. The only issue with this is that the exhaustiveness of the patterns is not enforced, so we’re still only discovering that error at runtime. So it’s not all that much of an improvement over the instanceof approach.
Alonzo Church was a pretty cool guy. Having invented the lambda calculus, he discovered that you could encode data in it. We’ve all heard that every data type can be defined in terms of the kinds of operations that it supports. Well, what Church discovered is much more profound than that. A data type IS a function. In other words, an algebraic data type is not just a structure together with an algebra that collapses the structure. The algebra IS the structure.
Consider the boolean type. It is a disjoint union of True and False. What kinds of operations does this support? Well, you might want to do one thing if it’s True, and another if it’s False. Just like with our Tree, where we wanted to do one thing if it’s a Leaf, and another thing if it’s a Node, etc.
But it turns out that the boolean type IS the condition function. Consider the Church encoding of booleans:
```
true  = λt.λf.t
false = λt.λf.f
```

So a boolean is actually a binary function. Given two terms, a boolean will yield the former term if it’s true, and the latter term if it’s false. What does this mean for our Tree
type? It too is a function:
```
empty    = λe.λl.λn. e
leaf x   = λe.λl.λn. l x
node l r = λe.λl.λn. n l r
```

You can see that this gives you pattern matching for free. The Tree
type is a function that takes three arguments:
- A value to yield if the tree is empty.
- A unary function to apply to an integer if it’s a leaf.
- A binary function to apply to the left and right subtrees if it’s a node.

The type of such a function looks like this (Scala notation):
```scala
(T, Int => T, (Tree, Tree) => T) => T
```


Or equivalently:
```scala
T => (Int => T) => ((Tree, Tree) => T) => T
```


Translated to Java, we need this method on Tree:
```java
public abstract <T> T match(Function<Empty, T> a,
                            Function<Leaf, T> b,
                            Function<Node, T> c);
```

The Function
interface is in the java.util
package in Java 8, but you can definitely make it yourself in previous versions:
```java
public interface Function<A, B> { B apply(A a); }
```


Now our Tree code looks like this:
```java
// A sketch of the Church encoding:
public abstract class Tree {
  public abstract <T> T match(Function<Empty, T> a,
                              Function<Leaf, T> b,
                              Function<Node, T> c);

  public static final class Empty extends Tree {
    public <T> T match(Function<Empty, T> a,
                       Function<Leaf, T> b,
                       Function<Node, T> c) {
      return a.apply(this);
    }
  }

  public static final class Leaf extends Tree {
    public final int n;
    public Leaf(int n) { this.n = n; }
    public <T> T match(Function<Empty, T> a,
                       Function<Leaf, T> b,
                       Function<Node, T> c) {
      return b.apply(this);
    }
  }

  public static final class Node extends Tree {
    public final Tree left, right;
    public Node(Tree left, Tree right) { this.left = left; this.right = right; }
    public <T> T match(Function<Empty, T> a,
                       Function<Leaf, T> b,
                       Function<Node, T> c) {
      return c.apply(this);
    }
  }
}
```

And we can do our pattern matching on the calling side:
```java
public int depth(Tree t) {
  return t.match(
    empty -> 0,
    leaf  -> 1,
    node  -> 1 + max(depth(node.left), depth(node.right)));
}
```

This is almost as terse as the Scala code, and very easy to understand. Everything is checked by the type system, and we are guaranteed that our patterns are exhaustive. This is an ideal solution.
With some slightly clever use of generics and a little help from our friends Church and Curry, we can indeed emulate structural pattern matching over algebraic data types in Java, to the point where it’s almost as nice as a built-in language feature.
So throw away your Visitors and set fire to your GoF book.