Higher Order

Philosophy and functional programming.

A Better Reading List With Mathematica

Like a lot of people, I keep a list of books I want to read. And because there are a great many more books that interest me than I can possibly read in my lifetime, this list has become quite long.

In the olden days of brick-and-mortar bookstores and libraries, I would discover books to read by browsing shelves and picking up what looked interesting at the time. I might even find something that I knew was on my list. “Oh, I’ve been meaning to read that!”

The Internet changes this dynamic dramatically. It makes it much easier for me to discover books that interest me, and also to access any book that I might want to read, instantly, anywhere. At any given time, I have a couple of books that I’m “currently reading”, and when I finish one I can start another immediately. I use Goodreads to manage my to-read list, and it’s easy for me to scroll through the list and pick out my next book.

But again, this list is very long. So I wanted a good way to filter out books I will really never read, and sort it such that the most “important” books in some sense show up first. Then every time I need a new book I could take the first one from the list and make a binary decision: either “I will read this right now”, or “I am never reading this”. In the latter case, if a book interests me enough at a later time, I’m sure it will find its way back onto my list.

The problem then is to find a good metric by which to rank books. Goodreads lets users rank books with a star-rating from 1 to 5, and presents you with an average rating by which you can sort the list. The problem is that a lot of books that interest me have only one rating and it’s 5 stars, giving the book an “average” of 5.0. So if I go with that method I will be perpetually reading obscure books that one other person has read and loved. This is not necessarily a bad thing, but I do want to branch out a bit.

Another possibility is to use the number of ratings to calculate a confidence interval for the average rating. For example, using the Wilson score I could find a lower bound s1 and an upper bound s2 (below and above the average rating, respectively) that let me say “I am 95% sure that any random sample of readers of an equal size would give an average rating between s1 and s2.” I could then sort the list by the lower bound s1.

But this method is unsatisfactory for a number of reasons. First, it’s not clear how to fit star ratings to such a measure. If we do the naive thing and count a 1-star rating as 1/5 and a 5-star rating as 5/5, we count a 1-star rating as a “partial success” in some sense. We could instead count a 1-star rating as 0, and 2, 3, 4, and 5 stars as 25%, 50%, 75%, and 100%, respectively.

But even if we did make it fit somehow, it turns out that if you take any moderately popular book on Goodreads at random, it will have an average rating somewhere close to 4. I could manufacture a prior based on this knowledge and use that instead of the normal distribution or the Jeffreys prior in the confidence interval, but that would still not be a very good ranking because reader review metascores are meaningless.

In the article “Reader review metascores are meaningless”, Stephanie Shun suggests using the percentage of 5-star ratings as the relevant metric rather than the average rating. This is a good suggestion, since even a single 5-star rating carries a lot of actionable information whereas an average rating close to 4.0 carries very little.

I can then use the Wilson score directly, counting a 5-star rating as a successful trial and any other rating as a failed one. That also lets me just use the normal distribution instead of working with an artisanally curated prior.

Mathematica makes it easy to generate the Wilson score. Here, pos is the number of positive trials (number of 5-star ratings), n is the number of total ratings, and confidence is the desired confidence percentage. I’m taking the lower bound of the confidence interval to get my score.

Wilson[pos_, n_, confidence_] := 
  If[n == 0 || n < pos, 0, 
    With[{
      z = Quantile[NormalDistribution[], 1 - (1 - confidence)/2], 
      p = pos/n
    },
    (-(z Sqrt[(z^2/(4 n) + (1 - p) p)/n]) + z^2/(2 n) + p) /
      (z^2/n + 1)
    ]
  ]

Now I just need to get the book data from Goodreads. Fortunately, it has a pretty rich API. I just need a developer key, which anyone can get for free.

For example, to get the ratings for a given book id, we can use their XML api for books and pattern match on the result to get the ratings by score:

Ratings[id_] := Ratings[id] = 
  With[{
    raw = Cases[Import[ToString[StringForm[
        "http://www.goodreads.com/book/show.xml?key=``&id=``", key, id]]],
      XMLElement["rating_dist", _, {d_}] -> d, Infinity]
  }, 
  With[{
    data = StringSplit[#, ":"] & /@ StringSplit[raw, "|"]
  }, Block[{}, Pause[1];
  FromDigits /@ Association[Rule @@@ First[data]]]]];

Here, key is my Goodreads developer API key, defined elsewhere. I put a Pause[1] in the call since Goodreads throttles API calls so you can’t make more than one call per second to each API endpoint. I’m also memoizing the result, by assigning to Ratings[id] in the global environment.

Ratings will give us an association list with the number of ratings for each score from 1 to 5, together with the total. For example, for the first book in their catalogue, Harry Potter and the Half-Blood Prince, here are the scores:

In[1]:= Ratings[1]

Out[1]= <|"5"     ->  782740,
          "4"     ->  352725,
          "3"     ->  111068,
          "2"     ->   17693,
          "1"     ->    5289,
          "total" -> 1269515|>

Sweet. Let’s see how Harry Potter #6 would score with our rating:

In[2]:= Wilson[Out[1]["5"], Out[1]["total"], 0.95]

Out[2]= 0.61572

So Wilson is 95% confident that in any random sample of about 1.2 million Harry Potter readers, at least 61.572% of them would give The Half-Blood Prince a 5-star rating. That turns out to be a pretty high score, so if this book were on my list (which it isn’t), it would feature pretty close to the very top.

But now the score for a relatively obscure title is too low. For example, the lower bound of the 95% confidence interval for a single-rating 5-star book will be 0.206549, which will be towards the bottom of any list. This means I would never get to any of the obscure books on my reading list, since they would be edged out by moderately popular books with an average rating close to 4.0.

See, if I’ve picked a book that I want to read, I’d consider five ratings that are all five stars a much stronger signal than the fact that people who like Harry Potter enough to read 5 previous books loved the 6th one. Currently the 5*5 book will score 57%, a bit weaker than the Potter book’s 62%.

I can fix this by lowering the confidence level. Because honestly, I don’t need a high confidence in the ranking. I’d rather err on the side of picking up a deservedly obscure book than to miss out on a rare gem. Experimenting with this a bit, I find that a confidence around 80% raises the obscure books enough to give me an interesting mix. For example, a 5*5 book gets a 75% rank, while the Harry Potter one stays at 62%.

I’m going to call that, the lower bound of the 80% Wilson confidence interval, the Rúnar Rank of a given book:

RunarRank[id_] := With[{ratings = Ratings[id]},
  Wilson[ratings["5"], ratings["total"], .8]]

Unfortunately, there’s no way to get the rank of all the books in my reading list in one fell swoop. I’ll have to get the reading list first, then call RunarRank for each book’s id. In Goodreads, books are managed by “shelves”, and the API allows getting the contents of a given shelf, 200 books at a time:

GetShelf[user_, shelf_] :=
  With[{raw =
    Cases[Import[ToString[StringForm[
        "http://www.goodreads.com/review/list.xml?v=2&key=``&id=``&shelf=``&sort=avg_rating&order=d&per_page=200",
        key, user, shelf]]],
      XMLElement["book", _, {___, XMLElement["id", ___, {id_}], ___,
        XMLElement["title", _, {title_}], ___,
        XMLElement["average_rating", _, {avg_}], ___,
        XMLElement["authors", _, {XMLElement["author", _, {___,
          XMLElement["name", _, {author_}], ___}], ___}], ___}] ->
       {"id" -> id, "title" -> title, "author" -> author, "avg" -> avg},
      Infinity]},
    Association /@ raw]

I’m doing a bunch of XML pattern matching here to get the id, title, average_rating, and first author of each book. Then I put that in an association list. I’m getting only the top-200 books on the list by average rating (which currently is about half my list).

With that in hand, I can get the contents of my “to-read” shelf with GetShelf[runar, "to-read"], where runar is my Goodreads user id. And given that, I can call RunarRank on each book on the shelf, then sort the result by that rank:

RunarSort[shelf_] :=
 Sort[Map[Function[x, Append[x, "rank" -> RunarRank[x["id"]]]], 
   shelf], #1["rank"] > #2["rank"] &]

To get the ranked reading list of any user:

ReadingList[user_] := RunarSort[GetShelf[user, "to-read"]]

And to print it out nicely:

Gridify[books_] := 
 Grid[Flatten[
   Cases[books, 
    b_ -> {
      {b["id"], b["title"], UnitConvert[b["rank"], "Percent"]},
      {"", b["author"], b["avg"]}, {"", "", ""} }], 1],
   Alignment -> Left]

Now I can get, say, the first 10 books on my improved reading list:

Gridify[ReadingList[runar][[1 ;; 10]]]
9934419 Kvæðasafn 75.2743%
Snorri Hjartarson 5.00
 
17278 The Feynman Lectures on Physics Vol 1 67.2231%
Richard P. Feynman 4.58
 
640909 The Knowing Animal: A Philosophical Inquiry Into Knowledge and Truth 64.6221%
Raymond Tallis 5.00
 
640913 The Hand: A Philosophical Inquiry Into Human Being 64.6221%
Raymond Tallis 5.00
 
4050770 Volition As Cognitive Self Regulation 62.231%
Harry Binswanger 4.86
 
8664353 Unbroken: A World War II Story of Survival, Resilience, and Redemption 60.9849%
Laura Hillenbrand 4.45
 
13413455 Software Foundations 60.1596%
Benjamin C. Pierce 4.80
 
77523 Harry Potter and the Sorcerer’s Stone (Harry Potter #1) 59.1459%
J.K. Rowling 4.39
 
13539024 Free Market Revolution: How Ayn Rand’s Ideas Can End Big Government 59.1102%
Yaron Brook 4.48
 
1609224 The Law 58.767%
Frédéric Bastiat 4.40
 

I’m quite happy with that. Some very popular and well-loved books interspersed with obscure ones with exclusively (or almost exclusively) positive reviews. The most satisfying thing is that the rating carries a real meaning. It’s basically the relative likelihood that I will enjoy the book enough to rate it five stars.

I can test this ranking against books I’ve already read. Here’s the top of my “read” shelf, according to their Rúnar Rank:

17930467 The Fourth Phase of Water 68.0406%
Gerald H. Pollack 4.85
 
7687279 Nothing Less Than Victory: Decisive Wars and the Lessons of History 64.9297%
John David Lewis 4.67
 
43713 Structure and Interpretation of Computer Programs 62.0211%
Harold Abelson 4.47
 
7543507 Capitalism Unbound: The Incontestable Moral Case for Individual Rights 57.6085%
Andrew Bernstein 4.67
 
13542387 The DIM Hypothesis: Why the Lights of the West Are Going Out 55.3296%
Leonard Peikoff 4.37
 
5932 Twenty Love Poems and a Song of Despair 54.7205%
Pablo Neruda 4.36
 
18007564 The Martian 53.9136%
Andy Weir 4.36
 
24113 Gödel, Escher, Bach: An Eternal Golden Braid 53.5588%
Douglas R. Hofstadter 4.29
 
19312 The Brothers Lionheart 53.0952%
Astrid Lindgren 4.33
 
13541678 Functional Programming in Scala 52.6902%
Rúnar Bjarnason 4.54
 

That’s perfect. Those are definitely books I thoroughly enjoyed and would heartily recommend. Especially that last one.

Maximally Powerful, Minimally Useful

It’s well known that there is a trade-off in language and systems design between expressiveness and analyzability. That is, the more expressive a language or system is, the less we can reason about it, and vice versa. The more capable the system, the less comprehensible it is.

This principle is very widely applicable, and it’s a useful thing to keep in mind when designing languages and libraries. A practical implication of this principle is that we should make components exactly as expressive as necessary, but no more. This maximizes the ability of any downstream systems to reason about our components. And dually, for things that we receive or consume, we should require exactly as much analytic power as necessary, and no more. That maximizes the expressive freedom of the upstream components.

I find myself thinking about this principle a lot lately, and seeing it more or less everywhere I look. So I’m seeking a more general statement of it, if such a thing is possible. It seems that more generally than issues of expressivity/analyzability, a restriction at one semantic level translates to freedom and power at another semantic level.

What I want to do here is give a whole bunch of examples. Then we’ll see if we can come up with an integration for them all. This is all written as an exercise in thinking out loud and is not to be taken very seriously.

Examples from computer science

Context-free and regular grammars

In formal language theory, context-free grammars are more expressive than regular grammars. The former can describe strictly more sets of strings than the latter. On the other hand, it’s harder to reason about context-free grammars than regular ones. For example, we can decide whether two regular expressions are equal (they describe the same set of strings), but this is undecidable in general for context-free grammars.

Monads and applicative functors

If we know that an applicative functor is a monad, we gain some expressive power that we don’t get with just an applicative functor. Namely, a monad is an applicative functor with an additional capability: monadic join (or “bind”, or “flatMap”). That is, context-sensitivity, or the ability to bind variables in monadic expressions.

This power comes at a cost. Whereas we can always compose any two applicatives to form a composite applicative, two monads do not in general compose to form a monad. It may be the case that a given monad composes with any other monad, but we need some additional information about it in order to be able to conclude that it does.
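To make the contrast concrete, here is a sketch using Scalaz 7 (assuming its Applicative#compose; the exact import incantation may vary by version). Any two applicatives compose mechanically, whereas there is no analogous general-purpose compose for monads:

import scalaz._, Scalaz._

// Composing the List and Option applicatives requires no knowledge of either.
val listOption: Applicative[({type λ[α] = List[Option[α]]})#λ] =
  Applicative[List].compose[Option]

// Combining values through the composite applicative:
listOption.apply2(List(Some(1), None), List(Some(2): Option[Int]))(_ + _)
// List(Some(3), None)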

Actors and futures

Futures have an algebraic theory, so we can reason about them algebraically. Namely, they form an applicative functor which means that two futures x and y make a composite future that does x and y in parallel. They also compose sequentially since they form a monad.
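For illustration, a rough sketch with scala.concurrent.Future (just one possible future type): zipping two already-running futures composes them applicatively, while flatMap sequences the second after the first.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Applicative-style composition: x and y are independent and run in parallel.
def both(x: Future[Int], y: Future[Int]): Future[Int] =
  x.zip(y).map { case (a, b) => a + b }

// Monadic composition: the second computation depends on the first result.
def andThen(x: Future[Int], f: Int => Future[Int]): Future[Int] =
  x.flatMap(f)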

Actors on the other hand have no algebraic theory and afford no algebraic reasoning of this sort. They are “fire and forget”, so they could potentially do anything at all. This means that actor systems can do strictly more things in more ways than systems composed of futures, but our ability to reason about such systems is drastically diminished.

Typed and untyped programming

When we have an untyped function, it could receive any type of argument and produce any type of output. The implementation is totally unrestricted, so that gives us a great deal of expressive freedom. Such a function can potentially participate in a lot of different expressions that use the function in different ways.

A function of type Bool -> Bool however is highly restricted. Its argument can only be one of two things, and the result can only be one of two things as well. So there are 4 different implementations such a function could possibly have. Therefore this restriction gives us a great deal of analyzability.

For example, since the argument is of type Bool and not Any, the implementation mostly writes itself. We need to consider only two possibilities. Bool (a type of size 2) is fundamentally easier to reason about than Any (a type of potentially infinite size). Similarly, any usage of the function is easy to reason about. A caller can be sure not to call it with arguments other than True or False, and enlist the help of a type system to guarantee that expressions involving the function are meaningful.
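For the record, here are those four implementations, written in Scala with Boolean standing in for Bool:

// The only four total implementations of Boolean => Boolean:
val id:          Boolean => Boolean = b => b
val complement:  Boolean => Boolean = b => !b
val alwaysTrue:  Boolean => Boolean = _ => true
val alwaysFalse: Boolean => Boolean = _ => false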

Total functional programming

Programming in non-total languages affords us the power of general recursion and “fast and loose reasoning” where we can transition between valid states through potentially invalid ones. The cost is, of course, the halting problem. But more than that, we can no longer be certain that our programs are meaningful, and we lose some algebraic reasoning. For example, consider the following:

map (subtract n) (map (+ n) xs) == xs

This states that adding n to every number in a list and then subtracting n again should be the identity. But what if n actually throws an exception or never halts? In a non-total language, we need some additional information. Namely, we need to know that n is total.

Referential transparency and side effects

The example above also serves to illustrate the trade-off between purely functional and impure programming. If n could have arbitrary side effects, algebraic reasoning of this sort involving n is totally annihilated. But if we know that n is referentially transparent, algebraic reasoning is preserved. The power of side effects comes at the cost of algebraic reasoning. This price includes loss of compositionality, modularity, parallelizability, and parametricity. Our programs can do strictly more things, but we can conclude strictly fewer things about our programs.

Example from infosec

There is a principle in computer security called The Principle of Least Privilege. It says that a user or program should have exactly as much authority as necessary but no more. This constrains the power of the entity, but greatly enhances the power of others to predict and reason about what the entity is going to do, resulting in the following benefits:

  • Compositionality – The fewer privileges a component requires, the easier it is to deploy inside a larger environment. For the purposes of safety, higher privileges are a barrier to composition since a composite system requires the highest privileges of any of its components.
  • Modularity – A component with restricted privileges is easier to reason about in the sense that its interaction with other components will be limited. We can reason mechanically about where this limit actually is, which gives us better guarantees about the security and stability of the overall system. A restricted component is also easier to test in isolation, since it can be run inside an overall restricted environment.

Example from politics

Some might notice an analogy between the Principle of Least Privilege and the idea of a constitutionally limited government. An absolute dictatorship or pure democracy will have absolute power to enact whatever whim strikes the ruler or majority at the moment. But the overall stability, security, and freedom of the people is greatly enhanced by the presence of legal limits on the power of the government. A limited constitutional republic also makes for a better neighbor to other states.

More generally, a ban on the initiation of physical force by one citizen against another, or by the government against citizens, or against other states, makes for a peaceful and prosperous society. The “cost” of such a system is the inability of one person (or even a great number of people) to impose their preferences on others by force.

An example from mathematics

The framework of two-dimensional Euclidean geometry is simply an empty page on which we can construct lines and curves using tools like a compass and straightedge. When we go from that framework to a Cartesian one, we constrain ourselves to reasoning on a grid of pairs of numbers. This is a tradeoff between expressivity and analyzability. When we move from Euclidean to Cartesian geometry, we lose the ability to assume isotropy of space, intersection of curves, and compatibility between dimensions. But we gain much more powerful things through the restriction: the ability to precisely define geometric objects, to do arithmetic with them, to generalize to higher dimensions, and to reason with higher abstractions like linear algebra and category theory.

Examples from everyday life

Driving on roads

Roads constrain the routes we can take when we drive or walk. We give up moving in a straight line to wherever we want to go. But the benefit is huge. Roads let us get to where we’re going much faster and more safely than we would otherwise.

Commodity components

Let’s say you make a decision to have only one kind of outfit that you wear on a daily basis. You just go out and buy multiple identical outfits. Whereas you have lost the ability to express yourself by the things you wear, you have gained a certain ability to reason about your clothing. The system is also fault-tolerant and compositional!

Summary

What is this principle? Here are some ways of saying it:

  • Things that are maximally general for first-order applications are minimally useful for higher-order applications, and vice versa.
  • A language that is maximally expressive is minimally analyzable.
  • A simplifying assumption at one semantic level paves the way to a richer structure at a higher semantic level.

What do you think? Can you think of a way to integrate these examples into a general principle? Do you have other favorite examples of this principle in action? Is this something everyone already knows about and I’m just late to the party?

Taking a Break From Twitter

Today I disabled my Twitter account. Probably only temporarily, but we’ll see. This is just a quick note to let everyone know that I’m not leaving Twitter because of anything specific. It’s not that you said or did anything wrong. I just find that it’s becoming an energy sink for me. I’m putting Twitter away as a measure to control my focus, and to better control what information I consume and produce.

Hopefully this means that I will post more here when I have something interesting to share. If you need to reach me, I’m available by email. My address is runar at this blog’s domain.

(UPDATE 2014-12-21): I’ve recreated my Twitter account, but I’m now using the service in a different way. The biggest change is that I don’t follow anyone. It’s strictly a broadcasting device for me, and not an information consumption device.

At Long Last

In early September 2014, we published Functional Programming in Scala. It is now available from all major booksellers, and from the publisher at manning.com/bjarnason. It’s available as a beautiful paper book, on Kindle and other e-book readers, and as a PDF file.

I just want to share my personal story of how this book came to exist. A much shorter version of this story became the preface for the finished book, but here is the long version.

Around 2006 I was working in Austin and coming up on my 8th anniversary as an enterprise Java programmer. I had started to notice that I was making a lot of the same mistakes over and over again. I had a copy of the Gang of Four’s Design Patterns on my desk that I referred to frequently, and I built what I thought were elegant object-oriented designs. Every new project started out well enough, but became a big ball of mud after a while. My once-elegant class hierarchies gathered bugs, technical debt, and unimplemented features. Making changes started to feel like trudging through a swamp. I was never confident that I wasn’t introducing defects as I went. My code was difficult to test or reuse, and impossible to reason about. My productivity plummeted, and a complete rewrite became inevitable. It was a vicious cycle.

In looking for a more disciplined approach, I came across Haskell and functional programming. Here was a community of people with a sound methodology for reasoning about their programs. In other words, they actually knew what they were doing. I found a lot of good ideas and proceeded to import them to Java. A little later I met Tony Morris, who had been doing the same, on IRC. He told me about this new JVM language, Scala. Tony had a library called Scalaz (scala-zed) that made FP in Scala more pleasant, and I started contributing to that library. One of the other people contributing to Scalaz was Paul Chiusano, who was working for a company in Boston. In 2008 he invited me to come work with him, doing Scala full time. I sold my house and everything in it, and moved to Boston.

Paul co-organized the Boston Area Scala Enthusiasts, a group that met monthly at Google’s office in Cambridge. It was a popular group, mainly among Java programmers who were looking for something better. But there was a clear gap between those who had come to Scala from an FP perspective and those who saw Scala as just a better way to write Java. In April 2010 another of the co-organizers, Nermin Serifovic, said he thought there was “tremendous demand” for a book that would bridge that gap, on the topic of functional programming in Scala. He suggested that Paul and I write that book. We had a very clear idea of the kind of book we wanted to write, and we thought it would be quick and easy. More than four years later, I think we have made a good book.

Paul and I hope to convey in this book some of the excitement that we felt when we were first discovering FP. It’s encouraging and empowering to finally feel like we’re writing comprehensible software that we can reuse and confidently build upon. We want to invite you to the world of programming as it could be and ought to be.

– Rúnar Óli Bjarnason

Boston, August 2014

Monoid Morphisms, Products, and Coproducts

Today I want to talk about relationships between monoids. These can be useful to think about when we’re developing libraries involving monoids, and we want to express some algebraic laws among them. We can then check these with automated tests, or indeed prove them with algebraic reasoning.

This post kind of fell together when writing notes on chapter 10, “Monoids”, of Functional Programming in Scala. I am putting it here so I can reference it from the chapter notes at the end of the book.

Monoid homomorphisms

Let’s take String concatenation and Int addition as example monoids that have a relationship. Note that if we take the length of two strings and add them up, this is the same as concatenating those two strings and taking the length of the combined string:

(a.length + b.length) == (a + b).length

So every String maps to a corresponding Int (its length), and every concatenation of strings maps to the addition of corresponding lengths.

The length function maps from String to Int while preserving the monoid structure. Such a function, that maps from one monoid to another in such a preserving way, is called a monoid homomorphism. In general, for monoids M and N, a homomorphism f: M => N, and all values x:M, y:M, the following law holds:

f(x |+| y) == (f(x) |+| f(y))

The |+| syntax is from Scalaz and is obtained by importing scalaz.syntax.monoid._. It just references the append method on the Monoid[T] instance, where T is the type of the arguments.

This law can have real practical benefits. Imagine for example a “result set” monoid that tracks the locations of a particular set of records in a database or file. This could be as simple as a Set of locations. Concatenating several thousand files and then proceeding to search through them is going to be much slower than searching through the files individually and then concatenating the result sets. Particularly since we can potentially search the files in parallel. A good automated test for our result set monoid would be that it admits a homomorphism from the data file monoid.
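Here’s a rough sketch of what that could look like (ResultSet, Location, and the search function are hypothetical names, not from any particular library):

import scalaz.Monoid

type Location = Long
type ResultSet = Set[Location]

// The result set monoid: set union, with the empty set as its zero.
val resultSetMonoid: Monoid[ResultSet] = new Monoid[ResultSet] {
  def append(x: ResultSet, y: => ResultSet) = x union y
  def zero = Set.empty[Location]
}

// The property we would test, for a hypothetical search function from the
// data file monoid to the result set monoid:
//   search(f1 |+| f2) == search(f1) |+| search(f2)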

Monoid isomorphisms

Sometimes there will be a homomorphism in both directions between two monoids. If these are inverses of one another, then this kind of relationship is called a monoid isomorphism and we say that the two monoids are isomorphic. More precisely, we will have two monoids A and B, and homomorphisms f: A => B and g: B => A. If f(g(b)) == b and g(f(a)) == a, for all a:A and b:B then f and g form an isomorphism.

For example, the String and List[Char] monoids with concatenation are isomorphic. We can convert a String to a List[Char], preserving the monoid structure, and go back again to the exact same String we started with. This is also true in the inverse direction, so the isomorphism holds.
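Concretely, a small sketch of the two homomorphisms:

// The isomorphism between String and List[Char]:
val f: String => List[Char] = _.toList
val g: List[Char] => String = _.mkString

// Both directions preserve the monoid structure and invert each other:
//   f(a + b) == f(a) ++ f(b)      g(x ++ y) == g(x) + g(y)
//   g(f(s))  == s                 f(g(cs))  == cs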

Other examples include (Boolean, &&) and (Boolean, ||) which are isomorphic via not.

Note that there are monoids with homomorphisms in both directions between them that nevertheless are not isomorphic. For example, (Int, *) and (Int, +). These are homomorphic to one another, but not isomorphic (thanks, Robbie Gates).

Monoid products and coproducts

If A and B are monoids, then (A,B) is certainly a monoid, called their product:

def product[A:Monoid,B:Monoid]: Monoid[(A, B)] =
  new Monoid[(A, B)] {
    def append(x: (A, B), y: => (A, B)) =
      (x._1 |+| y._1, x._2 |+| y._2)
    val zero = (mzero[A], mzero[B])
  }

But is there such a thing as a monoid coproduct? Could we just use Either[A,B] for monoids A and B? What would be the zero of such a monoid? And what would be the value of Left(a) |+| Right(b)? We could certainly choose an arbitrary rule, and we may even be able to satisfy the monoid laws, but would that mean we have a monoid coproduct?

To answer this, we need to know the precise meaning of product and coproduct. These come straight from Wikipedia, with a little help from Cale Gibbard.

A product M of two monoids A and B is a monoid such that there exist homomorphisms fst: M => A, snd: M => B, and for any monoid Z and morphisms f: Z => A and g: Z => B there has to be a unique homomorphism h: Z => M such that fst(h(z)) == f(z) and snd(h(z)) == g(z) for all z:Z. In other words, the product diagram must commute.

A coproduct W of two monoids A and B is the same except the arrows are reversed. It’s a monoid such that there exist homomorphisms left: A => W, right: B => W, and for any monoid Z and morphisms f: A => Z and g: B => Z there has to be a unique homomorphism h: W => Z such that h(left(a)) == f(a) and h(right(b)) == g(b) for all a:A and all b:B. In other words, the coproduct diagram must commute.

We can easily show that the product monoid defined above really is a monoid product. The homomorphisms are the methods _1 and _2 on Tuple2. They simply map every element of (A,B) to a corresponding element in A and B. The monoid structure is preserved because:

(p |+| q)._1 == (p._1 |+| q._1)
(p |+| q)._2 == (p._2 |+| q._2)

And for any other monoid Z, and morphisms f: Z => A and g: Z => B, we can construct a unique morphism from Z to (A,B):

def factor[A:Monoid,B:Monoid,Z:Monoid](z: Z)(f: Z => A, g: Z => B): (A, B) =
  (f(z), g(z))

And this really is a homomorphism because we just inherit the homomorphism law from f and g.

What does a coproduct then look like? Well, it’s going to be a type C[A,B] together with an instance coproduct[A:Monoid,B:Monoid]:Monoid[C[A,B]]. It will be equipped with two monoid homomorphisms, left: A => C[A,B] and right: B => C[A,B] that satisfy the following (according to the monoid homomorphism law):

(left(a1) |+| left(a2)) == left(a1 |+| a2)
(right(b1) |+| right(b2)) == right(b1 |+| b2)

And additionally, for any other monoid Z and homomorphisms f: A => Z and g: B => Z we must be able to construct a unique homomorphism from C[A,B] to Z:

def fold[A:Monoid,B:Monoid,Z:Monoid](c: C[A,B])(f: A => Z, g: B => Z): Z

The simplest thing that could possibly work

Let’s try to come up with the simplest data structure that might satisfy the requirements of a monoid coproduct. We will start with two constructors, one for the left side and another for the right side:

sealed trait These[+A,+B]
case class This[A](a: A) extends These[A,Nothing]
case class That[B](b: B) extends These[Nothing,B]

This certainly allows an embedding of both monoids A and B. But These is now basically Either, which we know doesn’t quite form a monoid. What’s needed is a zero, and a way of appending an A to a B. The simplest way to do that is to add a product constructor to These:

case class Both[A,B](a: A, b: B) extends These[A,B]

Now These[A,B] is a monoid as long as A and B are monoids:

def coproduct[A:Monoid,B:Monoid]: Monoid[These[A, B]] =
  new Monoid[These[A, B]] {
    def append(x: These[A, B], y: These[A, B]) = (x, y) match {
      case (This(a1), This(a2)) => This(a1 |+| a2)
      case (That(b1), That(b2)) => That(b1 |+| b2)
      case (That(b), This(a)) => Both(a, b)
      case (This(a), That(b)) => Both(a, b)
      case (Both(a1, b), This(a)) => Both(a1 |+| a, b)
      case (Both(a, b1), That(b)) => Both(a, b1 |+| b)
      case (This(a1), Both(a, b)) => Both(a1 |+| a, b)
      case (That(b), Both(a, b1)) => Both(a, b1 |+| b)
      case (Both(a1, b1), Both(a2, b2)) => Both(a1 |+| a2, b1 |+| b2)
    }
    val zero = Both(mzero[A], mzero[B])
  }

These[A,B] is the smallest monoid that contains both A and B as submonoids (the This and That constructors, respectively) and admits a homomorphism from both A and B. And notice that we simply added the least amount of structure possible to make These[A,B] a monoid (the Both constructor). But is it really a coproduct?

First we must prove that This and That really are homomorphisms. We need to prove the following two properties:

(This(a1) |+| This(a2)) == This(a1 |+| a2)
(That(b1) |+| That(b2)) == That(b1 |+| b2)

That’s easy. The first two cases of the append method on the coproduct monoid prove these properties.

But can we define fold? Yes we can:

def fold[A:Monoid,B:Monoid,Z:Monoid](
  these: These[A,B])(f: A => Z, g: B => Z): Z =
    these match {
      case This(a) => f(a)
      case That(b) => g(b)
      case Both(a, b) => f(a) |+| g(b)
    }

But is fold really a homomorphism? Let’s not assume that it is, but test it out. Here’s the homomorphism law:

fold(f,g)(t1 |+| t2) == fold(f,g)(t1) |+| fold(f,g)(t2)

What happens if t1 and t2 are both This, or both That?

fold(f,g)(This(a1) |+| This(a2)) ==
fold(f,g)(This(a1)) |+| fold(f, g)(This(a2))

f(a1) |+| f(a2) == f(a1) |+| f(a2)

That holds. But what if we introduce a Both on one side?

fold(f,g)(This(a1) |+| Both(a2, b)) ==
fold(f,g)(This(a1)) |+| fold(f,g)(Both(a2,b))

f(a1 |+| a2) |+| g(b) == f(a1) |+| f(a2) |+| g(b)

So far so good. That holds because of associativity. What about the other side?

fold(f,g)(That(b1) |+| Both(a, b2)) ==
fold(f,g)(That(b1)) |+| fold(f,g)(Both(a,b2))

f(a) |+| g(b1 |+| b2) == g(b1) |+| f(a) |+| g(b2)

No! Something has gone wrong. This will only hold if the Z monoid is commutative. So in general, These[A,B] is not the coproduct of A and B. My error was in the Both constructor, which commutes all B values to the right and all A values to the left.

So what kind of thing would work? It would have to solve this case:

  case (Both(a1, b1), Both(a2, b2)) => ???

We need to preserve that a1, b1, a2, and b2 appear in that order. So clearly the coproduct will be some kind of list!

We could modify the Both constructor this way:

case class Both[A,B](a: These[A,B], b: These[A,B]) extends These[A,B]

And let’s add an empty case for the combined zero:

case class Neither[A,B]() extends These[A,B]

In which case These[A,B] has become a kind of tree, or an unbalanced list of This[A] and That[B] values. A free product of the two monoids A and B. The implementation of append for the coproduct monoid could be:

def coproduct[A:Monoid,B:Monoid]: Monoid[These[A, B]] =
  new Monoid[These[A, B]] {
    def append(x: These[A, B], y: These[A, B]) = (x, y) match {
      case (Neither(), _) => y
      case (_, Neither()) => x
      case (This(a1), This(a2)) => This(a1 |+| a2)
      case (That(b1), That(b2)) => That(b1 |+| b2)
      case (This(a1), Both(This(a2), z)) => Both(This(a1 |+| a2), z)
      case (That(b1), Both(That(b2), z)) => Both(That(b1 |+| b2), z)
      case (Both(z, This(a1)), This(a2)) => Both(z, This(a1 |+| a2))
      case (Both(z, That(b1)), That(b2)) => Both(z, That(b1 |+| b2))
      case _ => Both(x, y)
    }
    val zero = Neither[A,B]()
  }

This append normalizes the list so that consecutive values of the same type are added together.

And we would modify fold to recurse over the tree:

def fold[A:Monoid,B:Monoid,Z:Monoid](these: These[A,B])(f: A => Z, g: B => Z): Z =
  these match {
    case Neither() => mzero[Z]
    case This(a) => f(a)
    case That(b) => g(b)
    case Both(a, b) => fold(a)(f, g) |+| fold(b)(f, g)
  }

This is now a homomorphism! We already know that this is so for the This and That cases. And now that the Both case simply appeals to the inductive hypothesis, we know that it holds for Both as well.

Free monoids on coproducts

To better understand what’s going on, let’s try going the other way. What if we start with the coproduct of the underlying sets and get a free monoid from there?

The underlying set of a monoid A is just the type A without the monoid structure. The coproduct of types A and B is the type Either[A,B]. Having “forgotten” the monoid structure of both A and B, we can recover it by generating a free monoid on Either[A,B], which is just List[Either[A,B]]. The append operation of this monoid is list concatenation, and the identity for it is the empty list.

Clearly List[Either[A,B]] is a monoid, but does it permit a homomorphism from both monoids A and B? If so, then the following properties should hold:

(List(Left(a1)) ++ List(Left(a2))) == List(Left(a1 |+| a2))
(List(Right(b1)) ++ List(Right(b2))) == List(Right(b1 |+| b2))

They clearly do not hold! The lists on the left of == will have two elements and the lists on the right will have one element. Can we do something about this?

Well, the fact is that List[Either[A,B]] is not exactly the monoid coproduct of A and B. It’s “too big” in a sense. But if we were to reduce the list to a normal form that approximates a “free product”, we can get a coproduct that matches our definition above. What we need is a new monoid:

sealed class Eithers[A,B](val toList: List[Either[A,B]]) {
  def ++(p: Eithers[A,B])(implicit A: Monoid[A], B: Monoid[B]): Eithers[A,B] =
    Eithers(toList ++ p.toList)
}

object Eithers {
  def apply[A:Monoid,B:Monoid](xs: List[Either[A,B]]): Eithers[A,B] =
    new Eithers(xs.foldRight(List[Either[A,B]]()) {
      case (Left(a1), Left(a2) :: t) => Left(a1 |+| a2) :: t
      case (Right(b1), Right(b2) :: t) => Right(b1 |+| b2) :: t
      case (e, t) => e :: t
    })
  def empty[A,B]: Eithers[A,B] = new Eithers(Nil)
}

Eithers[A,B] is a kind of List[Either[A,B]] that has been normalized so that consecutive As and consecutive Bs have been collapsed using their respective monoids. So it will contain alternating A and B values.

This is now a monoid coproduct because it permits monoid homomorphisms from A and B:

(Eithers(List(Left(a1))) ++ Eithers(List(Left(a2)))) ==
  Eithers(List(Left(a1 |+| a2)))
(Eithers(List(Right(b1))) ++ Eithers(List(Right(b2)))) ==
  Eithers(List(Right(b1 |+| b2)))

And we can implement the fold homomorphism:

def fold[A:Monoid,B:Monoid,Z:Monoid](
  es: Eithers[A,B])(f: A => Z, g: B => Z): Z =
    es.toList.foldRight(mzero[Z]) {
      case (Left(a), z) => f(a) |+| z
      case (Right(b), z) => g(b) |+| z
    }

And this time fold really is a homomorphism, and we can prove it by case analysis. Here’s the law again:

(fold(e1)(f,g) |+| fold(e2)(f,g)) == fold(e1 ++ e2)(f,g)

If either of e1 or e2 is empty then the result is the fold of the other, so those cases are trivial. If they are both nonempty, then they will have one of these forms:

e1 = [..., Left(a1)]
e2 = [Left(a2), ...]

e1 = [..., Right(b1)]
e2 = [Right(b2), ...]

e1 = [..., Left(a)]
e2 = [Right(b), ...]

e1 = [..., Right(b)]
e2 = [Left(a), ...]

In the first two cases, on the right of the == sign in the law, we perform a1 |+| a2 and b1 |+| b2 respectively before concatenating. In the other two cases we simply concatenate the lists. The ++ method on Eithers takes care of doing this correctly for us. On the left of the == sign we fold the lists individually and they will be alternating applications of f and g. So then this law amounts to the fact that f(a1 |+| a2) == f(a1) |+| f(a2) in the first case, and the same for g in the second case. In the latter two cases this amounts to a homomorphism on List. So as long as f and g are homomorphisms, so is fold(_)(f,g). Therefore, Eithers[A,B] really is a coproduct of A and B.

Since we have already convinced ourselves that These is a monoid coproduct, we could also simply show that there is a homomorphism from Eithers to These:

def toThese[A:Monoid,B:Monoid](es: Eithers[A,B]): These[A,B] =
  es.toList match {
    case Nil => Neither()
    case Left(a) :: t => This(a) |+| toThese(Eithers(t))
    case Right(b) :: t => That(b) |+| toThese(Eithers(t))
  }

Which would amount to proving that (toThese(xs) |+| toThese(ys)) == toThese(xs ++ ys).

The lesson learned here is to check assumptions and test against laws. Things are not always as straightforward as they seem.

Free Monads and the Yoneda Lemma

Last week I gave a talk on Purely Functional I/O at Scala.io in Paris. The slides for the talk are available here. In it I presented a data type for IO that is supposedly a “free monad”. But the monad I presented is not exactly the same as scalaz.Free and some people have been asking me why there is a difference and what that difference means.

IO as an application of Free

The Free monad in Scalaz is given a bit like this:

sealed trait Free[F[_],A]
case class Return[F[_],A](a: A) extends Free[F,A]
case class Suspend[F[_],A](s: F[Free[F, A]]) extends Free[F,A]

And throughout the methods on Free, it is required that F is a functor because in order to get at the recursive step inside a Suspend, we need to map over the F somehow.

But the IO monad I gave in the talk looks more like this:

sealed trait IO[F[_],A]
case class Return[F[_],A](a: A) extends IO[F,A]
case class Req[F[_],I,A](req: F[I], k: I => IO[F,A]) extends IO[F,A]

And it could actually be stated as an application of Free:

type IO[F[_],A] = Free[({type λ[α] = (F[I], I => α) forSome {type I}})#λ, A]

So in a very superficial sense, this is how the IO monad relates to Free. The monad IO[F,_] for a given F is the free monad generated by the functor (F[I], I => _) for some type I. And do note that this is a functor no matter what F is.
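To see why it’s a functor no matter what F is, here’s a sketch of that pair as a standalone data type (the name ReqPair is mine, not Scalaz’s). Its map only composes with the continuation, so it never needs a Functor[F]:

// A request F[I] together with a continuation turning the response I into an A.
case class ReqPair[F[_], I, A](fi: F[I], k: I => A) {
  // Mapping composes with the continuation; the F value is never touched.
  def map[B](f: A => B): ReqPair[F, I, B] = ReqPair(fi, k andThen f)
}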

IO as equivalent to Free

There is a deeper sense in which IO and Free are actually equivalent (more precisely, isomorphic). That is, there exists a transformation from one to the other and back again. Since the only difference between IO and Free is in the functors F[_] vs ∃I. (F[I], I => _), we just have to show that these two are isomorphic for any F.

The Yoneda lemma

There is an important result in category theory known as the Yoneda lemma. What it says is that if you have a function defined like this…

def map[B](f: A => B): F[B]

…then you certainly have a value of type F[A]. All you need is to pass the identity function to map in order to get the value of type F[A] out of this function. In fact, a function like this is in practice probably defined as a method on a value of type F[A] anyway. This also means that F is definitely a functor.

The Yoneda lemma says that this goes the other way around as well. If you have a value of type F[A] for any functor F and any type A, then you certainly have a map function with the signature above.

In Scala terms, we can capture this in a type:

trait Yoneda[F[_], A] {
  def map[B](f: A => B): F[B]
}

And the Yoneda lemma says that there is an isomorphism between Yoneda[F,A] and F[A], for any functor F and any type A. Here is the proof:

import scalaz._

def toYo[F[_]:Functor,A](fa: F[A]) = new Yoneda[F,A] {
  def map[B](f: A => B) = Functor[F].map(fa)(f)
}

def froYo[F[_],A](yo: Yoneda[F,A]) = yo.map(a => a)

CoYoneda

Of course, this also means that if we have a type B, a function of type (B => A) for some type A, and a value of type F[B] for some functor F, then we certainly have a value of type F[A]. This is kind of obvious, since we can just pass the B => A and the F[B] to the map function for the functor and get our F[A].

But the opposite is also true, and that is the really interesting part. If we have a value of type F[A], for any F and A, then we can always destructure it into a value of type F[B] and a function of type B => A, at least for some type B. And it turns out that we can do this even if F is not a functor.

This is the permutation of the Yoneda lemma that we were using in IO above. That is, IO[F, A] is really Free[({type λ[α] = CoYoneda[F,α]})#λ, A], given:

trait CoYoneda[F[_], A] {
  type I
  def f: I => A
  def fi: F[I]
}

And the lemma says that CoYoneda[F,A] is isomorphic to F[A]. Here is the proof:

def toCoYo[F[_],A](fa: F[A]) = new CoYoneda[F,A] {
  type I = A
  val f = (a: A) => a
  val fi = fa
}
def froCoYo[F[_]:Functor,A](yo: CoYoneda[F,A]) = Functor[F].map(yo.fi)(yo.f)

Of course, this destructuring into CoYoneda using the identity function is the simplest and most general, but there may be others for specific F and A depending on what we know about them.

So there you have it. The scalaz.Free monad with its Suspend(F[Free[F,A]]) constructor and the IO monad with its Req(F[I], I => IO[F,A]) constructor are actually equivalent. The latter is simply making use of CoYoneda to say the same thing.

Why bother? The useful part is that CoYoneda[F,_] is a functor for any F, so it’s handy to use in a free monad since we can then drop the requirement that F is a functor. What’s more, it gives us map fusion for free, since map over CoYoneda is literally just function composition on its f component. Although this latter is, in the absence of tail call elimination, not as useful as it could be in Scala.
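Here is a sketch of that map, to make the fusion point concrete:

// Mapping over a CoYoneda is just function composition on its f component;
// the underlying fi: F[I] is untouched until we finally run froCoYo.
def mapCoYo[F[_], A, B](yo: CoYoneda[F, A])(g: A => B): CoYoneda[F, B] =
  new CoYoneda[F, B] {
    type I = yo.I
    val f = yo.f andThen g
    val fi = yo.fi
  }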

I hope that sheds a little bit of light on the Yoneda lemma as well as the different embeddings of free monads.

Free Monoids and Free Monads

In a series of old posts, I once talked about the link between lists and monoids, as well as monoids and monads. Now I want to talk a little bit more about monoids and monads from the perspective of free structures.

List is a free monoid. That is, for any given type A, List[A] is a monoid, with list concatenation as the operation and the empty list as the identity element.

def freeMonoid[A]: Monoid[List[A]] = new Monoid[List[A]] {
  def unit = Nil
  def op(a1: List[A], a2: List[A]) = a1 ++ a2
}

Being a free monoid means that it’s the minimal such structure. List[A] has exactly enough structure so that it is a monoid for any given A, and it has no further structure. This also means that for any given monoid B, there must exist a transformation, a monoid homomorphism from List[A] to B:

def foldMap[A,B:Monoid](as: List[A])(f: A => B): B

Given a mapping from A to a monoid B, we can collapse a value in the monoid List[A] to a value in B.
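A minimal sketch of such a homomorphism, in terms of the Monoid trait used by freeMonoid above:

// Collapse a List[A] into any monoid B, given a mapping A => B.
def foldMap[A, B](as: List[A])(f: A => B)(implicit B: Monoid[B]): B =
  as.foldRight(B.unit)((a, b) => B.op(f(a), b))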

Free monads

Now, if you followed my old posts, you already know that monads are “higher-kinded monoids”: monoids in a category where the objects are type constructors (functors, actually) and the arrows between them are natural transformations. As a reminder, a natural transformation from F to G can be represented this way in Scala:

trait ~>[F[_],G[_]] {
  def apply[A](a: F[A]): G[A]
}

And it turns out that there is a free monad for any given functor F:

sealed trait Free[F[_],A]
case class Return[F[_],A](a: A) extends Free[F,A]
case class Suspend[F[_],A](f: F[Free[F,A]]) extends Free[F,A]

Analogous to how a List[A] is either Nil (the empty list) or a product of a head element and tail list, a value of type Free[F,A] is either an A or a product of F[_] and Free[F,_]. It is a recursive structure. And indeed, it has exactly enough structure to be a monad, for any given F, and no more.

When I say “product” of two functors like F[_] and Free[F,_], I mean a product like this:

trait :*:[F[_],G[_]] {
  type λ[A] = F[G[A]]
}

So we might expect that there is a monad homomorphism from a free monad on F to any monad that F can be transformed to. And indeed, it turns out that there is. The free monad catamorphism is in fact a monad homomorphism. Given a natural transformation from F to G, we can collapse a Free[F,A] to G[A], just like with foldMap when given a function from A to B we could collapse a List[A] to B.

def runFree[F[_],G[_]:Monad,A](as: Free[F,A])(f: F ~> G): G[A]
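A sketch of how runFree might be implemented (assuming Scalaz’s Monad; note that we don’t even need a Functor[F] here, since the natural transformation applies directly to the contents of Suspend):

def runFree[F[_], G[_]: Monad, A](fa: Free[F, A])(f: F ~> G): G[A] =
  fa match {
    case Return(a)  => Monad[G].point(a)
    // Transform the outer F layer to G, then bind through the rest of the structure.
    case Suspend(s) => Monad[G].bind(f(s))(next => runFree(next)(f))
  }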

But what’s the equivalent of foldRight for Free? Remember, foldRight takes a unit element z and a function that accumulates into B so that B doesn’t actually have to be a monoid. Here, f is a lot like the monoid operation, except it takes the current A on the left:

def foldRight[A,B](as: List[A])(z: B)(f: (A, B) => B): B

The equivalent for Free takes a natural transformation as its unit element, which for a monad happens to be monadic unit. Then it takes a natural transformation as its f argument, that looks a lot like monadic join, except it takes the current F on the left:

type Id[A] = A

def foldFree[F[_]:Functor,G[_],A](
  as: Free[F,A])(
  z: Id ~> G)(
  f: (F:*:G)#λ ~> G): G[A]

In this case, G does not have to be a monad at all.
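A sketch of foldFree along the same lines (assuming Scalaz’s Functor; here we do need the Functor[F] in order to recurse underneath the F layer inside Suspend):

def foldFree[F[_]:Functor,G[_],A](
  fa: Free[F,A])(
  z: Id ~> G)(
  f: (F:*:G)#λ ~> G): G[A] =
    fa match {
      case Return(a)  => z(a)
      // Recurse under the F layer, then collapse one F[G[A]] step with f.
      case Suspend(s) => f(Functor[F].map(s)(ff => foldFree(ff)(z)(f)))
    }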

Free as well as natural transformations and product types are available in Scalaz.

What Purity Is and Isn’t

A lot of discussion about “purity” goes on without participants necessarily having a clear idea of what it means exactly. Such discussion is generally unhelpful and distracting.

What purity is

The typical definition of purity (and the one we use in our book) goes something like this:

An expression e is referentially transparent if for all programs p, every occurrence of e in p can be replaced with the result of evaluating e without changing the result of evaluating p.

A function f is pure if the expression f(x) is referentially transparent for all referentially transparent x.

Now, something needs to be made clear right up front. Like all definitions, this holds in a specific context. In particular, the context needs to specify what “evaluating” means. It also needs to define “program”, “occurrence”, and the semantics of “replacing” one thing with another.

In a programming language like Haskell, Java, or Scala, this context is pretty well established. The process of evaluation is a reduction to some normal form such as weak head or beta normal form.

A simple example

To illustrate, let’s consider programs in an exceedingly simple language that we will call Sigma. An expression in Sigma has one of the following forms:

  • A literal character string like "a", "foo", "", etc.
  • A concatenation, s + t, for expressions s and t.
  • A special Ext expression that denotes input from an external source.

Now, without an evaluator for Sigma, it is a purely abstract algebra. So let’s define a straightforward evaluator eval for it, with the following rules:

  • A literal string is already in normal form.
  • eval(s + t) first evaluates s and t and concatenates the results into one literal string.
  • eval(Ext) reads a line from standard input and returns it as a literal string.

This might seem very simple, but it is still not clear whether Ext is referentially transparent with regard to eval. It depends. What does “reads a line” mean, and what is “standard input” exactly? This is all part of a context that needs to be established.

Here’s one implementation of an evaluator for Sigma, in Scala:

sealed trait Sigma
case class Lit(s: String) extends Sigma
case class Concat(s: Sigma, t: Sigma) extends Sigma
case object Ext extends Sigma

def eval1(sig: Sigma): Sigma = sig match {
  case Concat(s, t) => {
    val Lit(e1) = eval1(s)
    val Lit(e2) = eval1(t)
    Lit(e1 + e2)
  }
  case Ext => Lit(readLine)
  case x => x
}

Now, it’s easy to see that the Ext instruction is not referentially transparent with regard to eval1. Replacing Ext with eval1(Ext) does not preserve meaning. Consider this:

val x = Ext
eval1(Concat(x, x))

VS this:

val x = eval1(Ext)
eval1(Concat(x, x))

That’s clearly not the same thing. The former will get two strings from standard input and concatenate them together. The latter will get only one string, store it as x, and return x + x.

Now consider a slightly different evaluator:

def eval2(sig: Sigma, stdin: String): Sigma = sig match {
  case Concat(s, t) => {
    val Lit(e1) = eval2(s, stdin)
    val Lit(e2) = eval2(t, stdin)
    Lit(e1 + e2)
  }
  case Ext => Lit(stdin)
  case x => x
}

In this case, the Ext instruction clearly is referentially transparent with regard to eval2, because our standard input is just a string, and it is always the same string. So you see, the purity of functions in the Sigma language very much depends on how that language is interpreted.
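To spell that out with eval2 (a sketch, with "hello" standing in for whatever the fixed input happens to be):

val stdin = "hello"

val x = Ext
eval2(Concat(x, x), stdin)    // Lit("hellohello")

val y = eval2(Ext, stdin)     // Lit("hello")
eval2(Concat(y, y), stdin)    // also Lit("hellohello")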

This is the reason why Haskell programs are considered “pure”, even in the presence of IO. A value of type IO a in Haskell is simply a function. Reducing it to normal form (evaluating it) has no effect. An IO action is of course not referentially transparent with regard to unsafePerformIO, but as long as your program does not use that it remains a referentially transparent expression.

What purity is not

In my experience there are more or less two camps into which unhelpful views on purity fall.

The first view, which we will call the empiricist view, is typically taken by people who understand “pure” as a pretentious term, meant to denigrate regular everyday programming as being somehow “impure” or “unclean”. They see purity as being “academic”, detached from reality, in an ivory tower, or the like.

This view is premised on a superficial understanding of purity. The assumption is that purity is somehow about the absence of I/O, or not mutating memory. But how could any programs be written that don’t change the state of memory? At the end of the day, you have to update the CPU’s registers, write to memory, and produce output on a display. A program has to make the computer do something, right? So aren’t we just pretending that our programs don’t run on real computers? Isn’t it all just an academic exercise in making the CPU warm?

Well, no. That’s not what purity means. Purity is not about the absence of program behaviors like I/O or mutable memory. It’s about delimiting such behavior in a specific way.

The other view, which I will call the rationalist view, is typically taken by people with overexposure to modern analytic philosophy. Expressions are to be understood by their denotation, not by reference to any evaluator. Then of course every expression is really referentially transparent, and so purity is a distinction without a difference. After all, an imperative side-effectful C program can have the same denotation as a monadic, side-effect-free Haskell program. There is nothing wrong with this viewpoint, but it’s not instructive in this context. Sure, when designing in the abstract, we can think denotationally without regard to evaluation. But when concretizing the design in terms of an actual programming language, we do need to be aware of how we expect evaluation to take place. And only then are referential transparency and purity useful concepts.

Both of these, the rationalist and empiricist views, conflate different levels of abstraction. A program written in a programming language is not the same thing as the physical machine that it may run on. Nor is it the same thing as the abstractions that capture its meaning.

Further reading

I highly recommend the paper What is a Purely Functional Language? by Amr Sabry, although it deals with the idea of a purely functional language rather than purity of functions within a language that does not meet that criterion.

New Beginnings

I have not seriously updated the old Apocalisp blog for quite some time. Mostly this is due to the fact that I have been spending all of my creative time outside of work on writing a book. It’s also partly that putting a post up on WordPress is a chore. It’s like building a ship in a bottle.

So I have decided to make posting really easy for myself by hosting the blog on GitHub. I am using a dead-simple markdown-based framework called Octopress. With this setup I can very easily write a new post from my command line and publish by pushing to GitHub. This is already part of my normal coding workflow, so it feels more friction-free.

The new blog is simply titled “Higher Order”, and is available at blog.higher-order.com. Check back soon for posts that I’ve been sitting on but have been too busy to post.

All of the old content and comments will still be available at the old address, and I’ll probably cross-post to both places for a little while.