If all monads are given by composing adjoint pairs of functors, what adjoint pair of functors forms the `Reader` monad? And if we compose those functors the other way, which comonad do we get?
Following leads from Shachaf Ben-Kiki on IRC, I thought about whether there was a free-forgetful adjunction given by an isomorphism between Kleisli arrows in the Reader monad on one hand and its algebras on the other. But then what are (R => ?) algebras, other than values of type R?
Searching around for an answer to that, I came across this Mathematics Stack Exchange question, whose answer more or less directly addresses my first question above. The adjunction is a little more difficult to see than I expected, but I’ve worked through it, and what follows is an outline of how I convinced myself that it works.
The adjoint pair
Let’s consider the reader monad R => ?, which allows us to read a context of type R.
The first category involved is Set (or Hask, or Scala). This is just the familiar category where the objects are types (A,B,C, etc.) and the arrows are functions.
The other category is Set/R, which is the slice category of Set over the type R. This is a category whose objects are functions to R. So an object x in this category is given by a type A together with a function of type A => R. An arrow from x: A => R to y: B => R is given by a function f: A => B such that y(f(a)) = x(a) for all a:A.
The left adjoint is R*, a functor from Set to Set/R. This functor sends each type A to the function (p:(R,A)) => p._1, having type (R,A) => R.
1
defrStar[A]:(R,A)=>R=_._1
The right adjoint is Π_R, a functor from Set/R to Set. This functor sends each object q: A => R in Set/R to the set of functions R => A for which q is an inverse. This is actually a dependent type inhabited by functions p: R => A which satisfy the identity q(p(a)) = a for all a:A.
Constructing the monad
The monad is not exactly easy to see, but if everything has gone right, we should get the R => ? reader monad by composing Π_R with R*.
We start with a type A. Then we do R*, which gives us the object rStar[A] in the slice category, which you will recall is just _._1 of type (R,A) => R. Then we go back to types via Π_R(rStar[A]) which gives us a dependent type P inhabited by functions p: R => (R,A). Now, this looks a lot like an action in the State monad. But it’s not. These p must satisfy the property that _1 is their inverse. Which means that the R they return must be exactly the R they were given. So it’s like a State action that is read only. We can therefore simplify this to the ordinary (non-dependent) type R => A. And now we have our Reader monad.
Constructing the comonad
But what about the other way around? What is the comonad constructed by composing R* with Π_R? Well, since we end up in the slice category, our comonad is actually in that category rather than in Set.
We start with an object q: A => R in the slice category. Then we go to types by doing Π_R(q). This gives us a dependent type P_A which is inhabited by all p: R => A such that q is their inverse. Then we take rStar[Π_R(q)] to go back to the slice category and we find ourselves at an object f: (R, Π_R(q)) => R, which you’ll recall is implemented as _._1. As an endofunctor in Set/R, λq. rStar[Π_R(q)] takes all q: A => R to p: (R, R => A) => R = _._1 such that p is only defined on R => A arguments whose inverse is q.
That is, the counit for this comonad on elements y: A => R must be a function counit: (R, Π_R(y)) => A such that for _._1: (R, Π_R(y)) => R, the property y compose counit = _._1 holds. Note that this means that the R returned by _._1 and the R returned by y must be the same. Recall that _._1 always returns the first element of its argument, and also recall that the functions in Π_R(y) must have y as their inverse, so they’re only defined at the first element of the argument to _._1. That is p._2(x) is only defined when x = p._1.
If we try to encode that in Scala (ignoring all the “such that”), we get something like:
12
defcounit[A](p:(R,R=>A)):A=p._2(p._1)
This looks a lot like a counit for the Store comonad! Except what we constructed is not that. Because of the additional requirements imposed by our functors and by the slice category, the second element of p can only take an argument that is exactly the first element of p. So we can simplify that to (R, () => A) or just (R, A). And we now have the familiar Coreader comonad.
But how?
Now, finding the answer on StackExchange seems a litte bit of a deus ex machina. If you know of a way to factor a monad into an adjoint pair of functors (or if you know that this can’t be done in general), please comment below.
In chapter 11 of our book, we talk about monads in Scala. This finally names a pattern that the reader has seen throughout the book and gives it a formal structure. We also give some intuition for what it means for something to be a monad. Once you have this concept, you start recognizing it everywhere in the daily business of programming.
Today I want to talk about comonads, which are the dual of monads. The utility of comonads in everyday life is not quite as immediately obvious as that of monads, but they definitely come in handy sometimes. Particularly in applications like image processing and scientific computation.
A monad, upside-down
Let’s remind ourselves of what a monad is. A monad is a functor, which just means it has a map method:
Recall that join has to satisfy associativity, and unit has to be an identity for join.
In Scala a monad is often stated in terms of flatMap, which is map followed by join. But I find this formulation easier to explain.
Every monad has the above operations, the so-called proper morphisms of a monad, and may also bring to the table some nonproper morphisms which give the specific monad some additional capabilities.
Reader monad
For example, the Reader monad brings the ability to ask for a value:
The meaning of join in the writer monad is to concatenate the “log” of written values using the monoid for W (this is using the Monoid class from Scalaz):
The meaning of join in the state monad is to give the outer action an opportunity to get and put the state, then do the same for the inner action, making sure any subsequent actions see the changes made by previous ones.
Note that counit is pronounced “co-unit”, not “cow-knit”. It’s also sometimes called extract because it allows you to get a value of type Aout of a W[A]. While with monads you can generally only put values in and not get them out, with comonads you can generally only get them out and not put them in.
And instead of being able to join two levels of a monad into one, we can duplicate one level of a comonad into two.
Kind of weird, right? This also has to obey some laws. We’ll get to those later on, but let’s first look at some actual comonads.
The identity comonad
A simple and obvious comonad is the dumb wrapper (the identity comonad):
This one is also the identity monad. Id doesn’t have any functionality other than the proper morphisms of the (co)monad and is therefore not terribly interesting. We can get the value out with our counit, and we can vacuously duplicate by decorating our existing Id with another layer.
The reader comonad
There’s a comonad with the same capabilities as the reader monad, namely that it allows us to ask for a value:
It should be obvious how we can give a Comonad instance for this (I’m using the Kind Projector compiler plugin to make the syntax look a little nicer than Vanilla Scala):
Arguably, this is much more straightforward in Scala than the reader monad. In the reader monad, the ask function is the identity function. That’s saying “once the R value is available, return it to me”, making it available to subsequent map and flatMap operations. But in Coreader, we don’t have to pretend to have an R value. It’s just right there and we can look at it.
So Coreader just wraps up some value of type A together with some additional context of type R. Why is it important that this is a comonad? What is the meaning of duplicate here?
To see the meaning of duplicate, notice that it puts the whole Coreader in the value slot (in the extract portion). So any subsequent extract or map operation will be able to observe both the value of type A and the context of type R. We can think of this as passing the context along to those subsequent operations, which is analogous to what the reader monad does.
In fact, just like map followed by join is usually expressed as flatMap, by the same token duplicate followed by map is usually expressed as a single operation, extend:
Notice that the type signature of extend looks like flatMap with the direction of f reversed. And just like we can chain operations in a monad using flatMap, we can chain operations in a comonad using extend. In Coreader, extend is making sure that f can use the context of type R to produce its B.
Chaining operations this way using flatMap in a monad is sometimes called Kleisli composition, and chaining operations using extend in a comonad is called coKleisli composition (or just Kleisli composition in a comonad).
The writer comonad
Just like the writer monad, the writer comonad can append to a log or running tally using a monoid. But instead of keeping the log always available to be appended to, it uses the same trick as the reader monad by building up an operation that gets executed once a log becomes available:
Note that duplicate returns a whole Cowriter from its constructed run function, so the meaning is that subsequent operations (composed via map or extend) have access to exactly one tell function, which appends to the existing log or tally.
It can be hard to get an intuition for what these laws mean, but in short they mean that (co)Kleisli composition in a comonad should be associative and that extract (a.k.a. counit) should be an identity for it.
Intuitively, both the monad and comonad laws mean that we should be able to write our programs top-down or bottom-up, or any combination thereof, and have that mean the same thing regardless.
Next time…
In part 2 we’ll look at some more examples of comonads and follow some of the deeper connections. Like what’s the relationship between the reader monad and the reader comonad, or the writer monad and the writer comonad? They’re not identical, but they seem to do all the same things. Are they equivalent? Isomorphic? Something else?
I’ve found that if I’m using scala.concurrent.Future in my code, I can get some really easy performance gains by just switching to scalaz.concurrent.Task instead, particularly if I’m chaining them with map or flatMap calls, or with for comprehensions.
Jumping into thread pools
Every Future is basically some work that needs to be submitted to a thread pool. When you call futureA.flatMap(a => futureB), both Future[A] and Future[B] need to be submitted to the thread pool, even though they are not running concurrently and could theoretically run on the same thread. This context switching takes a bit of time.
Jumping on trampolines
With scalaz.concurrent.Task you have a bit more control over when you submit work to a thread pool and when you actually want to continue on the thread that is already executing a Task. When you say taskA.flatMap(a => taskB), the taskB will by default just continue running on the same thread that was already executing taskA. If you explicitly want to dip into the thread pool, you have to say so with Task.fork.
This works since a Task is not a concurrently running computation. It’s a description of a computation—a sequential list of instructions that may include instructions to submit some of the work to thread pools. The work is actually executed by a tight loop in Task’s run method. This loop is called a trampoline since every step in the Task (that is, every subtask) returns control to this loop.
Jumping on a trampoline is a lot faster than jumping into a thread pool, so whenever we’re composing Futures with map and flatMap, we can just switch to Task and make our code faster.
Making fewer jumps
But sometimes we know that we want to continue on the same thread and we don’t want to spend the time jumping on a trampoline at every step. To demonstrate this, I’ll use the Ackermann function. This is not necessarily a good use case for Future but it shows the difference well.
123456
// Only defined for positive `m` and `n`defackermann(m:Int,n:Int):Int=(m,n)match{case(0,_)=>n+1case(m,0)=>ackermann(m-1,1)case(m,n)=>ackermann(m-1,ackermann(m,n-1))}
This function is supposed to terminate for all positive m and n, but if they are modestly large, this recursive definition overflows the stack. We could use futures to alleviate this, jumping into a thread pool instead of making a stack frame at each step:
Since there’s no actual concurrency going on here, we can make this instantly faster by switching to Task instead, using a trampoline instead of a thread pool:
But even here, we’re making too many jumps back to the trampoline with suspend. We don’t actually need to suspend and return control to the trampoline at each step. We only need to do it enough times to avoid overflowing the stack. Let’s say we know how large our stack can grow:
1
valmaxStack=512
We can then keep track of how many recursive calls we’ve made, and jump on the trampoline only when we need to:
I did some comparisons using Caliper and made this pretty graph for you:
The horizontal axis is the number of steps, and the vertical axis is the mean time that number of steps took over a few thousand runs.
This graph shows that Task is slightly faster than Future for submitting to thread pools (blue and yellow lines marked Future and Task respectively) only for very small tasks; up to about when you get to 50 steps, when (on my Macbook) both futures and tasks cross the 30 μs threshold. This difference is probably due to the fact that a Future is a running computation while a Task is partially constructed up front and explicitly run later. So with the Future the threads might just be waiting for more work. The overhead of Task.run seems to catch up with us at around 50 steps.
But honestly the difference between these two lines is not something I would care about in a real application, because if we jump on the trampoline instead of submitting to a thread pool (green line marked Trampoline), things are between one and two orders of magnitude faster.
If we only jump on the trampoline when we really need it (red line marked Optimized), we can gain another order of magnitude. Compared to the original naïve version that always goes to the thread pool, this is now the difference between running your program on a 10 MHz machine and running it on a 1 GHz machine.
If we measure without using any Task/Future at all, the line tracks the Optimized red line pretty closely then shoots to infinity around 1000 (or however many frames fit in your stack space) because the program crashes at that point.
In summary, if we’re smart about trampolines vs thread pools, Future vs Task, and optimize for our stack size, we can go from milliseconds to microseconds with not very much effort. Or seconds to milliseconds, or weeks to hours, as the case may be.
After giving it a lot of thought I have come to the conclusion that I won’t be involved in “Functional Programming in Java”. There are many reasons, including that I just don’t think I can spend the time to make this a good book. Looking at all the things I have scheduled for the rest of the year, I can’t find the time to work on it.
More depressingly, the thought of spending a year or more writing another book makes me anxious. I know from experience that making a book (at least a good one) is really hard and takes up a lot of mental energy. Maybe one day there will be a book that I will want to forego a year of evenings and weekends for, but today is not that day.
Originally, the content of FPiJ was going to be based on “Functional Programming in Scala”, but after some discussion with the publisher I think we were all beginning to see that this book deserved its own original content specifically on an FP style in Java.
I really do think such a thing deserves its own original book. Since Java is strictly less suitable for functional programming than Scala is, a book on FP in Java will have to lay a lot of groundwork that we didn’t have to do with FPiS, and it will have to forego a lot of the more advanced topics.
I wish the author of that book, and the publisher, all the best and I hope they do well. I’m sorry to let you all down, but I’m sure this is for the best.
Naturally, readers get the most out of this book by downloading the source code from GitHub and doing the exercises as they read. But a number of readers have made the comment that they wish they could have the hints and answers with them when they read the book on the train to and from work, on a long flight, or wherever there is no internet connection or it’s not convenient to use a computer.
It is of course entirely possible to print out the chapter notes, hints, and exercises, and take them with you either as a hardcopy or as a PDF to use on a phone or tablet. Well, I’ve taken the liberty of doing that work for you. I wrote a little script to concatenate all the chapter notes, errata, hints, and answers into Markdown files and then just printed them all to a single document, tweaking a few things here and there. I’m calling this A companion booklet to “Functional Programming in Scala”. It is released under the same MIT license as the content it aggregates. This means you’re free to copy it, distribute or sell it, or basically do whatever you want with it. The Markdown source of the manuscript is available on my GitHub.
I have made an electronic version of this booklet available on Leanpub as as a PDF, ePub, and Kindle file on a pay-what-you-want basis (minimum of $0.99). It has full color syntax highlighting throughout and a few little tweaks to make it format nicely. The paper size is standard US Letter which makes it easy to print on most color printers. If you choose to buy the booklet from Leanpub, they get a small fee, a small portion of the proceeds goes to support Liberty in North Korea, and the rest goes to yours truly. You’ll also get updates when those inevitably happen.
The booklet is also available from CreateSpace or Amazon as a full color printed paperback. This comes in a nicely bound glossy cover for just a little more than the price of printing (they print it on demand for you). I’ve ordered one and I’m really happy with the quality of this print:
The print version is of course under the same permissive license, so you can make copies of it, make derivative works, or do whatever you want. It’s important to note that with this booklet I’ve not done anything other than design a little cover and then literally print out this freely available content and upload it to Amazon, which anybody could have done (and you still can if you want).
I hope this makes Functional Programming in Scala more useful and more enjoyable for more people.
Like a lot of people, I keep a list of books I want to read. And because there are a great many more books that interest me than I can possibly read in my lifetime, this list has become quite long.
In the olden days of brick-and-mortar bookstores and libraries, I would discover books to read by browsing shelves and picking up what looked interesting at the time. I might even find something that I knew was on my list. “Oh, I’ve been meaning to read that!”
The Internet changes this dynamic dramatically. It makes it much easier for me to discover books that interest me, and also to access any book that I might want to read, instantly, anywhere. At any given time, I have a couple of books that I’m “currently reading”, and when I finish one I can start another immediately. I use Goodreads to manage my to-read list, and it’s easy for me to scroll through the list and pick out my next book.
But again, this list is very long. So I wanted a good way to filter out books I will really never read, and sort it such that the most “important” books in some sense show up first. Then every time I need a new book I could take the first one from the list and make a binary decision: either “I will read this right now”, or “I am never reading this”. In the latter case, if a book interests me enough at a later time, I’m sure it will find its way back onto my list.
The problem then is to find a good metric by which to rank books. Goodreads lets users rank books with a star-rating from 1 to 5, and presents you with an average rating by which you can sort the list. The problem is that a lot of books that interest me have only one rating and it’s 5 stars, giving the book an “average” of 5.0. So if I go with that method I will be perpetually reading obscure books that one other person has read and loved. This is not necessarily a bad thing, but I do want to branch out a bit.
Another possibility is to use the number of ratings to calculate a confidence interval for the average rating. For example, using the Wilson score I could find an upper and lower bound s1 and s2 (higher and lower than the average rating, respectively) that will let me say “I am 95% sure that any random sample of readers of an equal size would give an average rating between s1 and s2.” I could then sort the list by the lower bound s1.
But this method is dissatisfactory for a number of reasons. First, it’s not clear how to fit star ratings to such a measure. If we do the naive thing and count a 1-star rating as 1/5 and a 5 star rating as 5/5, that counts a 1-star rating as a “partial success” in some sense. We could discard 1-stars as 0, and count 2, 3, 4, and 5 stars as 25%, 50%, 75%, and 100%, respectively.
But even if we did make it fit somehow, it turns out that if you take any moderately popular book on Goodreads at random, it will have an average rating somewhere close to 4. I could manufacture a prior based on this knowledge and use that instead of the normal distribution or the Jeffreys prior in the confidence interval, but that would still not be a very good ranking because reader review metascores are meaningless.
In the article “Reader review metascores are meaningless”, Stephanie Shun suggests using the percentage of 5-star ratings as the relevant metric rather than the average rating. This is a good suggestion, since even a single 5-star rating carries a lot of actionable information whereas an average rating close to 4.0 carries very little.
I can then use the Wilson score directly, counting a 5-star rating as a successful trial and any other rating as a failed one. I can then just use the normal distribution instead of working with an artisanally curated prior.
Mathematica makes it easy to generate the Wilson score. Here, pos is the number of positive trials (number of 5-star ratings), n is the number of total ratings, and confidence is the desired confidence percentage. I’m taking the lower bound of the confidence interval to get my score.
Now I just need to get the book data from Goodreads. Fortunately, it has a pretty rich API. I just need a developer key, which anyone can get for free.
For example, to get the ratings for a given book id, we can use their XML api for books and pattern match on the result to get the ratings by score:
Here, key is my Goodreads developer API key, defined elsewhere. I put a Pause[1] in the call since Goodreads throttles API calls so you can’t make more than one call per second to each API endpoint. I’m also memoizing the result, by assigning to Ratings[id] in the global environment.
Ratings will give us an association list with the number of ratings for each score from 1 to 5, together with the total. For example, for the first book in their catalogue, Harry Potter and the Half-Blood Prince, here are the scores:
So Wilson is 95% confident that in any random sample of about 1.2 million Harry Potter readers, at least 61.572% of them would give The Half-Blood Prince a 5-star rating. That turns out to be a pretty high score, so if this book were on my list (which it isn’t), it would feature pretty close to the very top.
But now the score for a relatively obscure title is too low. For example, the lower bound of the 95% confidence interval for a single-rating 5-star book will be 0.206549, which will be towards the bottom of any list. This means I would never get to any of the obscure books on my reading list, since they would be edged out by moderately popular books with an average rating close to 4.0.
See, if I’ve picked a book that I want to read, I’d consider five ratings that are all five stars a much stronger signal than the fact that people who like Harry Potter enough to read 5 previous books loved the 6th one. Currently the 5*5 book will score 57%, a bit weaker than the Potter book’s 62%.
I can fix this by lowering the confidence level. Because honestly, I don’t need a high confidence in the ranking. I’d rather err on the side of picking up a deservedly obscure book than to miss out on a rare gem. Experimenting with this a bit, I find that a confidence around 80% raises the obscure books enough to give me an interesting mix. For example, a 5*5 book gets a 75% rank, while the Harry Potter one stays at 62%.
I’m going to call that the Rúnar rank of a given book. The Rúnar rank is defined as the lower bound of the 1-1/q Wilson confidence interval for scoring in the qth q-quantile. In the special case of Goodreads ratings, it’s the 80% confidence for a 5-star rating.
Unfortunately, there’s no way to get the rank of all the books in my reading list in one call to the Goodreads API. And when I asked them about it they basically said “you can’t do that”, so I’m assuming that feature will not be added any time soon. So I’ll have to get the reading list first, then call RunarRank for each book’s id. In Goodreads, books are managed by “shelves”, and the API allows getting the contents of a given shelf, 200 books at a time:
I’m doing a bunch of XML pattern matching here to get the id, title, average_rating, and first author of each book. Then I put that in an association list. I’m getting only the top-200 books on the list by average rating (which currently is about half my list).
With that in hand, I can get the contents of my “to-read” shelf with GetShelf[runar, "to-read"], where runar is my Goodreads user id. And given that, I can call RunarRank on each book on the shelf, then sort the result by that rank:
Now I can get, say, the first 10 books on my improved reading list:
1
Gridify[ReadingList[runar][[1;;10]]]
9934419
Kvæðasafn
75.2743%
Snorri Hjartarson
5.00
17278
The Feynman Lectures on Physics Vol 1
67.2231%
Richard P. Feynman
4.58
640909
The Knowing Animal: A Philosophical Inquiry Into Knowledge and Truth
64.6221%
Raymond Tallis
5.00
640913
The Hand: A Philosophical Inquiry Into Human Being
64.6221%
Raymond Tallis
5.00
4050770
Volition As Cognitive Self Regulation
62.231%
Harry Binswanger
4.86
8664353
Unbroken: A World War II Story of Survival, Resilience, and Redemption
60.9849%
Laura Hillenbrand
4.45
13413455
Software Foundations
60.1596%
Benjamin C. Pierce
4.80
77523
Harry Potter and the Sorcerer’s Stone (Harry Potter #1)
59.1459%
J.K. Rowling
4.39
13539024
Free Market Revolution: How Ayn Rand’s Ideas Can End Big Government
59.1102%
Yaron Brook
4.48
1609224
The Law
58.767%
Frédéric Bastiat
4.40
I’m quite happy with that. Some very popular and well-loved books interspersed with obscure ones with exclusively (or almost exclusively) positive reviews. The most satisfying thing is that the rating carries a real meaning. It’s basically the relative likelihood that I will enjoy the book enough to rate it five stars.
I can test this ranking against books I’ve already read. Here’s the top of my “read” shelf, according to their Rúnar Rank:
17930467
The Fourth Phase of Water
68.0406%
Gerald H. Pollack
4.85
7687279
Nothing Less Than Victory: Decisive Wars and the Lessons of History
64.9297%
John David Lewis
4.67
43713
Structure and Interpretation of Computer Programs
62.0211%
Harold Abelson
4.47
7543507
Capitalism Unbound: The Incontestable Moral Case for Individual Rights
57.6085%
Andrew Bernstein
4.67
13542387
The DIM Hypothesis: Why the Lights of the West Are Going Out
55.3296%
Leonard Peikoff
4.37
5932
Twenty Love Poems and a Song of Despair
54.7205%
Pablo Neruda
4.36
18007564
The Martian
53.9136%
Andy Weir
4.36
24113
Gödel, Escher, Bach: An Eternal Golden Braid
53.5588%
Douglas R. Hofstadter
4.29
19312
The Brothers Lionheart
53.0952%
Astrid Lindgren
4.33
13541678
Functional Programming in Scala
52.6902%
Rúnar Bjarnason
4.54
That’s perfect. Those are definitely books I thouroughly enjoyed and would heartily recommend. Especially that last one.
It’s well known that there is a trade-off in language and systems design between expressiveness and analyzability. That is, the more expressive a language or system is, the less we can reason about it, and vice versa. The more capable the system, the less comprehensible it is.
This principle is very widely applicable, and it’s a useful thing to keep in mind when designing languages and libraries. A practical implication of being aware of this principle is that we always make components exactly as expressive as necessary, but no more. This maximizes the ability of any downstream systems to reason about our components. And dually, for things that we receive or consume, we should require exactly as much analytic power as necessary, and no more. That maximizes the expressive freedom of the upstream components.
I find myself thinking about this principle a lot lately, and seeing it more or less everywhere I look. So I’m seeking a more general statement of it, if such a thing is possible. It seems that more generally than issues of expressivity/analyzability, a restriction at one semantic level translates to freedom and power at another semantic level.
What I want to do here is give a whole bunch of examples. Then we’ll see if we can come up with an integration for them all. This is all written as an exercise in thinking out loud and is not to be taken very seriously.
Examples from computer science
Context-free and regular grammars
In formal language theory, context-free grammars are more expressive than regular grammars. The former can describe strictly more sets of strings than the latter. On the other hand, it’s harder to reason about context-free grammars than regular ones. For example, we can decide whether two regular expressions are equal (they describe the same set of strings), but this is undecidable in general for context-free grammars.
Monads and applicative functors
If we know that an applicative functor is a monad, we gain some expressive power that we don’t get with just an applicative functor. Namely, a monad is an applicative functor with an additional capability: monadic join (or “bind”, or “flatMap”). That is, context-sensitivity, or the ability to bind variables in monadic expressions.
This power comes at a cost. Whereas we can always compose any two applicatives to form a composite applicative, two monads do not in general compose to form a monad. It may be the case that a given monad composes with any other monad, but we need some additional information about it in order to be able to conclude that it does.
Actors and futures
Futures have an algebraic theory, so we can reason about them algebraically. Namely, they form an applicative functor which means that two futures x and y make a composite future that does x and y in parallel. They also compose sequentially since they form a monad.
Actors on the other hand have no algebraic theory and afford no algebraic reasoning of this sort. They are “fire and forget”, so they could potentially do anything at all. This means that actor systems can do strictly more things in more ways than systems composed of futures, but our ability to reason about such systems is drastically diminished.
Typed and untyped programming
When we have an untyped function, it could receive any type of argument and produce any type of output. The implementation is totally unrestricted, so that gives us a great deal of expressive freedom. Such a function can potentially participate in a lot of different expressions that use the function in different ways.
A function of type Bool -> Bool however is highly restricted. Its argument can only be one of two things, and the result can only be one of two things as well. So there are 4 different implementations such a function could possibly have. Therefore this restriction gives us a great deal of analyzability.
For example, since the argument is of type Bool and not Any, the implementation mostly writes itself. We need to consider only two possibilities. Bool (a type of size 2) is fundamentally easier to reason about than Any (a type of potentially infinite size). Similarly, any usage of the function is easy to reason about. A caller can be sure not to call it with arguments other than True or False, and enlist the help of a type system to guarantee that expressions involving the function are meaningful.
Total functional programming
Programming in non-total languages affords us the power of general recursion and “fast and loose reasoning” where we can transition between valid states through potentially invalid ones. The cost is, of course, the halting problem. But more than that, we can no longer be certain that our programs are meaningful, and we lose some algebraic reasoning. For example, consider the following:
1
map (- n) (map (+ n) xs)) == xs
This states that adding n to every number in a list and then subtracting n again should be the identity. But what if n actually throws an exception or never halts? In a non-total language, we need some additional information. Namely, we need to know that n is total.
Referential transparency and side effects
The example above also serves to illustrate the trade-off between purely functional and impure programming. If n could have arbitrary side effects, algebraic reasoning of this sort involving n is totally annihilated. But if we know that n is referentially transparent, algebraic reasoning is preserved. The power of side effects comes at the cost of algebraic reasoning. This price includes loss of compositionality, modularity, parallelizability, and parametricity. Our programs can do strictly more things, but we can conclude strictly fewer things about our programs.
Example from infosec
There is a principle in computer security called The Principle of Least Privilege. It says that a user or program should have exactly as much authority as necessary but no more. This constrains the power of the entity, but greatly enhances the power of others to predict and reason about what the entity is going to do, resulting in the following benefits:
Compositionality – The fewer privileges a component requires, the easier it is to deploy inside a larger environment. For the purposes of safety, higher privileges are a barrier to composition since a composite system requires the highest privileges of any of its components.
Modularity – A component with restricted privileges is easier to reason about in the sense that its interaction with other components will be limited. We can reason mechanically about where this limit actually is, which gives us better guarantees about the the security and stability of the overall system. A restricted component is also easier to test in isolation, since it can be run inside an overall restricted environment.
Example from politics
Some might notice an analogy between the Principle of Least Privilege and the idea of a constitutionally limited government. An absolute dictatorship or pure democracy will have absolute power to enact whatever whim strikes the ruler or majority at the moment. But the overall stability, security, and freedom of the people is greatly enhanced by the presence of legal limits on the power of the government. A limited constitutional republic also makes for a better neighbor to other states.
More generally, a ban on the initiation of physical force by one citizen against another, or by the government against citizens, or against other states, makes for a peaceful and prosperous society. The “cost” of such a system is the inability of one person (or even a great number of people) to impose their preferences on others by force.
An example from mathematics
The framework of two-dimensional Euclidean geometry is simply an empty page on which we can construct lines and curves using tools like a compass and straightedge. When we go from that framework to a Cartesian one, we constrain ourselves to reasoning on a grid of pairs of numbers. This is a tradeoff between expressivity and analyzability. When we move fom Euclidean to Cartesian geometry, we lose the ability to assume isotropy of space, intersection of curves, and compatibility between dimensions. But we gain much more powerful things through the restriction: the ability to precisely define geometric objects, to do arithmetic with them, to generalize to higher dimensions, and to reason with higher abstractions like linear algebra and category theory.
Examples from everyday life
Driving on roads
Roads constrain the routes we can take when we drive or walk. We give up moving in a straight line to wherever we want to go. But the benefit is huge. Roads let us get to where we’re going much faster and more safely than we would otherwise.
Commodity components
Let’s say you make a decision to have only one kind of outfit that you wear on a daily basis. You just go out and buy multiple identical outfits. Whereas you have lost the ability to express yourself by the things you wear, you have gained a certain ability to reason about your clothing. The system is also fault-tolerant and compositional!
Summary
What is this principle? Here are some ways of saying it:
Things that are maximally general for first-order applications are minimally useful for higher-order applications, and vice versa.
A language that is maximally expressive is minimally analyzable.
A simplifying assumption at one semantic level paves the way to a richer structure at a higher semantic level.
What do you think? Can you think of a way to integrate these examples into a general principle? Do you have other favorite examples of this principle in action? Is this something everyone already knows about and I’m just late to the party?
Today I disabled my Twitter account. Probably only temporarily, but we’ll see. This is just a quick note to let everyone know that I’m not leaving Twitter because of anything specific. It’s not that you said or did anything wrong. I just find that it’s becoming an energy sink for me. I’m putting Twitter away as a measure to control my focus, and to better control what information I consume and produce.
Hopefully this means that I will post more here when I have something interesting to share. If you need to reach me, I’m available by email. My address is runar at this blog’s domain.
(UPDATE 2014-12-21): I’ve recreated my Twitter account, but I’m now using the service in a different way. The biggest change is that I don’t follow anyone. It’s strictly a broadcasting device for me, and not an information consumption device.
In early September 2014, we published Functional Programming in Scala. It is now available from all major booksellers, and from the publisher at manning.com/bjarnason. It’s available as a beautiful paper book, on Kindle and other e-book readers, and as a PDF file.
I just want to share my personal story of how this book came to exist. A much shorter version of this story became the preface for the finished book, but here is the long version.
Around 2006 I was working in Austin and coming up on my 8th anniversary as an enterprise Java programmer. I had started to notice that I was making a lot of the same mistakes over and over again. I had a copy of the Gang of Four’s Design Patterns on my desk that I referred to frequently, and I built what I thought were elegant object-oriented designs. Every new project started out well enough, but became a big ball of mud after a while. My once-elegant class hierarchies gathered bugs, technical debt, and unimplemented features. Making changes started to feel like trudging through a swamp. I was never confident that I wasn’t introducing defects as I went. My code was difficult to test or reuse, and impossible to reason about. My productivity plummeted, and a complete rewrite became inevitable. It was a vicious cycle.
In looking for a more disciplined approach, I came across Haskell and functional programming. Here was a community of people with a sound methodology for reasoning about their programs. In other words, they actually knew what they were doing. I found a lot of good ideas and proceeded to import them to Java. A little later I met Tony Morris, who had been doing the same, on IRC. He told me about this new JVM language, Scala. Tony had a library called Scalaz (scala-zed) that made FP in Scala more pleasant, and I started contributing to that library. One of the other people contributing to Scalaz was Paul Chiusano, who was working for a company in Boston. In 2008 he invited me to come work with him, doing Scala full time. I sold my house and everything in it, and moved to Boston.
Paul co-organized the Boston Area Scala Enthusiasts, a group that met monthly at Google’s office in Cambridge. It was a popular group, mainly among Java programmers who were looking for something better. But there was a clear gap between those who had come to Scala from an FP perspective and those who saw Scala as just a better way to write Java. In April 2010 another of the co-organizers, Nermin Serifovic, said he thought there was “tremendous demand” for a book that would bridge that gap, on the topic of functional programming in Scala. He suggested that Paul and I write that book. We had a very clear idea of the kind of book we wanted to write, and we thought it would be quick and easy. More than four years later, I think we have made a good book.
Paul and I hope to convey in this book some of the excitement that we felt when we were first discovering FP. It’s encouraging and empowering to finally feel like we’re writing comprehensible software that we can reuse and confidently build upon. We want to invite you to the world of programming as it could be and ought to be.
Today I want to talk about relationships between monoids. These can be useful to think about when we’re developing libraries involving monoids, and we want to express some algebraic laws among them. We can then check these with automated tests, or indeed prove them with algebraic reasoning.
This post kind of fell together when writing notes on chapter 10, “Monoids”, of Functional Programming in Scala. I am putting it here so I can reference it from the chapter notes at the end of the book.
Monoid homomorphisms
Let’s take the String concatenation and Int addition as example monoids that have a relationship. Note that if we take the length of two strings and add them up, this is the same as concatenating those two strings and taking the length of the combined string:
1
(a.length+b.length)==(a+b).length
So every String maps to a corresponding Int (its length), and every concatenation of strings maps to the addition of corresponding lengths.
The length function maps from String to Intwhile preserving the monoid structure. Such a function, that maps from one monoid to another in such a preserving way, is called a monoid homomorphism. In general, for monoids M and N, a homomorphism f: M => N, and all values x:M, y:M, the following law holds:
1
f(x|+|y)==(f(x)|+|f(y))
The |+| syntax is from Scalaz and is obtained by importing scalaz.syntax.monoid._. It just references the append method on the Monoid[T] instance, where T is the type of the arguments.
This law can have real practical benefits. Imagine for example a “result set” monoid that tracks the locations of a particular set of records in a database or file. This could be as simple as a Set of locations. Concatenating several thousand files and then proceeding to search through them is going to be much slower than searching through the files individually and then concatenating the result sets. Particularly since we can potentially search the files in parallel. A good automated test for our result set monoid would be that it admits a homomorphism from the data file monoid.
Monoid isomorphisms
Sometimes there will be a homomorphism in both directions between two monoids. If these are inverses of one another, then this kind of relationship is called a monoid isomorphism and we say that the two monoids are isomorphic. More precisely, we will have two monoids A and B, and homomorphisms f: A => B and g: B => A. If f(g(b)) == b and g(f(a)) == a, for all a:A and b:B then f and g form an isomorphism.
For example, the String and List[Char] monoids with concatenation are isomorphic. We can convert a String to a List[Char], preserving the monoid structure, and go back again to the exact same String we started with. This is also true in the inverse direction, so the isomorphism holds.
Other examples include (Boolean, &&) and (Boolean, ||) which are isomorphic via not.
Note that there are monoids with homomorphisms in both directions between them that nevertheless are not isomorphic. For example, (Int, *) and (Int, +). These are homomorphic to one another, but not isomorphic (thanks, Robbie Gates).
Monoid products and coproducts
If A and B are monoids, then (A,B) is certainly a monoid, called their product:
But is there such a thing as a monoid coproduct? Could we just use Either[A,B] for monoids A and B? What would be the zero of such a monoid? And what would be the value of Left(a) |+| Right(b)? We could certainly choose an arbitrary rule, and we may even be able to satisfy the monoid laws, but would that mean we have a monoid coproduct?
To answer this, we need to know the precise meaning of product and coproduct. These come straight from Wikipedia, with a little help from Cale Gibbard.
A productM of two monoids A and B is a monoid such that there exist homomorphisms fst: M => A, snd: M => B, and for any monoid Z and morphisms f: Z => A and g: Z => B there has to be a unique homomorphism h: Z => M such that fst(h(z)) == f(z) and snd(h(z)) == g(z) for all z:Z. In other words, the following diagram must commute:
A coproductW of two monoids A and B is the same except the arrows are reversed. It’s a monoid such that there exist homomorphisms left: A => W, right: B => W, and for any monoid Z and morphisms f: A => Z and g: B => Z there has to be a unique homomorphism h: W => Z such that h(left(a)) == f(a) and h(right(b)) == g(b) for all a:A and all b:B. In other words, the following diagram must commute:
We can easily show that our productMonoid above really is a monoid product. The homomorphisms are the methods _1 and _2 on Tuple2. They simply map every element of (A,B) to a corresponding element in A and B. The monoid structure is preserved because:
And this really is a homomorphism because we just inherit the homomorphism law from f and g.
What does a coproduct then look like? Well, it’s going to be a type C[A,B] together with an instance coproduct[A:Monoid,B:Monoid]:Monoid[C[A,B]]. It will be equipped with two monoid homomorphisms, left: A => C[A,B] and right: B => C[A,B] that satisfy the following (according to the monoid homomorphism law):
And additionally, for any other monoid Z and homomorphisms f: A => Z and g: B => Z we must be able to construct a unique homomorphism from C[A,B] to Z:
We can simply come up with a data structure required for the coproduct to satisfy a monoid. We will start with two constructors, one for the left side, and another for the right side:
This certainly allows an embedding of both monoids A and B. But These is now basically Either, which we know doesn’t quite form a monoid. What’s needed is a zero, and a way of appending an A to a B. The simplest way to do that is to add a product constructor to These:
1
caseclassBoth[A,B](a:A,b:B)extendsThese[A,B]
Now These[A,B] is a monoid as long as A and B are monoids:
These[A,B] is the smallest monoid that contains both A and B as submonoids (the This and That constructors, respectively) and admits a homomorphism from both A and B. And notice that we simply added the least amount of structure possible to make These[A,B] a monoid (the Both constructor). But is it really a coproduct?
First we must prove that This and That really are homomorphisms. We need to prove the following two properties:
No! Something has gone wrong. This will only hold if the Z monoid is commutative. So in general, These[A,B] is not the coproduct of A and B. My error was in the Both constructor, which commutes all B values to the right and all A values to the left.
So what kind of thing would work? It would have to solve this case:
1
case(Both(a1,b1),Both(a2,b2))=>???
We need to preserve that a1, b1, a2, and b2 appear in that order. So clearly the coproduct will be some kind of list!
And let’s add an empty case for the combined zero:
1
caseclassNeither[A,B]()extendsThese[A,B]
In which case These[A,B] has become a kind of tree, or an unbalanced list of This[A] and That[B] values. A free product of the two monoids A and B. The implementation of append for the coproduct monoid could be:
This is now a homomorphism! We already know that this is so for the This and That cases. And now that the Both case simply appeals to the inductive hypothesis, we know that it holds for Both as well.
Free monoids on coproducts
To better understand what’s going on, let’s try going the other way. What if we start with the coproduct of the underlying sets and get a free monoid from there?
The underlying set of a monoid A is just the type A without the monoid structure. The coproduct of types A and B is the type Either[A,B]. Having “forgotten” the monoid structure of both A and B, we can recover it by generating a free monoid on Either[A,B], which is just List[Either[A,B]]. The append operation of this monoid is list concatenation, and the identity for it is the empty list.
Clearly List[Either[A,B]] is a monoid, but does it permit a homomorphism from both monoids A and B? If so, then the following properties should hold:
They clearly do not hold! The lists on the left of == will have two elements and the lists on the right will have one element. Can we do something about this?
Well, the fact is that List[Either[A,B]] is not exactly the monoid coproduct of A and B. It’s “too big” in a sense. But if we were to reduce the list to a normal form that approximates a “free product”, we can get a coproduct that matches our definition above. What we need is a new monoid:
Eithers[A,B] is a kind of List[Either[A,B]] that has been normalized so that consecutive As and consecutive Bs have been collapsed using their respective monoids. So it will contain alternating A and B values.
This is now a monoid coproduct because it permits monoid homomorphisms from A and B:
If either of e1 or e2 is empty then the result is the fold of the other, so those cases are trivial. If they are both nonempty, then they will have one of these forms:
In the first two cases, on the right of the == sign in the law, we perform a1 |+| a2 and b1 |+| b2 respectively before concatenating. In the other two cases we simply concatenate the lists. The ++ method on Eithers takes care of doing this correctly for us. On the left of the == sign we fold the lists individually and they will be alternating applications of f and g. So then this law amounts to the fact that f(a1 |+| a2) == f(a1) |+| f(a2) in the first case, and the same for g in the second case. In the latter two cases this amounts to a homomorphism on List. So as long as f and g are homomorphisms, so is fold(_)(f,g). Therefore, Eithers[A,B] really is a coproduct of A and B.
Since we have already convinced ourselves that These is a monoid coproduct, we could also simply show that there is a homomorphism from Eithers to These: