Skip to content
Tech News
← Back to articles

Stealing from Biologists to Compile Haskell Faster

read original get Haskell Programming Book → more articles
Why This Matters

This article highlights how insights from biology, specifically RNA folding algorithms, are influencing compiler optimization techniques in Haskell's GHC. Improving the ApplicativeDo flag's algorithm not only enhances compiler performance but also demonstrates the interdisciplinary potential to solve complex problems in software development. This cross-pollination underscores the importance of innovative thinking in advancing both compiler technology and computational efficiency for consumers and the industry alike.

Key Takeaways

This started when someone mentioned, mostly in passing, that GHC has a flag for ApplicativeDo ( -foptimal-applicative-do ) that’s switched off by default because the algorithm behind it is too slow to use.

That sounded like a bug to me. An optimization that’s correct but disabled for being slow is the kind of thing you fix in an afternoon, I figured.

It wasn’t; it turned out to be a properly hard problem, and the problem has been eating at me for months.

ApplicativeDo is a quiet corner of GHC to start with. Most programs never switch it on, and most of the ones that do are fine with the default and never reach for the optimal flag, so we’re well into the weeds even by compiler-internals standards. But the problem under the flag is a good one , and chasing improvements to the algorithmic compexity for the optimal layout algorithm led somewhere I didn’t expect: the same dynamic program biologists use to predict how a strand of RNA folds.

What is ApplicativeDo, and why is it cool?

Consider a canonical example of fetching the number of friends two people have in common from a remote social graph:

numCommonFriends x y = do fx <- friendsOf x fy <- friendsOf y return (length (intersect fx fy))

The standard do desugaring turns this into sequential binds:

friendsOf x >>= \ fx -> friendsOf y >>= \ fy -> return (length (intersect fx fy))

This is strictly sequential. The second friendsOf call cannot start until the first one completes, because >>= passes the result of the left side to the right side. Even though fy does not depend on fx , >>= has no way to express that independence. We pay for two distinct network round-trips.

... continue reading