Static types, algebraic data types, making illegal states unrepresentable: the functional programming tradition has developed extraordinary tools for reasoning about programs. I have spent over a decade writing Haskell professionally, and I believe in all of it.
But the very effectiveness of these tools creates a particular susceptibility. We sometimes mistake reasoning about programs for reasoning about systems. These are not the same activity, and the instincts that make you good at one do not automatically transfer to the other.
This is not a uniquely FP problem. Every programming community treats “the program” as its primary object of study. But FP practitioners are in a distinctive position: our tools for local correctness are powerful enough to foster an unwarranted confidence about system-level properties. The type checker is honest about what it checks. The trouble starts when we forget where its jurisdiction ends. Every language community has its own version of this forgetting; the FP community just has the most sophisticated reasons to believe the boundary doesn’t matter.
A caveat before we go further: this essay is grounded in the world of web services, service-oriented architectures, and the distributed systems that inevitably emerge from them. If you’re building video games, CLI tools, or embedded firmware, the version boundaries look different and much of this won’t apply. But if you ship code that talks to other code across a network, and especially if you’ve ever had to deploy a change without taking the whole system down at once, this is for you.
The good news is that the research community has been quietly assembling the tools we need, if you know where to look.
Your monolith is a distributed system
Before we talk about types, I want to establish something that I find myself arguing repeatedly: every production system is a distributed system, including your monolith.
If you have a web application with more than one server, you have a distributed system. If you have background job workers, you have a distributed system. If you have a cron job, you have a distributed system. If you talk to Stripe, or send emails through SendGrid, or enqueue something in Redis, or read from a Postgres replica, then you are (I regret to inform you) operating a distributed system. The word “monolith” describes your deployment artifact. It does not describe your runtime topology.
This matters because the interesting correctness problems in production are almost always systemic rather than local. They live in the interactions between components running different versions of your code, or operating on different assumptions about the state of the database, or retrying an operation that already partially succeeded somewhere else. These are not problems that any single-program analysis can catch, regardless of how sophisticated your type system is.
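To make the version-skew case concrete, here is a minimal sketch of two releases of the same service meeting on the wire during a rolling deploy. Everything in it is hypothetical and invented for illustration: the ChargeV1 and ChargeV2 types, the "key=value;..." wire format, and the field names. Each release type-checks on its own; the failure only exists in the pairing of an old encoder with a new decoder, which no single compilation ever sees.

```haskell
module Main where

import Data.List (intercalate)

-- Hypothetical wire format: "key=value" pairs separated by ';'.
type Wire = String

-- Release 1 of an illustrative charge request: currency is implied.
data ChargeV1 = ChargeV1
  { v1OrderId :: String
  , v1Amount  :: Int
  } deriving (Show, Eq)

-- Release 2 adds a required currency field.
data ChargeV2 = ChargeV2
  { v2OrderId  :: String
  , v2Amount   :: Int
  , v2Currency :: String
  } deriving (Show, Eq)

data DecodeError
  = MissingField String
  | BadNumber String
  deriving (Show, Eq)

encodePairs :: [(String, String)] -> Wire
encodePairs = intercalate ";" . map (\(k, v) -> k ++ "=" ++ v)

decodePairs :: Wire -> [(String, String)]
decodePairs = map splitPair . splitOn ';'
  where
    splitPair s = let (k, rest) = break (== '=') s in (k, drop 1 rest)

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

field :: String -> [(String, String)] -> Either DecodeError String
field k kvs = maybe (Left (MissingField k)) Right (lookup k kvs)

number :: String -> Either DecodeError Int
number s = case reads s of
  [(n, "")] -> Right n
  _         -> Left (BadNumber s)

-- What a worker still running release 1 puts on the queue.
encodeV1 :: ChargeV1 -> Wire
encodeV1 c =
  encodePairs [("order_id", v1OrderId c), ("amount", show (v1Amount c))]

-- What a worker running release 2 puts on the queue.
encodeV2 :: ChargeV2 -> Wire
encodeV2 c =
  encodePairs
    [ ("order_id", v2OrderId c)
    , ("amount", show (v2Amount c))
    , ("currency", v2Currency c)
    ]

-- What a worker running release 2 expects to read.
decodeV2 :: Wire -> Either DecodeError ChargeV2
decodeV2 w = do
  let kvs = decodePairs w
  oid <- field "order_id" kvs
  amt <- field "amount" kvs >>= number
  cur <- field "currency" kvs
  pure (ChargeV2 oid amt cur)

main :: IO ()
main = do
  -- Same release on both ends: the round trip succeeds.
  print (decodeV2 (encodeV2 (ChargeV2 "ord-1" 500 "USD")))
  -- Mid-deploy: a release 1 producer feeds a release 2 consumer.
  print (decodeV2 (encodeV1 (ChargeV1 "ord-1" 500)))
  -- second line prints: Left (MissingField "currency")
```

The compiler accepted both releases, and it was right to: neither program contains a type error. The mismatch is a property of which versions happen to be running at the same moment, which is exactly the kind of fact that lives outside any one program.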
Most programming language communities (FP included) tend to treat “the program” as the object of study. We write papers about programs. We verify programs. We optimize programs. But in production, correctness is not a property of a program. It is a property of a system. And once you see this clearly, some of the most cherished practices across our industry start to look like they’re aimed at the wrong altitude.