I’ve spent over a decade building data infrastructure: observability systems at Twitter, streaming data processing systems at Google, declarative data processing systems at Snowflake. From the beginning, I noticed a strange gap between the conceptual elegance of programming languages and databases, and the reality of developing and operating real systems using them. That reality is filled with tedium and stress. All of the systems I’ve ever worked on have felt brittle in one way or another: hard to change, and easy to break.
Infrastructure engineers develop paranoia around change. We invest more effort testing and deploying changes than making them. We call it maturity, but I’ve never stopped questioning it. There must be a way to delegate the tedium to our tools and focus on what attracted us to this field: brainstorming ideas, trying them out, and seeing their effects.
But what’s missing, exactly? Decades of effort by thousands of brilliant minds have gone into the field of computing, much of it directed at stripping away accidental complexity until only the inherent complexity of the problem remains. Surely some major innovation in the foundation isn’t just waiting to be discovered—wouldn’t someone have found it already?
Maybe. But maybe not. The structure of modern abstractions points to a specific opportunity: the status quo forces a choice between powerful tools and general-purpose tools. This feels like a false dichotomy. There’s no reason we can’t have both—if we can find the right model.
After years of searching, I think I’ve found a model that can break out of this tradeoff. Implementing it is more than I can do alone, which is why my cofounders, Daniel Mills and Skylar Cook, and I are starting Cambra. We are developing a new kind of programming system that rethinks the traditional internet software stack on the basis of a new model. Our goal: make developing internet software feel like working on a single, coherent system, not wiring together a fragmented mess of components. In what follows, I will explain why models matter, how fragmentation undermines them, and why building multi-domain coherent systems is both possible and necessary.
Models Give You Superpowers
Computers are magic. They let abstract concepts manifest in and affect the real world. A spreadsheet formula updates a budget, and you decide whether you can afford that shiny new thing. A routing algorithm computes the shortest path, and you arrive at your destination. A database records a transaction, and money moves between bank accounts.
Every computer program works in terms of a model: an abstract way to represent the world in simplified terms. Models allow programs to ignore the overwhelming complexity of reality, and instead focus on the parts of the world that are essential to the programmer’s goal. At its most reductive, a program is a loop that receives input, updates internal state, computes consequences, and sends output. However, that oversimplification masks a deep truth: the choice of model has a huge impact on which programs are feasible to develop and maintain. In other words, there are better models and worse models. Better models rely on intuitive, well-behaved concepts and give you useful rules about how to create programs and reason about their behavior. Great models give you superpowers. They don’t just make programs easier to read and write. They make them easier to reason about. They make it possible to create tooling that can verify, optimize, and refactor programs automatically.
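The "program as a loop" reduction above can be made concrete. Here is a minimal sketch in Python; the names `step` and `run` are my own, and the running-total state is an arbitrary stand-in for whatever a real program tracks:

```python
def step(state, event):
    # Update internal state and compute a consequence to emit.
    # Here the "model" is trivially simple: state is a running total.
    new_state = state + event
    return new_state, new_state

def run(state, inputs):
    # The reductive view of a program: receive input, update state,
    # compute consequences, send output. Repeat.
    outputs = []
    for event in inputs:
        state, out = step(state, event)
        outputs.append(out)
    return outputs

print(run(0, [1, 2, 3]))  # running totals: [1, 3, 6]
```

The loop itself is the same for every program; what varies, and what determines whether the program is feasible to reason about, is the model behind `state` and `step`.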
So why don’t we just use great models all the time? To answer that, we need to start at the bottom. All modern computer programs ultimately work in terms of the same foundational model: bits stored in memory and instructions to manipulate them. But this model is so low-level that it’s hard to map its concepts to the familiar concepts we typically care about. In other words, given a program written in terms of bits and instructions, it’s very difficult to infer its purpose. Conversely, given an intuitive specification of a program’s effects on the real world, it’s very difficult to map this specification to a “bits and instructions” program.
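To see how opaque the low-level model is, consider a toy machine sketch (this is an invented illustration, not any real instruction set): a list of memory cells and a handful of opcodes. Even for a three-instruction program, the purpose is hard to infer from the instructions alone.

```python
def execute(memory, program):
    # A toy register machine: one accumulator, a program counter,
    # and instructions that shuffle values between memory cells.
    pc, acc = 0, 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        pc += 1
    return memory

# What does this program *mean*? You have to trace it to find out.
memory = [4, 7, 0]
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2)]
print(execute(memory, program)[2])  # 11

# The same intent, expressed in a higher-level model, is self-evident:
print(4 + 7)  # 11
```

Going the other direction is just as hard: turning the intuitive specification "add these two numbers" into the right sequence of loads, adds, and stores is exactly the mapping that higher-level models automate.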
To make this mapping easier, we build higher-level models atop this foundation: programming languages, operating systems, databases. Programming in terms of a higher-level model comes with a sacrifice: you give up control over how the program is “lowered” into lower-level terms. But with that loss of control comes a reduction in complexity, which is often a favorable trade. For example, garbage collection allows a programmer to not worry about deallocation, in exchange for giving up control over memory management.
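The garbage-collection trade is observable directly. In the sketch below (the `Budget` class is a hypothetical stand-in for any heap-allocated object), we never say *when* or *how* the object's memory is reclaimed; we only drop our reference and let the runtime decide. CPython's reference counting happens to reclaim it immediately; other garbage-collected runtimes promise only that it happens eventually:

```python
import weakref

class Budget:
    """A stand-in for any heap-allocated object."""
    pass

b = Budget()
alive = weakref.ref(b)   # observe the object without keeping it alive
assert alive() is b      # the object exists while a strong reference does
del b                    # drop the only strong reference...
assert alive() is None   # ...and the runtime reclaims it, on its own schedule
```

In a manual-memory model such as C's, the programmer would control the exact moment of deallocation, and would also bear the risk of freeing too early or never freeing at all. That is the trade: less control, less complexity.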