What async promised and what it delivered

OS threads are expensive: an operating system thread typically reserves a megabyte of stack space and takes roughly a millisecond to create. Context switches happen in kernel space and burn CPU cycles. A server handling thousands of concurrent connections and dedicating one thread per connection means thousands of threads each consuming memory and competing for scheduling. The system spends time managing threads that could be better spent doing useful work.

This is the C10K problem, named by Dan Kegel in 1999. If you were building a web server, a chat system, or anything with a large number of simultaneous connections, you needed a way to handle concurrency without a thread per connection.

The answer came in waves, each solving the previous wave’s worst problem while introducing new ones. Previously we’ve looked at channels in Go and actors in Erlang. Now we turn to async, which is everywhere these days.

Callbacks

The first wave was straightforward: don’t block the thread. Instead of waiting for an i/o operation to complete, register a function to be called when it finishes and move on to the next piece of work. Event loops (select, poll, epoll, kqueue) multiplexed thousands of connections onto a handful of threads, and callbacks were the programmer’s interface to this machinery.

Node.js built an entire ecosystem on this model, handling thousands of concurrent connections on a single thread. Nginx’s event-driven architecture was a major reason it displaced Apache for high-concurrency workloads.

This nicely solved the performance problem, but at a cost: callbacks invert control flow. Instead of writing “do A, then B, then C” as three sequential statements, you write “do A, and when it’s done call this function, which does B, and when that’s done call this other function, which does C.” The programmer’s intent becomes scattered across nested closures. JavaScript developers named this “callback hell” and built an entire website to commiserate.

Callbacks have deeper problems than aesthetics, such as fracturing error handling. Each callback needs its own error path. Errors can’t propagate naturally up the call stack because there is no call stack (callbacks run in a different context from where they are registered). Handling partial failure in a chain of callbacks means threading error state through every function in the chain.

Plus, callbacks have no notion of cancellation. If you start an asynchronous operation and then decide you don’t need the result, there’s no general way to stop it. The callback will fire eventually, and your code needs to handle the case where it no longer cares about the result.

Callbacks solved the resource problem (too many threads) by creating an ergonomics problem (code that’s hard to write, read, and get right).

... continue reading