
Principles of Mechanical Sympathy

Why This Matters

The concept of Mechanical Sympathy emphasizes designing software that aligns with and leverages the strengths of modern hardware, leading to significant performance improvements. As hardware continues to advance rapidly, adopting these principles is crucial for developers and companies aiming to optimize their systems and reduce latency. This approach can transform industries reliant on high-speed data processing, AI, and distributed systems, ultimately benefiting consumers with faster, more efficient applications.

Key Takeaways

I tell stories about AI, robotics, and interactive media with code and care -- and I transform them into products. I've created robotics platforms with Google, architected data and AI infrastructure at Wayfair and Thoughtworks, and patented cryptographic consensus protocols for distributed systems like robot swarms as founder of my studio, With Caer.

Modern hardware is remarkably fast, but software often fails to leverage it. Mechanical sympathy - a concept borrowed from racing and popularized in software by Martin Thompson - is the practice of creating software that is sympathetic to its underlying hardware. This practice can be distilled into a set of everyday principles: predictable memory access, awareness of cache lines, the single-writer principle, and natural batching. Together, these principles can be used to optimize everything from an AI inference server to a distributed data platform.
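As a taste of what one of these principles looks like in practice, here is a minimal Java sketch of natural batching: rather than waking up once per item, the consumer drains everything currently queued and processes it in one pass, amortizing per-wakeup costs. The `BatchingConsumer` class and its `BlockingQueue`-based design are my own illustration, not code from this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Natural batching: process everything available in one pass instead of
// one item per wakeup, amortizing locking, syscall, and cache-refill costs.
public class BatchingConsumer {
    public static int drainAndProcess(BlockingQueue<String> queue) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);      // take everything queued right now, without blocking
        for (String event : batch) {
            // ... handle each event ...
        }
        return batch.size();       // batch size adapts to load automatically
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64);
        queue.add("a");
        queue.add("b");
        queue.add("c");
        System.out.println(drainAndProcess(queue)); // prints 3
    }
}
```

A nice property of this pattern is that batch size grows under load and shrinks when the system is idle, with no tuning parameter required.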

Over the past decade, hardware has seen tremendous advances, from unified memory that's redefined how consumer GPUs work, to neural engines that can run billion-parameter AI models on a laptop.

And yet, software is still slow, from seconds-long cold starts for simple serverless functions, to hours-long ETL pipelines that merely transform CSV files into rows in a database.

Back in 2011, a high-frequency trading engineer named Martin Thompson noticed these issues, attributing them to a lack of Mechanical Sympathy. He borrowed this phrase from a Formula 1 champion:

You don't need to be an engineer to be a racing driver, but you do need Mechanical Sympathy. -- Sir Jackie Stewart, Formula 1 World Champion

Although we're not (usually) driving race cars, this idea applies to software practitioners. By having “sympathy” for the hardware our software runs on, we can create surprisingly performant systems. The mechanically-sympathetic LMAX Architecture processes millions of events per second on a single Java thread.
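LMAX's actual Disruptor is far more sophisticated, but the single-writer principle it relies on can be sketched in a few lines of Java: many producers hand work to exactly one writer thread, which is the only thread that ever mutates the state, so those writes never contend. The `SingleWriter` class and its queue-based handoff below are my own rough illustration, not the LMAX design.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Single-writer principle: only one thread mutates `total`, so the hot-path
// write needs no lock and no compare-and-swap. Producers only enqueue.
public class SingleWriter {
    private long total;  // mutated by the writer thread only
    private final BlockingQueue<Long> inbox = new ArrayBlockingQueue<>(1024);

    public void submit(long delta) throws InterruptedException {
        inbox.put(delta);            // producers never touch `total` directly
    }

    public void runWriter(int expected) throws InterruptedException {
        for (int i = 0; i < expected; i++) {
            total += inbox.take();   // plain, uncontended write
        }
    }

    public long total() {
        return total;
    }
}
```

Here the handoff queue still synchronizes, but the state itself stays uncontended; a real single-writer system would replace the queue with a lock-free ring buffer.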

Inspired by Martin's work, I've spent the past decade creating performance-sensitive systems, from AI inference platforms serving millions of products at Wayfair, to novel binary encodings that outperform Protocol Buffers.

In this article, I cover the principles of mechanical sympathy I use every day to create systems like these - principles that can be applied almost anywhere, at any scale.

Not-So-Random Memory Access

Mechanical sympathy starts with understanding how CPUs store, access, and share memory.

Figure 1: An abstract diagram of how CPU memory is organized

Most modern CPUs - from Intel's chips to Apple's silicon - organize memory into a hierarchy of registers, buffers, and caches, each with different access latencies:

Each CPU core has its own high-speed registers and buffers, which are used for storing things like local variables and in-flight instructions.
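One consequence of this hierarchy is that sequential, predictable memory access is dramatically cheaper than strided access: walking an array in order lets caches and prefetchers do their job, while jumping around defeats them. A minimal Java sketch of the contrast (my own illustration, not code from this article; the matrix size is arbitrary):

```java
// Contrast cache-friendly row-major traversal with cache-hostile
// column-major traversal over the same row-major 2D array.
public class TraversalDemo {
    static final int N = 2048;

    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += m[i][j];  // walks each row sequentially: prefetcher-friendly
        return sum;
    }

    static long sumColMajor(int[][] m) {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += m[i][j];  // jumps a whole row per access: frequent cache misses
        return sum;
    }

    public static void main(String[] args) {
        int[][] m = new int[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                m[i][j] = 1;

        long t0 = System.nanoTime();
        long a = sumRowMajor(m);
        long t1 = System.nanoTime();
        long b = sumColMajor(m);
        long t2 = System.nanoTime();

        System.out.printf("row-major: sum=%d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("col-major: sum=%d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

Both loops compute the same sum; on most machines the row-major version is noticeably faster, even though the two differ only in iteration order.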
