
LLMs are breaking 20-year-old system design

Why This Matters

The rise of large language models (LLMs) and AI agents challenges traditional web architecture, which assumes stateless compute and centralized database state. This shift calls for new routing primitives and system designs that support long-running, stateful, interactive processes, reshaping how the industry builds scalable, resilient, user-interactive applications.

Key Takeaways

Web architecture is built on a 20-year-old assumption: state lives in the database, and compute is stateless. But we’re missing a routing primitive.

The ‘cloud-native’ architecture of the last decade is built on a 20-year-old assumption: that state lives in the database, and compute is stateless. If you want to scale, you scale the database vertically (get a larger machine) [1] or design the database schema around partitioning the data, and you scale your application servers horizontally (add more boxes). Any request can hit any server, the load balancer doesn’t care, and the database is the single source of truth.
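To make that baseline concrete, here is a minimal sketch of the classic shape, assuming Node with Express and Postgres (the route and table are illustrative, not from the article):

```ts
import express from 'express';
import { Pool } from 'pg';

// All state lives in the database; the process holds nothing between
// requests, so any replica behind the load balancer can serve any request.
const db = new Pool({ connectionString: process.env.DATABASE_URL });
const app = express();

app.get('/users/:id', async (req, res) => {
  const { rows } = await db.query('SELECT * FROM users WHERE id = $1', [
    req.params.id,
  ]);
  res.json(rows[0] ?? null);
});

app.listen(3000);
```

Because the handler keeps no memory of its own, you can kill this process mid-flight and the next request simply lands on another replica.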

LLMs and agents are quietly violating this assumption, and making this architecture increasingly hard to work with. Not all at once, but in three subtle ways:

Long-running work: an agent doing a 10-minute task isn’t a ‘request’, it’s a long-running async process.

Stateful compute: an agent might run multiple turns of a conversation, might process multiple tool calls, and relies on accumulated context. That state is not really ‘database state’, it’s the agent’s memory.

Bi-directional interaction: the user wants to watch the agent think, to interrupt it and redirect it. That’s a conversation with a process, not a query to a stateless API and database.
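A rough sketch of such an agent loop shows why it fights the stateless model. This is an assumption-laden illustration: `callModel`, `runTool`, and the `emit` callback are hypothetical stand-ins, not any particular framework’s API.

```ts
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };

// Hypothetical stand-ins for a real LLM call and a real tool runner.
declare function callModel(ctx: Message[]): Promise<{ text: string; toolCall?: string }>;
declare function runTool(call: string): Promise<string>;

// Long-running, stateful, interactive: the context array is the agent's
// memory, the loop can run for minutes, and `emit` streams progress to a
// user who may want to interrupt. None of this fits a stateless request.
async function runAgent(task: string, emit: (update: string) => void): Promise<string> {
  const context: Message[] = [{ role: 'user', content: task }];
  while (true) {
    const reply = await callModel(context);
    context.push({ role: 'assistant', content: reply.text });
    emit(reply.text); // the user watches the agent think
    if (!reply.toolCall) return reply.text;
    const result = await runTool(reply.toolCall); // may take minutes
    context.push({ role: 'tool', content: result });
  }
}
```

If the process running this loop dies, the context dies with it; and if a client wants to see `emit` output live, some specific server has to hold the connection.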

Durable execution solves part of the problem

Durable execution (Temporal, Inngest, Restate) is the industry’s current answer to the execution part. It makes the process durable and resilient. But we’re still pretending it’s stateless underneath. This works for the execution but doesn’t solve the interaction problem.
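As a sketch of what that buys you, here is the rough shape of a workflow in Temporal’s TypeScript SDK; the `runAgentStep` activity, its return shape, and the step limit are assumptions for illustration.

```ts
import { proxyActivities } from '@temporalio/workflow';

// Hypothetical activity: one durable agent step (LLM call + tool use).
type AgentStepResult = { observation: string; done: boolean };
const { runAgentStep } = proxyActivities<{
  runAgentStep(task: string, context: string[]): Promise<AgentStepResult>;
}>({ startToCloseTimeout: '5 minutes' });

// Each completed step is recorded in Temporal's event history, so a
// crashed worker replays to the last completed step instead of starting
// over. Execution is durable -- but talking to this running workflow
// from a browser is still the unsolved routing problem.
export async function agentWorkflow(task: string): Promise<string> {
  const context: string[] = [];
  for (let step = 0; step < 50; step++) {
    const result = await runAgentStep(task, context);
    context.push(result.observation);
    if (result.done) return result.observation;
  }
  return context.join('\n');
}
```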

The routing problem

HTTP + load balancer + stateless server can’t route to a specific process. It can only route to a database.

So the moment a client wants to talk to a process running in a durable execution framework, you’re back at the same routing problem. And so everyone reverts to polling: poll a query endpoint to get the latest updates that the durable execution process wrote to the database.
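From the client’s side, the polling workaround looks roughly like this (the status endpoint and response shape are hypothetical):

```ts
// Poll a query endpoint that reads whatever the durable process last
// wrote to the database. The interval is a forced trade-off: shorter
// means lower latency but more load and more wasted requests.
async function pollAgent(agentId: string, render: (text: string) => void): Promise<void> {
  for (;;) {
    const res = await fetch(`/agents/${agentId}/status`); // hypothetical endpoint
    const status: { state: string; latestOutput: string } = await res.json();
    render(status.latestOutput);
    if (status.state === 'done') return;
    await new Promise((resolve) => setTimeout(resolve, 2000)); // wait 2s, poll again
  }
}
```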

It’s a universal workaround, but it still sucks for all the same reasons polling has always sucked: latency trade-offs around how often to poll, database load, wasted requests, and a terrible UX for streaming.

Ultimately, polling treats your database as a message bus. Which is what folks did before actual message buses existed. Polling is what you do when you can’t figure out how to address the thing you want to talk to. It’s a workaround for a routing problem.
