Python and JavaScript/TypeScript AI frameworks are reinventing what telecom solved in 1986. What 40 years of production-grade concurrency teaches us about building AI agents.
Recently, José Valim published "Why Elixir is the Best Language for AI", citing a Tencent study showing Elixir achieved the highest LLM code completion rate across 20 languages. Claude Opus 4 scored 80.3% on Elixir problems versus 74.9% for C#, the next-best performer.
But there's a deeper argument than "LLMs write good Elixir." It's this: the actor model that Erlang brought to production in 1986 is the agent model that AI is rediscovering in 2026. Every pattern the Python AI ecosystem is building (isolated state, message passing, supervision hierarchies, fault recovery) already exists in the BEAM virtual machine. And it's been running telecom switches, WhatsApp, and Discord at scale for decades.
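To make that concrete, here's a minimal sketch (module names and the `:ask` message are illustrative, not from any framework): a per-user agent session as a GenServer, restarted by a supervisor if it crashes. Isolated state, message passing, supervision, and fault recovery, all from the standard library.

```elixir
defmodule AgentSession do
  use GenServer

  # Each session is an isolated process: its own heap, its own GC,
  # its own state that no other process can mutate.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: opts[:name])
  end

  @impl true
  def init(opts), do: {:ok, %{user: opts[:user], history: []}}

  # Message passing: state changes only in response to messages.
  @impl true
  def handle_call({:ask, question}, _from, state) do
    {:reply, :ack, %{state | history: [question | state.history]}}
  end
end

defmodule AgentSupervisor do
  use Supervisor

  def start_link(_), do: Supervisor.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok) do
    # Fault recovery: if a session crashes, only that session
    # restarts; every other session is untouched.
    children = [{AgentSession, name: :demo, user: "alice"}]
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

Start the supervisor, then `GenServer.call(:demo, {:ask, "hello"})` sends a message to that one session's mailbox. Kill the session process and the supervisor brings it back with fresh state.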
I've been building agentic commerce infrastructure at New Generation. Before that, I shipped a full AI stack serving more than 3 million merchants at a Brazilian fintech unicorn. Both systems run on Elixir. Here's why that's not a hipster language choice. It's an architectural inevitability.
A note on terminology: Throughout this post I refer to "the BEAM." BEAM is the virtual machine that runs both Erlang and Elixir code, similar to how the JVM runs both Java and Kotlin. Erlang (1986) created the VM and the concurrency model. Elixir (2012) is a modern language built on top of it with better ergonomics. When I say "BEAM," I mean the runtime and its properties. When I say "Elixir," I mean the language we write.
The 30-second request problem
Traditional web frameworks were designed for a world where requests take milliseconds. A user clicks, the server queries a database, renders HTML, responds in under 100ms. Rails, Django, Laravel: all optimized for this pattern.
AI agents broke that model. When a user asks an agent a question, the response takes 5 to 30 seconds. The agent calls an LLM, waits for streaming tokens, maybe invokes a tool, calls the LLM again, streams more tokens. One "request" might involve three round-trips to an LLM API, two database lookups, and a web search. And the connection stays open the entire time.
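Here's a sketch of what one such request looks like as a single BEAM process. `call_llm/1` and `run_tool/2` are hypothetical stand-ins for a real LLM client and tool runtime, not actual APIs:

```elixir
defmodule AgentLoop do
  # One "request" = one lightweight process running this loop
  # until the model produces a final answer.
  def handle(question), do: loop([%{role: :user, content: question}])

  defp loop(messages) do
    case call_llm(messages) do
      # The model asked for a tool: run it, append the result, go again.
      {:tool_call, name, args} ->
        result = run_tool(name, args)
        loop(messages ++ [%{role: :tool, content: result}])

      # The model is done: return the answer; the process exits
      # and its memory is reclaimed.
      {:final, answer} ->
        answer
    end
  end

  # Stubbed client: pretends the model wants one web search, then answers.
  # A real client would block here on HTTP and streaming tokens, which is
  # cheap on the BEAM: only this process waits, not an OS thread.
  defp call_llm(messages) do
    if Enum.any?(messages, &(&1.role == :tool)) do
      {:final, "answer grounded in the tool result"}
    else
      {:tool_call, :web_search, "some query"}
    end
  end

  defp run_tool(_name, _args), do: "search results..."
end
```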
Now multiply that by 10,000 concurrent users, each one holding an open connection for 15+ seconds. Traditional thread-per-request frameworks choke. You need async, you need concurrency, you need to hold thousands of long-lived connections without burning through memory.
The BEAM was built for exactly this. Ericsson designed it for telephone calls: the original long-lived connections. Each call holds state, runs for minutes, and the system needs to handle millions of them concurrently. BEAM lightweight processes are ~2KB each. You can spawn millions of them. Each one has its own heap, its own garbage collector, and is preemptively scheduled so no single process can hog the CPU.
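You can check the footprint yourself in iex. The exact byte count varies by VM version and word size, but on a 64-bit VM it lands in the low kilobytes:

```elixir
# Spawn 100,000 idle processes, each parked in a receive block.
pids = for _ <- 1..100_000, do: spawn(fn -> receive do :stop -> :ok end end)

# :erlang.process_info/2 reports a single process's memory in bytes.
{:memory, bytes} = :erlang.process_info(hd(pids), :memory)
IO.puts("one idle process: #{bytes} bytes")  # roughly 2-3 KB on a 64-bit VM

# Clean up: each process receives :stop and exits normally.
Enum.each(pids, &send(&1, :stop))
```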