A hot potato: Amid growing hype around AI agents, one experienced engineer has brought a grounded perspective shaped by work on more than a dozen production-level systems spanning development, DevOps, and data operations. From his vantage point, the notion that 2025 will bring truly autonomous workforce-transforming agents looks increasingly unrealistic.
In a recent blog post, systems engineer Utkarsh Kanwat points to fundamental mathematical constraints that undermine the notion of fully autonomous multi-step agent workflows. Because production-grade systems require upwards of 99.9 percent reliability, the math quickly renders extended autonomous workflows infeasible.
"If each step in an agent workflow has 95 percent reliability, which is optimistic for current LLMs, five steps yield 77 percent success, 10 steps 59 percent, and 20 steps only 36 percent," Kanwat explained.
Even a hypothetically improved per-step reliability of 99 percent still falls short, yielding only about 82 percent success across 20 steps.
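Kanwat's figures follow directly from compounding independent per-step success probabilities. A minimal sketch of the arithmetic, assuming each step fails independently (the function and variable names are illustrative, not from his post):

```python
# Compounding per-step reliability: the chance an n-step workflow finishes
# with every single step succeeding, assuming steps fail independently.
def workflow_success_rate(per_step: float, steps: int) -> float:
    return per_step ** steps

for per_step, steps in [(0.95, 5), (0.95, 10), (0.95, 20), (0.99, 20)]:
    rate = workflow_success_rate(per_step, steps)
    print(f"{per_step:.0%} per step, {steps:>2} steps -> {rate:.1%} overall")

# 95% per step,  5 steps -> 77.4% overall
# 95% per step, 10 steps -> 59.9% overall
# 95% per step, 20 steps -> 35.8% overall
# 99% per step, 20 steps -> 81.8% overall
```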
"This isn't a prompt engineering problem. This isn't a model capability problem. This is mathematical reality," Kanwat says.
Kanwat's DevOps agent avoids the compounded error problem by breaking workflows into 3 to 5 discrete, independently verifiable steps, each with explicit rollback points and human confirmation gates. This design approach – emphasizing bounded contexts, atomic operations, and optional human intervention at critical junctures – forms the foundation of every reliable agent system he has built. He warns that attempting to chain too many autonomous steps inevitably leads to failure due to compounded error rates.
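The post does not reproduce Kanwat's code, but the pattern he describes – a short chain of atomic, independently verifiable steps with explicit rollback and a human confirmation gate – could be sketched roughly as follows; every name here is hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[], bool]           # performs one atomic operation, returns success
    rollback: Callable[[], None]      # undoes the operation if the workflow aborts
    needs_confirmation: bool = False  # pause for a human at critical junctures

def run_workflow(steps: list[Step]) -> bool:
    """Run a short (e.g. 3-5 step) workflow; roll back completed steps on failure."""
    completed: list[Step] = []
    for step in steps:
        if step.needs_confirmation and input(f"Proceed with '{step.name}'? [y/N] ") != "y":
            break  # human declined: abort and roll back
        if not step.run():
            break  # step failed: abort and roll back
        completed.append(step)
    else:
        return True  # every step succeeded
    for done in reversed(completed):
        done.rollback()
    return False
```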
Token cost scaling in conversational agents presents a second, often overlooked barrier. Kanwat illustrates this through his experience prototyping a conversational database agent, where each new interaction had to process the full previous context – causing token costs to scale quadratically with conversation length.
In one case, a 100-turn exchange cost between $50 and $100 in tokens alone, making widespread use economically unsustainable. Kanwat's function-generation agent sidestepped the issue by remaining stateless: description in, function out – no context to maintain, no conversation to track, and no runaway costs.
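The quadratic growth comes from resending the whole conversation on every turn, so cumulative tokens grow roughly with the square of the turn count. A back-of-the-envelope sketch of that scaling, where the per-turn token count and the price are illustrative assumptions rather than Kanwat's figures:

```python
# Rough model: each turn adds ~tokens_per_turn tokens, and every new turn must
# reprocess the full history, so cumulative input tokens grow ~quadratically.
def cumulative_tokens(turns: int, tokens_per_turn: int = 500) -> int:
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))

price_per_million = 15.0  # assumed input-token price in USD, for illustration only
for turns in (10, 50, 100):
    tokens = cumulative_tokens(turns)
    print(f"{turns:>3} turns -> {tokens:,} tokens -> ${tokens / 1e6 * price_per_million:.2f}")

#  10 turns ->    27,500 tokens -> $0.41
#  50 turns ->   637,500 tokens -> $9.56
# 100 turns -> 2,525,000 tokens -> $37.88
```

A stateless design like Kanwat's function-generation agent avoids this entirely: each request carries only the current description, so cost stays flat per call instead of compounding with history.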
"The most successful 'agents' in production aren't conversational at all," Kanwat says. "They're smart, bounded tools that do one thing well and get out of the way."