Intelligence is rapidly improving with each model release. Just last week it was announced that OpenAI got a perfect score on the 2025 ICPC programming contest, beating every single human contestant. They achieved this using a version (presumably a very high compute version, but still) of their publicly available GPT-5 model. And yet, coding agents are nowhere near capable of replacing software developers. Why is that?

I'm going to argue that the limiting factor is no longer raw intelligence, but rather context. Existing coding agents simply do not have enough context about the problems they're being asked to solve, and this severely limits how long they can work effectively without human guidance.

Intelligence and context

How autonomous are existing coding agents? Let's think about autonomy as a spectrum and see how far along that spectrum we are.

Level 1 - A few lines of code
This is what autocomplete does, and it works very well.

Level 2 - One commit
Cursor and Claude Code work well for tasks in this size range.

Level 3 - One PR
Devin and other async agents are built to tackle tasks of this size. But do they work reliably? Only on relatively simple tasks.

Level 4 - Major feature or refactor
Doing this autonomously on an existing codebase is beyond the reach of current agents.

Level 5 - Entire codebase
This is what vibe coding products like Lovable and Replit do now, but it only works because they can start from scratch. They usually hit a wall well before they reach a production-ready application.

I'd say Level 2 is all we can reliably do on production codebases right now, and even that requires substantial human guidance and review. What will it take to move further along the autonomy spectrum without sacrificing quality?

When an agent fails at a task, the cause is usually one of two things: an intelligence failure or a context failure. Either the model didn't have the information it needed, or it didn't have the mental horsepower to process that information properly. Other factors, such as taste, also affect performance, but if we're only asking whether the agent succeeds or fails at a task, intelligence and context are enough to consider. Note that I'm including general world knowledge as part of intelligence, both for simplicity and because I think it's hard to fully separate the two.

Programming competitions are competitions of intelligence. The entire context needed to solve a problem is provided in the problem statement itself. There's no existing codebase to understand, no business requirements to consider, and no unwritten development processes to follow. The superhuman ICPC performance we saw this week, along with the IOI gold medal-level performances from last month, strongly suggests that the raw intelligence and general programming knowledge of frontier models is sufficient to automate most software engineering work.

That said, these results were achieved with models quite a bit stronger than the ones developers use daily, like Claude 4 and GPT-5, so we can't quite say that lack of intelligence is never a cause of failure in current coding agents. They still do some pretty dumb stuff sometimes. But as models improve, more and more of the failures in agentic coding are failures of context, not failures of intelligence.

What context does a coding agent need?

Context isn't just code. It's also specs, dev practices, conversations, and more.
When human developers write code, they're drawing on a reservoir of implicit knowledge that goes far beyond what's visible in the codebase itself. Current coding agents are operating with maybe 20% of this context, at best. What context does an agent need to operate autonomously and reliably ship code that's as good as or better than what human developers write? The same things a human developer needs.

There are the basics:

It needs to be able to access all code files
Most coding agents can already do this.

It needs to be able to access documentation
Most coding agents can do this if set up properly.

It needs to be able to run code and see the output
Most coding agents can already do this pretty well.

And then there are the more subtle forms of context:

It needs a high-level understanding of how the codebase is organized and where different code lives
This is important for efficient execution and for making sure nothing gets missed. Most tool-based agents, like Cursor and Claude Code, do not have this, though some agents are provided with something along these lines.

It needs to understand all of the existing architectural patterns and conventions in the codebase
Every codebase has its own dialect. Maybe you always use dependency injection in a specific way. Perhaps there's an unwritten rule about where business logic lives versus presentation logic. Maybe you have a specific pattern for handling async operations that evolved organically over three years. Current agents struggle here because many of these patterns are emergent properties of the codebase that aren't documented in any single place. They're distributed across thousands of commits, pull requests, and code reviews.

It needs to understand why things were done the way they were
Why does the authentication system work the way it does? Because two years ago, there was a security incident that led to a complete redesign. Why don't we use library X even though it seems perfect? Because it caused production issues in 2022. This tribal knowledge lives in Slack threads, meeting notes, incident post-mortems, and developers' heads.

It needs to understand development and deployment practices
Testing expectations, style and comment guidelines, and so on. Every team has unwritten rules about how code ships. Maybe you deploy to staging-east first because of a subtle dependency. Perhaps certain tests look weird because they're working around a known race condition. The CI/CD pipeline has manual approval steps that seem redundant but prevent real disasters that happened in the past. Current agents can read your test configs and deployment scripts, but they don't understand the "why" behind them. They might remove a "redundant" check that's actually preventing a production issue, or follow official docs that everyone knows are outdated.

It needs to understand product and business requirements
Code doesn't exist in a vacuum. That seemingly arbitrary validation rule? It's there because of a regulatory requirement in the EU market. That weird data transformation? It's handling an edge case for your biggest enterprise customer. I don't know of any coding agents that are plugged into this kind of data right now.

Notice how all the basic forms of context use the word "access" while the more subtle forms use the word "understand." This is important. Most of this context is not written down in a single document that the agent can just read. To the extent that it's written down at all, it's often scattered across many different files and apps.
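To make the access/understand distinction concrete, here is a minimal sketch in Python of what the "access" side looks like as agent tools. The tool names (read_file, search_docs, run_tests) are hypothetical, and the test runner is assumed to be pytest; this is an illustration, not any particular agent framework's API.

```python
# A minimal sketch of the "access" side of agent context, using hypothetical
# tool definitions in the style most tool-calling agent frameworks expect.
# Everything here is straightforward to provide to an agent today.

from dataclasses import dataclass
from pathlib import Path
import subprocess


@dataclass
class Tool:
    name: str
    description: str


ACCESS_TOOLS = [
    Tool("read_file", "Return the contents of a file in the repository."),
    Tool("search_docs", "Full-text search over the project's documentation."),
    Tool("run_tests", "Run the test suite and return its output."),
]


def read_file(path: str) -> str:
    # "Access" is easy: the information lives in one place and can be fetched verbatim.
    return Path(path).read_text()


def run_tests() -> str:
    # Assumes a pytest-based project; swap in whatever the repo actually uses.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.stdout + result.stderr


# There is no equivalent "understand_why" tool. The reason an auth check exists,
# or why library X is banned, isn't stored anywhere a single call can fetch it
# from; it's spread across commits, Slack threads, and people's heads.
```

The asymmetry is the point: the basics reduce to simple retrieval calls, while the subtle forms of context have no single source a call could retrieve from.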
Some of that scattered information will be conflicting and out of date. Giving the agent this context is not as simple as pointing an MCP connector at your Google Drive and Linear accounts. The information needs to be processed and synthesized by the agent.

What does this mean for coding agents?

First, we need to give them access to far more context. Much of this new context will require sophisticated preprocessing to make it usable, so this is not an easy problem.

Second, not everything is written down. That means experienced human developers will still need to fill in the gaps for a very long time to come.

Third, agents need to learn to recognize when they're missing context so they can ask for human guidance. Right now they seem to be trained to just plow forward with whatever they have.
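What might that third point look like in practice? Here is a hedged sketch, assuming a generic tool-calling agent loop: a hypothetical request_clarification tool plus a prompt that tells the model to use it instead of guessing. The tool name, schema, and prompt wording are all assumptions for illustration, not any vendor's actual interface.

```python
# A sketch of one way an agent loop could surface missing context instead of
# guessing. The tool name, prompt wording, and routing are all hypothetical.

import json

SYSTEM_PROMPT = """\
Before changing code, list the facts you are relying on (conventions,
requirements, reasons behind existing behavior). If any of them cannot be
verified from the repository, docs, or provided context, call
request_clarification instead of guessing."""

REQUEST_CLARIFICATION = {
    "name": "request_clarification",
    "description": "Pause the task and ask a human a specific question "
                   "about missing context (requirements, conventions, history).",
    "parameters": {
        "type": "object",
        "properties": {
            "question": {"type": "string"},
            "blocking": {"type": "boolean",
                         "description": "True if work cannot safely continue."},
        },
        "required": ["question"],
    },
}


def handle_tool_call(name: str, arguments: str) -> str:
    """Route tool calls from the model; clarifications go to a human reviewer."""
    if name == "request_clarification":
        args = json.loads(arguments)
        # In a real system this might post to Slack or the PR thread.
        print(f"[needs human input] {args['question']}")
        return "Question sent to a human; wait for their reply before proceeding."
    raise ValueError(f"Unknown tool: {name}")
```

The hard part isn't the plumbing, it's training models to recognize that they're missing context in the first place rather than confidently filling the gap themselves.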