
Model intelligence is no longer the constraint for automation


The perception is that model improvement is stagnating: GPT-5 wasn’t the step change people were expecting. Yet models continue to improve on reasoning benchmarks. Recently, both OpenAI and Google models performed on par with gold medallists at the 2025 International Mathematical Olympiad (IMO). At the same time, it’s still difficult to make AI agents work for relatively simple enterprise use cases. Why is there such a disparity in model performance across problem domains? Why are models so much better at complex maths tasks that only a few humans can complete, while struggling with simple everyday tasks that most humans handle routinely?

It’s because the bottleneck isn’t in intelligence, but in human tasks: specifying intent and context engineering.

A mental model for tasks

To understand this, let’s start by defining what is required to solve a task:

1) Problem specification: a precise, detailed definition of the latent intent of a task

2) Context: the local knowledge needed to solve the task

3) Solver: a model with tools that acts on the spec using the context

Every task has an underlying ‘latent intent’: the complete set of requirements a correct solution must satisfy. A problem specification is an artifact that attempts to communicate this intent in a structured, precise and complete way. In practice, specifications are rarely complete. Let’s call the remaining uncertainty about the intent the specification gap.
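The relationship between latent intent, specification, and the specification gap can be sketched in a few lines of Python. This is purely illustrative: the class and field names are hypothetical, and requirements are modelled as simple sets of strings for clarity.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the task model described above.
# All names are hypothetical, not from any real framework.

@dataclass
class Task:
    latent_intent: set[str]   # complete requirements a correct solution must satisfy
    specification: set[str]   # the subset of intent actually written down
    context: dict[str, str] = field(default_factory=dict)  # local knowledge available to the solver

    def specification_gap(self) -> set[str]:
        # Requirements the solver must infer, because the spec omits them
        return self.latent_intent - self.specification


task = Task(
    latent_intent={"export as CSV", "include headers", "use ISO dates"},
    specification={"export as CSV", "include headers"},
)
print(task.specification_gap())  # the solver is left to guess the date format
```

The gap is simply whatever the latent intent contains that the specification does not; the wider it is, the more the solver must fall back on inference.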

When there's a specification gap, the solver will attempt to infer the missing intent using:

Global priors: general knowledge and capabilities embedded in the model weights; its ‘world model’.
