
AI coding tools can reduce productivity


The buzz about AI coding tools is unrelenting. To listen to the reports, startups are launching with tiny engineering teams, non-programmers are “vibe-coding” entire apps, and the job market for entry-level programmers is crashing. But according to a METR experiment conducted in the spring of 2025, there’s at least one cohort that AI tools still aren’t serving.

METR performed a rigorous study (blog post, full paper) to measure the productivity gain provided by AI tools for experienced developers working on mature projects. The results surprised everyone: a 19 percent decrease in productivity. Even the study participants themselves were surprised: they had estimated that AI increased their productivity by 20 percent. If you take away just one thing from this study, it should probably be this: when people report that AI has accelerated their work, they might be wrong!

This result seems “too bad to be true” – so astonishing that it almost has to be spurious. However, the study was carefully designed, and I believe the findings are real. At the same time, I believe that at least some of the anecdotal reports of huge productivity boosts are real. This study doesn’t expose AI coding tools as a fraud, but it does remind us that they have important limitations (for now, at least) – confirming some things my colleague Taren wrote about in a previous post, First, They Came for the Software Engineers….

To begin with, I’ll explain how the study was done, and why I believe its results.

Finally, A Proper Scientific Trial of AI Coding Productivity

The study was carried out in pretty much the most rigorous fashion possible: an honest-to-goodness randomized controlled trial under real-world conditions. The subjects were experienced developers carrying out their everyday work.

The methodology was as follows:

1. METR recruited 16 developers from major open-source projects.
2. Each developer selected a list of coding tasks from their todo list, breaking up large projects into tasks that they could complete in an hour or two. In all, 246 tasks were included in the study.
3. The developers estimated how long it would take them to complete each task (a) under normal conditions, and (b) without using any AI tools. The percentage difference between these figures yields the predicted speedup – the degree to which the developer expected that AI tools would boost their productivity.
4. Each task was randomly assigned to one of two categories: “AI Allowed” (the developer could use any tools they liked) or “AI Disallowed” (the developer could not use AI coding tools or features).
5. The developers went about their work, recording their screens for later analysis.
6. After each task, they reported the time spent. For AI Allowed tasks, they also estimated how much time AI tools had saved them – the retrodicted speedup.
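The predicted speedup described above is just a percentage comparison of the two time estimates. A minimal sketch, using invented numbers rather than any figures from the study:

```python
# Hypothetical illustration of the predicted-speedup calculation; the
# function name and the example numbers are invented for this sketch.
def predicted_speedup(est_with_ai_hours, est_without_ai_hours):
    """Percentage by which the developer expects AI to shorten the task."""
    return (est_without_ai_hours - est_with_ai_hours) / est_without_ai_hours * 100

# A developer who estimates 1.5h with AI vs. 2.0h without predicts a 25% speedup.
print(predicted_speedup(1.5, 2.0))  # 25.0
```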

To compute the actual speedup – or, rather, slowdown! – provided by AI tools, the researchers compared the developers’ predictions of how long each task would take to the measured completion time. They found that, relative to those predictions, AI Allowed tasks overran by 19% more than AI Disallowed tasks did. Remember that when the developers estimated the task time, they didn’t yet know whether they’d be allowed to use AI for that task, so their estimates are unbiased with respect to the two conditions.
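That comparison boils down to averaging the ratio of measured to predicted time within each condition and then comparing the two averages. A toy sketch with made-up data (the task times below are invented, not the study’s, and the helper name is mine):

```python
# Toy sketch of the predicted-vs-actual comparison: each task is a
# (predicted_hours, actual_hours) pair, and we average the overrun ratio
# actual/predicted within each condition. All numbers are invented.
def mean_overrun(tasks):
    """Average of measured/predicted time ratios over a list of tasks."""
    return sum(actual / predicted for predicted, actual in tasks) / len(tasks)

ai_allowed = [(2.0, 2.6), (1.0, 1.2), (3.0, 3.5)]     # overran estimates a lot
ai_disallowed = [(2.0, 2.1), (1.0, 1.0), (3.0, 3.1)]  # overran only slightly

# How much more AI Allowed tasks overran their estimates, as a fraction.
slowdown = mean_overrun(ai_allowed) / mean_overrun(ai_disallowed) - 1
print(f"AI Allowed tasks overran estimates by {slowdown:.0%} more")
```

With these invented numbers the gap works out to roughly 19%, mirroring the headline result; the study’s actual statistical model is more involved than this simple ratio of means.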

AIs can write code much faster than any human, but that doesn’t always mean finishing first
