Measuring the Impact of AI on Experienced Open-Source Developer Productivity

We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve, we plan on continuing to use this methodology to help estimate AI acceleration from AI R&D automation .

See the full paper for more detail.

Motivation

While coding/agentic benchmarks have proven useful for understanding AI capabilities, they typically sacrifice realism for scale and efficiency—the tasks are self-contained, don’t require prior context to understand, and use algorithmic evaluation that doesn’t capture many important capabilities. These properties may lead benchmarks to overestimate AI capabilities. In the other direction, because benchmarks are run without live human interaction, models may fail to complete tasks despite making substantial progress, because of small bottlenecks that a human would fix during real usage. This could cause us to underestimate model capabilities. Broadly, it can be difficult to directly translate benchmark scores to impact in the wild.

One reason we’re interested in evaluating AI’s impact in the wild is to better understand AI’s impact on AI R&D itself, which may pose significant risks. For example, extremely rapid AI progress could lead to breakdowns in oversight or safeguards. Measuring the impact of AI on software developer productivity gives complementary evidence to benchmarks that is informative of AI’s overall impact on AI R&D acceleration.

Methodology

To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work. Then, we randomly assign each issue to either allow or disallow use of AI while working on the issue. When AI is allowed, developers can use any tools they choose (primarily Cursor Pro with Claude 3.5/3.7 Sonnet—frontier models at the time of the study); when disallowed, they work without generative AI assistance. Developers complete these tasks (which average two hours each) while recording their screens, then self-report the total implementation time they needed. We pay developers $150/hr as compensation for their participation in the study.

Core Result

When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

Below, we show the raw average developer forecasted times, and the observed implementation times—we can clearly see that developers take substantially longer when they are allowed to use AI tools.

... continue reading