ZDNET's key takeaways
Even the best AI coding models succeed less than 23% of the time.
AI isn't falling short of its potential; it's being oversold.
AI advocates need to show the positive and negative sides.
There has been a lot of debate about the success rates of artificial intelligence, along with consternation that rising investments in AI tools and infrastructure are not delivering the results that vendors and consultants often promise.
For technology teams working in the trenches to integrate AI into their technology stacks, the challenge has been daunting, a new study shows. The BlueOptima AI Refactoring Evaluation (BARE) reports that even the best AI coding models succeeded less than 23% of the time when working on real production code. What's more, benchmark scores don't reflect real-world performance: most models scored above 85% on popular benchmarks but averaged just 17% success on production maintainability tasks.
The study benchmarked 57 LLMs on maintainability-oriented refactoring tasks drawn from 4,276 real source-code files spanning nine programming languages (C, C++, C#, Go, Java, JavaScript, PHP, Python, TypeScript), yielding 243,732 model-file evaluation pairs.
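The pair count follows directly from the other two figures if every model was run against every file. That full cross product is an assumption here, not something the summary above states outright, and the names in this sketch are illustrative only:

```python
# Assumption: each of the 57 models was evaluated on each of the 4,276 files.
NUM_MODELS = 57
NUM_FILES = 4_276
REPORTED_PAIRS = 243_732  # figure quoted from the study


def evaluation_pairs(models: int, files: int) -> int:
    """Number of model-file pairs in a full cross product."""
    return models * files


pairs = evaluation_pairs(NUM_MODELS, NUM_FILES)
print(pairs, pairs == REPORTED_PAIRS)
```

The product matches the reported total, which is consistent with a complete model-by-file grid rather than a sampled subset.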