AI models still struggle to debug software, Microsoft study shows

AI models from OpenAI, Anthropic, and other top AI labs are increasingly being used to assist with programming tasks. Google CEO Sundar Pichai said in October that 25% of new code at the company is generated by AI, and Meta CEO Mark Zuckerberg has expressed ambitions to widely deploy AI coding models within the social media giant.

Yet even some of the best models today struggle to resolve software bugs that wouldn’t trip up experienced devs.

A new study from Microsoft Research, Microsoft’s R&D division, reveals that models, including Anthropic’s Claude 3.7 Sonnet and OpenAI’s o3-mini, fail to debug many issues in a software development benchmark called SWE-bench Lite. The results are a sobering reminder that, despite bold pronouncements from companies like OpenAI, AI is still no match for human experts in domains such as coding.

The study’s co-authors tested nine different models as the backbone for a “single prompt-based agent” that had access to a number of debugging tools, includ ...
