ZDNET's key takeaways
AI's impact at work remains lukewarm at best.
OpenAI's new evaluation measures AI's GDP impact on certain tasks.
Companies are under pressure to justify their tools' existence.
Despite so many AI tools flooding the market, promising increased productivity and even fully automated work, their impact so far has been inconsistent at best. As a recent MIT report noted, 95% of enterprise AI projects have failed; elsewhere, bosses are getting unsatisfactory AI-generated "workslop" from their direct reports, creating hours of additional labor -- not the intended effect.
OpenAI's new evaluation, GDPval, aims to change that by "measuring how AI performs on real-world, economically valuable tasks," the company said in an announcement Thursday. Companies and third-party testers already use industry benchmarks and other evaluations to determine how capable models are at tasks like coding and math. However, these tests can be more academic than the work models actually face once deployed; GDPval is meant to narrow that gap between theory and practice.