ChatGPT maker OpenAI has released a new evaluation, dubbed GDPval, to measure how well its AIs perform on “economically valuable, real-world tasks across 44 occupations.”

“People often speculate about AI’s broader impact on society, but the clearest way to understand its potential is by looking at what models are already capable of doing,” the company wrote in an accompanying blog post. “Evaluations like GDPval help ground conversations about future AI improvements in evidence rather than guesswork, and can help us track model improvement over time,” OpenAI added.

It’s one of the company’s most straightforward attempts to date to justify its AI models’ financial viability, following skepticism that the tech may prove to be a dead end. Experts have often criticized the company’s boastful marketing, such as CEO Sam Altman claiming that its GPT-5 model had achieved “PhD-level” intelligence.

In “early results,” GDPval found that “today’s best frontier models are already approaching the quality of work produced by industry experts” — a clear shot across the bow at critics who say the tech isn’t up to the demands of the workplace.

The 44 occupations, chosen as those where “AI could have the highest impact on real-world productivity,” span professions including real estate sales agents, social workers, industrial engineers, software developers, lawyers, registered nurses, customer service representatives, pharmacists, private detectives, and financial advisors.

The specific tasks, as laid out in a paper, range from creating a “competitor landscape for last mile delivery” for a financial analyst to assessing “skin lesion images” for a registered nurse and designing a sales brochure for a real estate agent.
Surprisingly, the company found that its competitor Anthropic’s Claude Opus 4.1 was the “best performing model” after being graded by industry experts across 220 tasks, followed by GPT-5, which “excelled in particular on accuracy.” A more powerful configuration of GPT-5, called GPT-5-high, was “rated as better than or on par with the deliverables from industry experts” just over 40 percent of the time. GPT-4o, which was released more than a year ago, scored a mere 13.7 percent.

To be clear, OpenAI is treading carefully around the subject of replacing human jobs altogether. Its language suggests that AI will “support people in the work they do every day” rather than saying outright that anyone could soon be out of work because of it. That’s unsurprising, considering the negative optics of celebrating the loss of employment.

At the same time, whether that’s really an honest interpretation of the industry’s motives and end goals remains dubious. AI executives have long boasted about replacing human labor with AI — drastic cost-cutting measures that are already starting to backfire for some companies.

There’s also good reason to take OpenAI’s latest evaluation results with a massive grain of salt. We’ve already seen the use of AI cause major headaches for software developers, lawyers, and even customer service representatives, often requiring more human oversight, not less. Hallucinations in particular remain a major sticking point, undercutting the output of large language model-based tools and forcing users to spend more time combing AI output for false information. And while AI often excels at generating bursts of text in a particular style, it can easily go off the rails during longer and less predictable tasks. Real-world tasks are rarely “clearly defined with a prompt and reference files,” OpenAI admitted.
“Early GDPval results show that models can already take on some repetitive, well-specified tasks faster and at lower cost than experts,” the company wrote. “However, most jobs are more than just a collection of tasks that can be written down.”