OpenAI Releases List of Work Tasks It Says ChatGPT Can Already Replace

ChatGPT maker OpenAI has released a new evaluation, dubbed GDPval, to measure how well its AIs perform on “economically valuable, real-world tasks across 44 occupations.”

“People often speculate about AI’s broader impact on society, but the clearest way to understand its potential is by looking at what models are already capable of doing,” the company wrote in an accompanying blog post.

“Evaluations like GDPval help ground conversations about future AI improvements in evidence rather than guesswork, and can help us track model improvement over time,” OpenAI added.

It’s one of OpenAI’s most direct attempts to date to justify its AI models’ financial viability, amid skepticism that the tech may prove to be a dead end. Experts have often criticized the company’s boastful marketing, such as CEO Sam Altman claiming that its GPT-5 model had achieved “PhD-level” intelligence.

In “early results,” GDPval found that “today’s best frontier models are already approaching the quality of work produced by industry experts” — a clear shot across the bow at critics who say the tech isn’t up to the demands of the workplace.

The 44 occupations where “AI could have the highest impact on real-world productivity” span a litany of professions, including real estate sales agents, social workers, industrial engineers, software developers, lawyers, registered nurses, customer service representatives, pharmacists, private detectives, and financial advisors.

The specific tasks, as laid out in a paper, include creating a “competitor landscape for last mile delivery” for a financial analyst, assessing “skin lesion images” for a registered nurse, and designing a sales brochure for a real estate agent.

Surprisingly, the company found that rival Anthropic’s Claude Opus 4.1 was the “best performing model” when industry experts graded the outputs across 220 tasks, followed by GPT-5, which “excelled in particular on accuracy.”

GPT-5 running at a higher reasoning setting, called GPT-5-high, was “rated as better than or on par with the deliverables from industry experts” just over 40 percent of the time. GPT-4o, which was released more than a year ago, scored a mere 13.7 percent.

To be clear, OpenAI is treading carefully around the subject of replacing human jobs altogether. Its language suggests that AI will “support people in the work they do every day” instead of saying outright that anyone could soon be out of work because of AI. That’s unsurprising, considering the negative optics of celebrating the loss of employment.
