Tech News

GPT-5 failed the hype test


The author is The Verge’s senior AI reporter. An AI beat reporter for more than five years, her work has also appeared in CNBC, MIT Technology Review, Wired UK, and other outlets.

Last week, on GPT-5 launch day, AI hype was at an all-time high.

In a press briefing beforehand, OpenAI CEO Sam Altman said GPT-5 is “something that I just don’t wanna ever have to go back from,” a milestone akin to the first iPhone with a Retina display. The night before the announcement livestream, Altman posted an image of the Death Star, building even more hype. On X, one user wrote that the anticipation “feels like christmas eve.” All eyes were on the ChatGPT-maker as people across industries waited to see if the publicity would deliver or disappoint. And by most accounts, the big reveal would fall short.

The hype for OpenAI’s long-time-coming new model had been building for years — ever since the 2023 release of GPT-4. In a Reddit AMA with Altman and staff last October, users continuously asked about the release date of GPT-5, looking for details on its features and what would set it apart. One Redditor asked, “Why is GPT-5 taking so long?” Altman responded that compute was a limitation, and that “all of these models have gotten quite complex and we can’t ship as many things in parallel as we’d like to.”

But when GPT-5 appeared in ChatGPT, users were largely unimpressed. The sizable advancements they had been expecting seemed mostly incremental, and the model’s key gains were in areas like cost and speed. In the long run, however, that might be a solid financial bet for OpenAI — albeit a less flashy one.

People expected the world of GPT-5. (One X user posted that after Altman’s Death Star post, “everyone shifted expectations.”) And OpenAI didn’t downplay those projections, calling GPT-5 its “best AI system yet” and a “significant leap in intelligence” with “state-of-the-art performance across coding, math, writing, health, visual perception, and more.” Altman said in a press briefing that chatting with the model “feels like talking to a PhD-level expert.”

That hype made for a stark contrast with reality. Would a model with PhD-level intelligence, for example, repeatedly insist there were three “b’s” in the word blueberry, as some social media users found? And would it not be able to identify how many state names included the letter “R”? Would it incorrectly label a U.S. map with made-up states including “New Jefst,” “Micann,” “New Nakamia,” “Krizona,” and “Miroinia,” and label Nevada as an extension of California? People who used the bot for emotional support found the new system austere and distant, protesting so loudly that OpenAI brought support for an older model back. Memes abounded — one depicting GPT-4 and GPT-4o as formidable dragons with GPT-5 beside them as a simpleton.

The court of expert public opinion was not forgiving, either. Gary Marcus, a leading AI industry voice and emeritus professor of psychology at New York University, called the model “overdue, overhyped and underwhelming.” Peter Wildeford, co-founder of the Institute for AI Policy and Strategy, wrote in his review, “Is this the massive smash we were looking for? Unfortunately, no.” Zvi Mowshowitz, a popular AI industry blogger, called it “a good, but not great, model.” One Redditor on the official GPT-5 Reddit AMA wrote, “Someone tell Sam 5 is hot garbage.”

In the days following GPT-5’s release, the onslaught of unimpressed reviews tempered a bit. The general consensus is that although GPT-5 wasn’t as significant an advancement as people expected, it offered upgrades in cost and speed, plus fewer hallucinations. Its routing system was also all-new: it automatically directs your query on the backend to the model best suited to answer it, so you don’t have to decide. Altman leaned into that narrative, writing, “GPT-5 is the smartest model we’ve ever done, but the main thing we pushed for is real-world utility and mass accessibility/affordability.”

OpenAI researcher Christina Kim posted on X that with GPT-5, “the real story is usefulness. It helps with what people care about-- shipping code, creative writing, and navigating health info-- with more steadiness and less friction. We also cut hallucinations. It’s better calibrated, says ‘I don’t know,’ separates facts from guesses, and can ground answers with citations when you want.”
