When an AI model release immediately spawns memes and treatises declaring the rest of the industry cooked, you know you’ve got something worth dissecting.
Google’s Gemini 3 was released Tuesday to widespread fanfare. The company called the model a “new era of intelligence,” integrating it into Google Search on day one for the first time. It’s blown past OpenAI and other competitors’ products on a range of benchmarks and is topping the charts on LMArena, a crowdsourced AI evaluation platform that’s essentially the Billboard Hot 100 of AI model ranking. Within 24 hours of its launch, more than one million users tried Gemini 3 in Google AI Studio and the Gemini API, per Google. “From a day one adoption standpoint, [it’s] the best we’ve seen from any of our model releases,” Google DeepMind’s Logan Kilpatrick, who is product lead for Google’s AI Studio and the Gemini API, told The Verge.
Even OpenAI CEO Sam Altman and xAI CEO Elon Musk publicly congratulated the Gemini team on a job well done. And Salesforce CEO Marc Benioff wrote that after using ChatGPT every day for three years, spending two hours on Gemini 3 changed everything: “Holy shit … I’m not going back. The leap is insane — reasoning, speed, images, video… everything is sharper and faster. It feels like the world just changed, again.”
“This is more than a leaderboard shuffle,” said Wei-Lin Chiang, cofounder and CTO of LMArena. Chiang told The Verge that Gemini 3 Pro holds a “clear lead” in categories including coding, math, and creative writing, and its agentic coding abilities “in many cases now surpass top coding models like Claude 4.5 and GPT-5.1.” It also took the top spot in visual comprehension and was the first model to surpass a ~1500 score on the platform’s text leaderboard.
The new model’s performance, Chiang said, “illustrates that the AI arms race is being shaped by models that can reason more abstractly, generalize more consistently, and deliver dependable results across an increasingly diverse set of real-world evaluations.”
Alex Conway, principal software engineer at DataRobot, told The Verge that one of Gemini 3’s most notable advancements was on a specific reasoning benchmark called ARC-AGI-2. Gemini scored almost twice as high as OpenAI’s GPT-5 Pro while running at one-tenth of the cost per task, he said, which is “really challenging the notion that these models are plateauing.” And on the SimpleQA benchmark — which involves simple questions and answers on a broad range of topics, and requires a lot of niche knowledge — Gemini 3 Pro scored more than twice as high as OpenAI’s GPT-5.1, Conway flagged. “Use case-wise, it’ll be great for a lot more niche topics and diving deep into state-of-the-art research and scientific fields,” he said.
But leaderboards aren’t everything. It’s possible — and in the high-pressure AI world, tempting — to train a model for narrow benchmarks rather than general-purpose success. So to really know how well a system is doing, you have to rely on real-world testing, anecdotal experience, and complex use cases in the wild.
The Verge spoke with professionals across disciplines who use AI every day for work. The consensus: Gemini 3 looks impressive, and it does a great job on a broad range of tasks — but when it comes to edge cases and niche aspects of certain industries, many professionals won’t be replacing their current models with it anytime soon.
The majority of people The Verge spoke with plan to continue to use Anthropic’s Claude for their coding needs, despite Gemini 3’s advancements in that space. Some also said that Gemini 3 isn’t optimal on the user interaction front. Tim Dettmers, assistant professor at Carnegie Mellon University and a research scientist at Ai2, said that though it’s a “great model,” it’s a bit raw when it comes to UX, meaning “it doesn’t follow instructions precisely.”