Google’s AI Overviews are peddling misinformation on a scale that may be virtually unprecedented in human history.
A recent analysis conducted by the AI startup Oumi at the behest of The New York Times found that the AI-generated summaries, which appear above Google search results, are accurate around 91 percent of the time.
In a sense, that may sound like an impressive figure. But here’s an even more impressive one: five trillion. That’s roughly the number of search queries that Google processes every year, translating to tens of millions of wrong answers that the AI Overviews are providing every hour — and hundreds of thousands every minute, the analysis calculated.
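The article's arithmetic can be sanity-checked with a quick back-of-envelope calculation. This sketch assumes, as the article implicitly does, that roughly every query surfaces an AI Overview; the figures (5 trillion queries per year, 9 percent error rate from the 91 percent accuracy finding) come from the text above.

```python
# Back-of-envelope check of the article's claim: tens of millions of
# wrong answers per hour, hundreds of thousands per minute.
QUERIES_PER_YEAR = 5_000_000_000_000  # ~5 trillion searches/year (article's figure)
ERROR_RATE = 1 - 0.91                 # 91% accuracy per the Oumi analysis

wrong_per_year = QUERIES_PER_YEAR * ERROR_RATE
wrong_per_hour = wrong_per_year / (365 * 24)
wrong_per_minute = wrong_per_hour / 60

print(f"{wrong_per_hour:,.0f} wrong answers per hour")
print(f"{wrong_per_minute:,.0f} wrong answers per minute")
```

This yields roughly 51 million wrong answers per hour and about 860,000 per minute, consistent with the magnitudes the analysis reports.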
In other words, Google has created a misinformation crisis. Studies have shown that people tend to trust what an AI tells them without question, with one report finding that only 8 percent of users actually double-checked an AI's answer. Another experiment found that users still listened to AI when it gave them the wrong answer nearly 80 percent of the time — a grim trend the researchers dubbed "cognitive surrender."
Large language models adopt an authoritative tone and can confidently present fabricated information as fact when they can't immediately glean a straight answer. Add the convenience that Google's AI Overviews offer, and it's easy to imagine untold numbers of users taking the summaries at their word.
Oumi conducted the analysis using a test called SimpleQA, a widely used benchmark for AI accuracy in the industry which was designed by OpenAI. The first round of tests, conducted in October, used a version of the AI Overviews powered by Google’s Gemini 2 model. A follow-up conducted in February tested the feature after it was switched to Gemini 3, its much-hyped upgrade.
Each round of tests involved 4,326 Google searches. Gemini 3 came out the more accurate model, giving a factually sound response 91 percent of the time. Gemini 2 performed significantly worse, at just 85 percent.
On the one hand, the comparison shows that the models are improving. On the other, it shows that Google was willing to foist onto its userbase a model that was even more prone to hallucinating, in an ongoing experiment that's still misinforming hundreds of millions of people.
Google called the analysis flawed. “This study has serious holes,” Ned Adriance, a Google spokesman, told the NYT in a statement. “It doesn’t reflect what people are actually searching on Google.”