Tech News

Gemini kind of sucks at news, according to major international study


Edgar Cervantes / Android Authority

TL;DR Among major AI news-summary systems, Google Gemini performed the worst, showing significant issues in many results.

Gemini struggled with identifying reliable sources, providing quotes, and linking to its source material.

While all of the tested tools show signs of improvement, Gemini still lags behind its competitors.

You cannot conduct a conversation about AI without someone quickly bringing up the inconvenient topic of mistakes. For as useful as these systems can be at organizing information, and as impressive as the content generative AI can seemingly pull out of nowhere, we don’t have to look far before we start noticing the blemishes in this otherwise polished facade. While there’s definitely been progress since the bad old days of Google AI Overviews hallucinating utter nonsense, just how far have things really come? Some new research takes a rather concerning look into exactly that.


The European Broadcasting Union (EBU) and the BBC were interested in quantifying the performance of systems like OpenAI ChatGPT, Google Gemini, Microsoft Copilot, and Perplexity when it comes to delivering AI-generated news summaries, especially with 15% of under-25s relying on AI for their news. The BBC initially conducted both a broad survey and a series of six focus groups, gathering data about participants’ experiences with and opinions of these AI systems. That approach was later expanded for the EBU’s international analysis.

Looking at beliefs and expectations, some 42% of the UK adults involved in this research reported that they trusted AI accuracy, with that number growing among younger age groups. Respondents also claim to be very concerned with accuracy, and 84% say that factual errors would significantly impair their trust. While that may sound like an appropriately cautious approach, just how much of this content is really inaccurate — and are people noticing?

Based on the results, we’d have to largely guess “no,” as the majority of AI responses were found to have some problem with them:
