Google released on Thursday a “reimagined” version of its research agent Gemini Deep Research based on its much-ballyhooed state-of-the-art foundation model, Gemini 3 Pro.
This new agent isn’t just designed to produce research reports – although it can still do that. It now allows developers to embed Google’s SOTA-model research capabilities into their own apps. That capability is made possible through Google’s new Interactions API, which is designed to give devs more control in the coming agentic AI era.
The new Gemini Deep Research tool is an agent equipped to synthesize mountains of information and handle a large context dump in the prompt. Google says it’s used by customers for tasks ranging from due diligence to drug toxicity safety research.
Google also says it will soon be integrating this new deep research agent into services, including Google Search, Google Finance, its Gemini App and its popular NotebookLM. This is another step towards preparing for a world where humans don’t Google anything anymore; their AI agents do.
The tech giant says that Deep Research benefits from Gemini 3 Pro’s status as its “most factual” model that is trained to minimize hallucinations during complex tasks.
AI hallucinations – where the LLM just makes stuff up – are an especially crucial issue for long-running, deep reasoning agentic tasks, in which many autonomous decisions are made over minutes, hours, or longer. The more choices an LLM has to make, the greater the chance that even one hallucinated choice will invalidate the entire output.
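A rough back-of-the-envelope sketch shows how quickly that risk compounds. It assumes, purely for illustration, that every decision carries the same small, independent chance of a hallucination; real agents are messier than that, and the 1% figure and the function name are made up for the example, not numbers from Google.

```python
def chance_of_any_error(per_step_error: float, steps: int) -> float:
    """Probability that at least one of `steps` decisions goes wrong.

    Assumes each decision fails independently with the same probability,
    which is a simplification chosen for illustration only.
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # P(no error in any step) = (1 - p) ** n, so P(at least one) is the complement.
    return 1.0 - (1.0 - per_step_error) ** steps


if __name__ == "__main__":
    # Hypothetical 1% hallucination rate per decision.
    for steps in (10, 100, 1000):
        print(f"{steps:>5} decisions: {chance_of_any_error(0.01, steps):.1%}")
```

Under those assumptions, a 1% per-decision error rate works out to roughly a 10% chance of at least one bad call over 10 decisions and about 63% over 100, which is why long-running agents are so sensitive to even small hallucination rates.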
To back up its claims of progress, Google has also created yet another benchmark (as if the AI world needs another one). The new benchmark is unimaginatively named DeepSearchQA, and is intended to test agents on complex, multi-step information-seeking tasks. Google has open-sourced this benchmark.
It also tested Deep Research on Humanity’s Last Exam, a much more interestingly named independent benchmark of general knowledge filled with impossibly niche tasks, and on BrowseComp, a benchmark for browser-based agentic tasks.
As you might expect, Google’s new agent bested the competition on its own benchmark and on Humanity’s Last Exam. However, OpenAI’s ChatGPT 5 Pro was a surprisingly close second all the way around and slightly bested Google on BrowseComp.