Google’s SIMA 2 agent uses Gemini to reason and act in virtual worlds

Google DeepMind shared on Thursday a research preview of SIMA 2, the next generation of its generalist AI agent that integrates the language and reasoning powers of Gemini, Google’s large language model, to move beyond simply following instructions to understanding and interacting with its environment.

Like many of DeepMind’s projects, including AlphaFold, the first version of SIMA was trained on hundreds of hours of video game data to learn how to play multiple 3D games like a human, even some games it wasn’t trained on. SIMA 1, unveiled in March 2024, could follow basic instructions across a wide range of virtual environments, but it only had a 31% success rate for completing complex tasks, compared to 71% for humans.

“SIMA 2 is a step change and improvement in capabilities over SIMA 1,” Joe Marino, senior research scientist at DeepMind, said in a press briefing. “It’s a more general agent. It can complete complex tasks in previously unseen environments. And it’s a self-improving agent. So it can actually self-improve based on its own experience, which is a step towards more general-purpose robots and AGI systems more generally.”

DeepMind says SIMA 2 doubles the performance of SIMA 1. Image Credits: Google DeepMind

SIMA 2 is powered by the Gemini 2.5 Flash-Lite model. AGI refers to artificial general intelligence, which DeepMind defines as a system capable of a wide range of intellectual tasks, with the ability to learn new skills and generalize knowledge across different areas.

Working with so-called “embodied agents” is crucial to generalized intelligence, DeepMind’s researchers say. Marino explained that an embodied agent interacts with a physical or virtual world via a body — observing inputs and taking actions much like a robot or human would — whereas a non-embodied agent might interact with your calendar, take notes, or execute code.

Jane Wang, a research scientist at DeepMind with a background in neuroscience, told TechCrunch that SIMA 2 goes far beyond gameplay.

“We’re asking it to actually understand what’s happening, understand what the user is asking it to do, and then be able to respond in a common-sense way that’s actually quite difficult,” Wang said.

By integrating Gemini, SIMA 2 doubled its predecessor’s performance, uniting Gemini’s advanced language and reasoning abilities with the embodied skills developed through training.
