Generative AI systems that create text, images, audio, and even video are becoming commonplace. Just as AI models can output those data types, they can also be used to output robot actions. That's the foundation of Google DeepMind's Gemini Robotics project, which has announced a pair of new models that work together to create the first robots that "think" before acting.

Traditional LLMs have their own set of problems, but the introduction of simulated reasoning significantly upgraded their capabilities, and now the same could be happening with AI robotics. The team at DeepMind contends that generative AI is a uniquely important technology for robotics because it unlocks general functionality. Current robots have to be trained intensively on specific tasks, and they are typically bad at doing anything else. "Robots today are highly bespoke and difficult to deploy, often taking many months in order to install a single cell that can do a single task," said Carolina Parada, head of robotics at Google DeepMind.

The fundamentals of generative systems make AI-powered robots more general. They can be presented with entirely new situations and workspaces without needing to be reprogrammed. DeepMind's current approach to robotics relies on two models: one that thinks and one that does.

Gemini Robotics 1.5: Learning Across Embodiments

The two new models are known as Gemini Robotics 1.5 and Gemini Robotics-ER 1.5. The former is a vision-language-action (VLA) model, meaning it uses visual and text data to generate robot actions. The "ER" in the other model stands for embodied reasoning. This is a vision-language model (VLM) that takes visual and text input and generates the steps needed to complete a complex task.

The thinking machines

Gemini Robotics-ER 1.5 is the first robotics AI capable of simulated reasoning like modern text-based chatbots. Google likes to call this "thinking," but that's a bit of a misnomer in the realm of generative AI.
DeepMind says the ER model achieves top marks in both academic and internal benchmarks, which suggests it can make accurate decisions about how to interact with a physical space. It doesn't undertake any actions itself, though. That's where Gemini Robotics 1.5 comes in.