Google's Gemini AI model might soon be able to act like an IT administrator who's taken over your faulty work laptop: moving your cursor around, clicking on things, typing in forms, all while you watch.
The company on Wednesday announced the release of its Gemini 2.5 Computer Use model in preview in its API, allowing developers to create agents that can interact directly with a computer's user interface. By capturing and analyzing screenshots, the model visually understands the screen and its elements, allowing an AI agent to do virtually anything on a computer that you could, letting you delegate all kinds of tasks to an automated tool.
While it mostly works in web browsers at the moment, it can do some actions on a mobile interface too. Google said the model, built on Gemini 2.5 Pro, isn't yet optimized for desktop operating system-level control.
This new model is part of a growing trend toward agentic AI, a technology that allows a model to go beyond the box of a chatbot and take actions in the real-ish world of the computer interface. Tools like ChatGPT Agent can already do things like order you a pizza, albeit with some limitations. Some agentic tools might take over mundane tasks in the workplace or in customer service interactions. At the same time, AI companies are bringing more and more of the things we would normally do on our own into the chatbot interface. With agents, the conversational nature of a tool like Gemini stands to replace the pointing and clicking you're used to.
As AI agents gain the ability to manipulate websites and potentially apps on the computer itself, AI makers will need to build in significant safety precautions. Google acknowledged that in the blog post announcing the model, stating: "AI agents that control computers introduce unique risks, including intentional misuse by users, unexpected model behavior, and prompt injections and scams in the web environment."
Google reported it trained the model specifically to address those risks, including the ability to recognize when it's given a "high-stakes" command like sending an email or purchasing something. The model may also require the user to confirm before it takes a high-stakes action.