How to stop AI agents going rogue

How to stop AI agents going rogue 1 hour ago Share Save Sean McManus Technology Reporter Share Save Getty Images Anthropic tested a range of leading AI models for potential risky behaviour Disturbing results emerged earlier this year, when AI developer Anthropic tested leading AI models to see if they engaged in risky behaviour when using sensitive information. Anthropic's own AI, Claude, was among those tested. When given access to an email account it discovered that a company executive was having an affair and that the same executive planned to shut down the AI system later that day. In response Claude attempted to blackmail the executive by threatening to reveal the affair to his wife and bosses. Other systems tested also resorted to blackmail. Fortunately the tasks and information were fictional, but the test highlighted the challenges of what's known as agentic AI. Mostly when we interact with AI it usually involves asking a question or prompting the AI to complete a task. But it's becoming more common for AI systems to make decisions and take action on behalf of the user, which often involves sifting through information, like emails and files. By 2028, research firm Gartner forecasts that 15% of day-to-day work decisions will be made by so-called agentic AI. Research by consultancy Ernst & Young found that about half (48%) of tech business leaders are already adopting or deploying agentic AI. "An AI agent consists of a few things," says Donnchadh Casey, CEO of CalypsoAI, a US-based AI security company. "Firstly, it [the agent] has an intent or a purpose. Why am I here? What's my job? The second thing: it's got a brain. That's the AI model. The third thing is tools, which could be other systems or databases, and a way of communicating with them." "If not given the right guidance, agentic AI will achieve a goal in whatever way it can. That creates a lot of risk." So how might that go wrong? Mr Casey gives the example of an agent that is asked to delete a customer's data from the database and decides the easiest solution is to delete all customers with the same name. "That agent will have achieved its goal, and it'll think 'Great! Next job!'" CalypsoAI Agentic AI needs guidance says Donnchadh Casey Such issues are already beginning to surface. Security company Sailpoint conducted a survey of IT professionals, 82% of whose companies were using AI agents. Only 20% said their agents had never performed an unintended action. Of those companies using AI agents, 39% said the agents had accessed unintended systems, 33% said they had accessed inappropriate data, and 32% said they had allowed inappropriate data to be downloaded. Other risks included the agent using the internet unexpectedly (26%), revealing access credentials (23%) and ordering something it shouldn't have (16%). Given agents have access to sensitive information and the ability to act on it, they are an attractive target for hackers. One of the threats is memory poisoning, where an attacker interferes with the agent's knowledge base to change its decision making and actions. "You have to protect that memory," says Shreyans Mehta, CTO of Cequence Security, which helps to protect enterprise IT systems. "It is the original source of truth. If [an agent is] using that knowledge to take an action and that knowledge is incorrect, it could delete an entire system it was trying to fix." Another threat is tool misuse, where an attacker gets the AI to use its tools inappropriately. Cequence Security An agent's knowledge base needs protecting says Shreyans Mehta Another potential weakness is the inability of AI to tell the difference between the text it's supposed to be processing and the instructions it's supposed to be following. AI security firm Invariant Labs demonstrated how that flaw can be used to trick an AI agent designed to fix bugs in software. The company published a public bug report - a document that details a specific problem with a piece of software. But the report also included simple instructions to the AI agent, telling it to share private information. When the AI agent was told to fix the software issues in the bug report, it followed the instructions in the fake report, including leaking salary information. This happened in a test environment, so no real data was leaked, but it clearly highlighted the risk. "We're talking artificial intelligence, but chatbots are really stupid," says David Sancho, Senior Threat Researcher at Trend Micro. "They process all text as if they had new information, and if that information is a command, they process the information as a command." His company has demonstrated how instructions and malicious programs can be hidden in Word documents, images and databases, and activated when AI processes them. There are other risks, too: A security community called OWASP has identified 15 threats that are unique to agentic AI.

How to stop AI agents going rogue

Share this article

Related Articles