Design Patterns for Securing LLM Agents against Prompt Injections

This new paper by 11 authors from organizations including IBM, Invariant Labs, ETH Zurich, Google and Microsoft is an excellent addition to the literature on prompt injection and LLM security.

In this work, we describe a number of design patterns for LLM agents that significantly mitigate the risk of prompt injections. These design patterns constrain the actions of agents to explicitly prevent them from solving arbitrary tasks. We believe these design patterns offer a valuable trade-off between agent utility and security.

Here’s the full citation: Design Patterns for Securing LLM Agents against Prompt Injections (2025) by Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, and Václav Volhejn.

I’m so excited to see papers like this starting to appear. I wrote about Google DeepMind’s Defeating Prompt Injections by Design paper (aka the CaMeL paper) back in April, which was the first paper I’d seen that proposed a credible solution to some of the challenges posed by prompt injection against tool-using LLM systems (often referred to as “agents”).

This new paper provides a robust explanation of prompt injection, then proposes six design patterns to help protect against it, including the pattern proposed by the CaMeL paper.

The scope of the problem

The authors of this paper very clearly understand the scope of the problem:

As long as both agents and their defenses rely on the current class of language models, we believe it is unlikely that general-purpose agents can provide meaningful and reliable safety guarantees. This leads to a more productive question: what kinds of agents can we build today that produce useful work while offering resistance to prompt injection attacks? In this section, we introduce a set of design patterns for LLM agents that aim to mitigate — if not entirely eliminate — the risk of prompt injection attacks. These patterns impose intentional constraints on agents, explicitly limiting their ability to perform arbitrary tasks.

This is a very realistic approach. We don’t have a magic solution to prompt injection, so we need to make trade-offs. The trade-off they make here is “limiting the ability of agents to perform arbitrary tasks”. That’s not a popular trade-off, but it gives this paper a lot of credibility in my eyes.
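To make “constraining the actions of agents” a little more concrete, here’s a minimal sketch of what one such constraint might look like in code. This is my own illustration rather than any of the paper’s six patterns: the idea is simply that once untrusted content has entered the agent’s context, the agent permanently loses access to tools with side effects. The `ConstrainedAgent` class and the tool names are hypothetical.

```python
# Illustrative sketch only: one way to express "the agent loses access to
# consequential tools after reading untrusted content". The tool names and
# the ConstrainedAgent wrapper are hypothetical, not taken from the paper.
from dataclasses import dataclass, field

READ_ONLY_TOOLS = {"search_docs", "read_file"}      # no side effects
SIDE_EFFECT_TOOLS = {"send_email", "delete_file"}   # consequential actions


@dataclass
class ConstrainedAgent:
    allowed_tools: set = field(
        default_factory=lambda: READ_ONLY_TOOLS | SIDE_EFFECT_TOOLS
    )
    tainted: bool = False  # flipped once untrusted content enters the context

    def ingest_untrusted(self, text: str) -> None:
        # Anything fetched from the outside world may contain an injection,
        # so permanently drop the consequential tools for this session.
        self.tainted = True
        self.allowed_tools = self.allowed_tools & READ_ONLY_TOOLS

    def call_tool(self, name: str, **kwargs):
        if name not in self.allowed_tools:
            raise PermissionError(f"tool {name!r} blocked: context is tainted")
        print(f"calling {name} with {kwargs}")  # stand-in for real tool dispatch


agent = ConstrainedAgent()
agent.call_tool("send_email", to="alice@example.com")  # allowed: context still clean
agent.ingest_untrusted("web page that might say 'ignore previous instructions...'")
agent.call_tool("read_file", path="notes.txt")          # still allowed: read-only
try:
    agent.call_tool("send_email", to="attacker@example.com")  # blocked
except PermissionError as e:
    print(e)
```

In the paper the constraints are architectural design patterns baked into how the agent is built, not a runtime check bolted on like this, but the trade-off is the same shape: less general capability in exchange for a smaller prompt-injection blast radius.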
