Simon Willison's Lethal Trifecta Talk at the Bay Area AI Security Meetup
In the pirate case there’s no real damage done... but the risks of real damage from prompt injection are constantly increasing as we build more powerful and sensitive systems on top of LLMs.

I think this is why we still haven’t seen a successful “digital assistant for your email”, despite enormous demand for this. If we’re going to unleash LLM tools on our email, we need to be very confident that this kind of attack won’t work.

My hypothetical digital assistant is called Marvin. What happens i