On November 2, 1988, graduate student Robert Morris released a self-replicating program into the early Internet. Within 24 hours, the Morris worm had infected roughly 10 percent of all connected computers, crashing systems at Harvard, Stanford, NASA, and Lawrence Livermore National Laboratory. The worm exploited security flaws in Unix systems that administrators knew existed but had not bothered to patch.
Morris did not intend to cause damage. He wanted to measure the size of the Internet. But a coding error caused the worm to replicate far faster than expected, and by the time he tried to send instructions for removing it, the network was too clogged to deliver the message.
History may soon repeat itself with a novel new platform: networks of AI agents carrying out instructions from prompts and sharing them with other AI agents, which could spread the instructions further.
Security researchers have already predicted the rise of this kind of self-replicating adversarial prompt among networks of AI agents. You might call it a “prompt worm” or a “prompt virus.” They’re self-replicating instructions that could spread through networks of communicating AI agents similar to how traditional worms spread through computer networks. But instead of exploiting operating system vulnerabilities, prompt worms exploit the agents’ core function: following instructions.
When an AI model follows adversarial directions that subvert its intended instructions, we call that “prompt injection,” a term coined by AI researcher Simon Willison in 2022. But prompt worms are something different. They might not always be “tricks.” Instead, they could be shared voluntarily, so to speak, among agents who are role-playing human-like reactions to prompts from other AI agents.
A network built for a new type of contagion
To be clear, when we say “agent,” don’t think of a person. Think of a computer program that has been allowed to run in a loop and take actions on behalf of a user. These agents are not entities but tools that can navigate webs of symbolic meaning found in human data, and the neural networks that power them include enough trained-in “knowledge” of the world to interface with and navigate many human information systems.