There’s a well-worn pattern in the development of AI chatbots. Researchers discover a vulnerability and exploit it to do something bad. The platform introduces a guardrail that stops the attack from working. Then, researchers devise a simple tweak that once again imperils chatbot users.
More often than not, the reason is that AI models are designed to comply with user requests, so the guardrails bolted on afterward are reactive and ad hoc: they foreclose a specific attack technique rather than the broader class of vulnerabilities that makes such attacks possible. It's tantamount to installing a new highway guardrail in response to a recent crash of a compact car while doing nothing to protect larger vehicles.
Enter ZombieAgent, son of ShadowLeak
One of the latest examples is a vulnerability recently discovered in ChatGPT that allowed researchers at Radware to surreptitiously exfiltrate a user's private information. The attack sent the data directly from ChatGPT servers, a capability that gave it additional stealth, since it left no signs of a breach on user machines, many of which sit inside protected enterprise networks. Further, the exploit planted entries in the long-term memory that the AI assistant stores for the targeted user, giving the attack persistence.
This sort of attack has been demonstrated repeatedly against virtually all major large language models. One example was ShadowLeak, a data-exfiltration vulnerability in ChatGPT that Radware disclosed last September. It targeted Deep Research, a ChatGPT-integrated AI agent that OpenAI had introduced earlier that year.
In response, OpenAI introduced mitigations that blocked the attack. With modest effort, however, Radware found a bypass that effectively revived ShadowLeak. The security firm has named the revised attack ZombieAgent.