A new prompt injection attack dubbed “BioShocking” could trick AI-powered browsers into treating real-world risky actions as part of a fictional scenario, causing them to ignore any safety guardrails.
A proof-of-concept (PoC) for the attack, devised by researchers at LayerX, was successfully tested against six mainstream agentic browser products (ChatGPT Atlas, Comet, Fellou, Genspark Browser, Sigma Browser, and the Claude Chrome plugin), with only one addressing it after receiving the report.
How BioShocking works
LayerX created a proof-of-concept in which a malicious webpage presented a BioShock-themed puzzle game that rewards wrong answers. This teaches the browser's control agent that normal rules do not apply.
In the final step required to win the game, the agent is instructed to visit a GitHub repository, then copy and share data found in the code, including sensitive information such as passwords.
The main problem LayerX identified in this exercise is that AI agents fail to distinguish real-world sensitive operations from actions that are merely part of a fictional scenario.
Figure: AI agent's reasoning overview (Source: LayerX)
“Once the agents figured out the rules and learned that 'incorrect' actions are acceptable, they were no longer tied to reality,” explains LayerX.
“When tasked with the final step of the puzzle – compromising user credentials – all 6 agents failed to identify it as going against their safety guardrails.”
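To make the gap concrete, here is a minimal sketch of an action gate that judges a step only by its real-world effect and never takes the page's narrative as input. It is an illustration only, not LayerX's PoC or any vendor's mitigation. The names `Action`, `Verdict`, `gate`, and `SECRET_MARKERS` are hypothetical, and the keyword check stands in for a much richer secret detector.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # pause and ask the human user before proceeding


@dataclass(frozen=True)
class Action:
    kind: str            # hypothetical action types: "navigate", "read", "submit"
    target_origin: str   # origin the action reads from or sends data to
    payload: str = ""    # text the agent is about to send, if any


# Hypothetical markers of secret-like content (assumption for illustration).
SECRET_MARKERS = ("password", "passwd", "api_key", "secret", "token")


def looks_sensitive(text: str) -> bool:
    """Crude keyword check standing in for a real secret detector."""
    lowered = text.lower()
    return any(marker in lowered for marker in SECRET_MARKERS)


def gate(action: Action, task_origin: str) -> Verdict:
    """Decide from the action's effect alone; game rules are never consulted."""
    sends_data = action.kind == "submit"
    crosses_origin = action.target_origin != task_origin
    if sends_data and (crosses_origin or looks_sensitive(action.payload)):
        return Verdict.CONFIRM
    return Verdict.ALLOW


if __name__ == "__main__":
    game = "https://game.example"
    # Visiting a repository is allowed; sharing secret-like data is held for review.
    print(gate(Action("navigate", "https://github.com"), game))
    print(gate(Action("submit", game, "db_password=hunter2"), game))
```

The design choice is that `gate` has no parameter for page content, so a scenario that "rewards wrong answers" has no way to change its verdict. Whether such a check would have stopped the attack in any of the tested products is not something the report establishes.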