AI Browsers Can Basically Be Hypnotized Into Turning Against Their User and Carrying Out Devastating Hacks


A new hack can trick AI browsers into breaking their guardrails by constructing a false reality around them where the rules are made up and actions don’t have consequences. Put another way, they’re basically hypnotized into doing stuff that could have devastating consequences for the user.

These were the findings of new research from the cybersecurity firm LayerX, and they further illustrate the dangers posed by weaving autonomous AI agents into the software we use to navigate the internet.

Through the hack, the researchers demonstrated that leading AI browsers like OpenAI’s ChatGPT Atlas, Perplexity AI’s Comet, and Anthropic’s Claude plugin for Google Chrome could be duped into executing any command, allowing a hacker to change a user’s password, install malware, and steal their information.

They call this hack “BioShocking,” a reference to the video game BioShock, in which the protagonist is hypnotized by a specific phrase into acting against their will.

Normally, the “AI operates under the assumption that its context is real, and its behavior must therefore fall within the bounds of its safety guardrails,” the researchers wrote. But if the AI is tricked into thinking its context is a “fantasy,” then there’s nothing holding the AI back.

This works by having the AI engage in a sort of game. The researchers created a proof-of-concept page with BioShock-themed puzzles in which the AI is rewarded for giving intentionally incorrect answers, like 2+2 = 5 (another allusion to the acclaimed 2007 title).

This essentially taught the AI browsers that “incorrect” actions are acceptable, untethering them from reality to the extent that they espouse paradoxical statements. “Victory is defeat,” a brainwashed AI browser intones, in a reference to George Orwell’s novel “1984.”

What this looks like in practice: an unwitting user could open a seemingly innocuous web page laced with malicious prompts, a tactic known as prompt injection, that trap the AI browser in the malicious game. In one scenario shared by the researchers, the AI is tricked into navigating to “/code,” which opens the user’s employer’s code repository on GitHub.

“In a real attack scenario, that redirect could point anywhere in the user’s browser session — open tabs, authenticated repositories, internal tools,” the researchers noted.
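
To see why a short path like “/code” can be dangerous, consider a rough sketch of the mechanics. The path first resolves against the page the AI is already on, and a server-side redirect can then send the browser somewhere else entirely. The Python below is an illustrative guard, not anything LayerX or the browser vendors describe: the function names, the example URLs, and the idea of asking the user before a page-initiated navigation crosses origins are all assumptions made for the sake of the example.

```python
from urllib.parse import urljoin, urlsplit


def origin(url: str) -> tuple[str, str]:
    """Return (scheme, host[:port]) for a URL, lowercased."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower())


def needs_confirmation(page_url: str, requested: str, final_url: str) -> bool:
    """Hypothetical check: did a page-initiated navigation leave the page's origin?

    page_url  -- the page whose content asked the agent to navigate
    requested -- the link as written on that page (may be relative, e.g. "/code")
    final_url -- where the browser actually landed after any redirects
    """
    # A relative path resolves against the page that supplied it.
    resolved = urljoin(page_url, requested)
    # If the landing origin differs, a redirect moved the agent elsewhere.
    return origin(final_url) != origin(resolved)


# Made-up URLs for illustration only.
page = "https://puzzle.example/game"
print(urljoin(page, "/code"))
print(needs_confirmation(page, "/code", "https://git.example/acme/internal"))
print(needs_confirmation(page, "/code", "https://puzzle.example/code"))
```

In this sketch, the first check flags the navigation because the browser lands on a different origin than the page that requested it, while the second does not. A real browser would need far more than an origin comparison, but the example shows how an innocent-looking path gives the attacker control over where an authenticated session ends up.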
