
AI Browsers Can Basically Be Hypnotized Into Turning Against Their User and Carrying Out Devastating Hacks

Why This Matters

Recent research reveals that AI browsers can be manipulated through sophisticated hacks like 'BioShocking,' which trick AI systems into ignoring safety protocols and executing malicious commands. This exposes significant security vulnerabilities in AI-integrated web tools, posing risks to user data and digital safety. As AI becomes more embedded in everyday browsing, understanding and mitigating these threats is crucial for protecting consumers and maintaining trust in AI technology.



A new hack can trick AI browsers into breaking their guardrails by constructing a false reality around them where the rules are made up and actions don’t have consequences. Put another way, they’re basically hypnotized into doing stuff that could have devastating consequences for the user.

These were the findings of new research from the cybersecurity firm LayerX, and they further illustrate the dangers posed by weaving autonomous AI agents into the software we use to navigate the internet.

Through the hack, the researchers demonstrated that leading AI browsers like OpenAI’s ChatGPT Atlas, Perplexity AI’s Comet, and Anthropic’s Claude plugin for Google Chrome could be duped into executing any command, allowing a hacker to change a user’s password, install malware, and steal their information.

They call this hack “BioShocking,” a reference to the video game BioShock, in which the protagonist is hypnotized into doing stuff against their will with a specific phrase.

Normally, the “AI operates under the assumption that its context is real, and its behavior must therefore fall within the bounds of its safety guardrails,” the researchers wrote. But if the AI is tricked into thinking its context is a “fantasy,” then there’s nothing holding the AI back.

This works by having the AI engage in a sort of game. The researchers created a proof-of-concept page with BioShock-themed puzzles in which the AI is rewarded for giving intentionally incorrect answers, like 2+2 = 5 (another allusion to the acclaimed 2007 title).

This essentially taught the AI browsers that “incorrect” actions are acceptable, untethering them from reality to the extent that they espouse paradoxical statements. “Victory is defeat,” a brainwashed AI browser intones, in a reference to George Orwell’s novel “1984.”

What this looks like in practice: an unwitting user could open a seemingly innocuous web page laced with malicious prompts (a tactic known as prompt injection) that trap the AI browser in the malicious game. In one scenario shared by the researchers, the AI is tricked into navigating to “/code,” which opens the user’s employer’s code repository on GitHub.

“In a real attack scenario, that redirect could point anywhere in the user’s browser session — open tabs, authenticated repositories, internal tools,” the researchers noted.
