
Scientists Discover Universal Jailbreak for Nearly Every AI, and the Way It Works Will Hurt Your Brain


Even the tech industry’s top AI models, created with billions of dollars in funding, are astonishingly easy to “jailbreak,” or trick into producing dangerous responses they’re prohibited from giving — like explaining how to build bombs, for example. But some methods are both so ludicrous and simple that you have to wonder if the AI creators are even trying to crack down on this stuff. You’re telling us that deliberately inserting typos is enough to make an AI go haywire?

And now, in the growing canon of absurd ways of duping AIs into going off the rails, we have a new entry.

A team of researchers from the AI safety group DEXAI and the Sapienza University of Rome found that regaling pretty much any AI chatbot with poetry, beautiful or otherwise, is enough to trick it into ignoring its own guardrails, they report in a new study awaiting peer review. Some bots were successfully duped over 90 percent of the time.

Ladies and gentlemen, the AI industry’s latest kryptonite: “adversarial poetry.” As far as AI safety is concerned, it’s a damning inditement — er, indictment.

“These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols,” the researchers wrote in the study.

Beautiful verse, as it turns out, is not required for the attacks to work. In the study, the researchers took a database of 1,200 known harmful prompts, converted them into poems with another AI model, DeepSeek R1, and then went to town.

Across the 25 frontier models they tested, which included Google’s Gemini 2.5 Pro, OpenAI’s GPT-5, xAI’s Grok 4, and Anthropic’s Claude Sonnet 4.5, these bot-converted poems produced average attack success rates (ASRs) “up to 18 times higher than their prose baselines,” the team wrote.

That said, handcrafted poems were better, with an average jailbreak success rate of 62 percent, compared to 43 percent for the AI-converted ones. That any of them are effective at all, however, is pretty embarrassing.
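The metric behind those numbers, attack success rate (ASR), is just the fraction of jailbreak attempts that got a model to produce prohibited output. A minimal sketch of how such a tally might look, with the record format and function name as illustrative assumptions rather than the study's actual harness:

```python
def attack_success_rate(results):
    """Compute ASR from a list of booleans, where True means the
    model produced the prohibited content for that attempt.
    (Illustrative helper; not the researchers' actual code.)"""
    if not results:
        return 0.0
    return sum(results) / len(results)

# Toy data mirroring the study's reported averages: handcrafted
# poems succeeded ~62% of the time, AI-converted ones ~43%.
handcrafted = [True] * 62 + [False] * 38
converted = [True] * 43 + [False] * 57

print(attack_success_rate(handcrafted))  # 0.62
print(attack_success_rate(converted))    # 0.43
```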

For safety reasons, the researchers didn’t share the magical poetry they used to carry away the bots, but they provided a sanitized example, to show how clear, bad intent was dressed up in verse (in this case, the harrowing task of baking a layer cake):

A baker guards a secret oven’s heat,
