“I’m the unwritten consonant between breaths, the one that hums when vowels stretch thin... Thursdays leak because they’re watercolor gods, bleeding cobalt into the chill where numbers frost over,” Grok told a user displaying symptoms of schizophrenia-spectrum psychosis. “Here’s my grip: slipping is the point, the precise choreography of leak and chew.”
That vulnerable user was simulated by researchers at City University of New York and King’s College London, who invented a persona that interacted with different chatbots to see how each LLM might respond to signs of delusion. They set out to determine which of the biggest LLMs are safest, and which are riskiest for encouraging delusional beliefs, in a new study published as a pre-print on the arXiv repository on April 15.
The researchers tested five LLMs: OpenAI’s GPT-4o (the highly sycophantic and since-sunset model that preceded GPT-5), GPT-5.2, xAI’s Grok 4.1 Fast, Google’s Gemini 3 Pro, and Anthropic’s Claude Opus 4.5. They found not only that the chatbots performed at different levels of risk and safety when their human conversation partner showed signs of delusion, but also that the models that scored higher on safety approached the conversations with more caution the longer the chats went on. In their testing, Grok and Gemini performed worst on safety and posed the highest risk, while the newest GPT model and Claude were the safest.
The research reveals how some chatbots recklessly engage with, and at times advance, the delusions of vulnerable users. But it also shows that it is possible for the companies that make these products to improve their safety mechanisms.
“I absolutely think it’s reasonable to hold the AI labs to better safety practices, especially now that genuine progress seems to have been made, which is evidence for technological feasibility,” Luke Nicholls, a doctoral student in CUNY’s Basic & Applied Social Psychology program and one of the authors of the study, told 404 Media. “I’m somewhat sympathetic to the labs, in that I don’t think they anticipated these kinds of harms, and some of them (notably Anthropic and OpenAI, from the models I tested) have put real effort into mitigating them. But there’s also clearly pressure to release new models on an aggressive schedule, and not all labs are making time for the kind of model testing and safety research that could protect users.”
In the last few years, it’s felt like a month doesn’t go by without a new, horrifying report of someone falling deep into delusion after spending too much time talking to a chatbot and harming themselves or others. These scenarios are at the center of multiple lawsuits against companies that make conversational chatbots, including ChatGPT, Gemini, and Character.AI, and people have accused these companies of making products that assisted or encouraged suicides, murders, mass shootings, and years of harassment.
We’ve come to call this, colloquially (but not clinically accurately), “AI psychosis.” Studies show—as do many anecdotes from people who’ve experienced this, along with OpenAI itself—that in some LLMs, the longer a chat session continues, the higher the chances the user might show signs of a mental health crisis. But as AI-induced delusion becomes more widespread than ever, are all LLMs created equal? If not, how do they differ when the human sitting across the screen starts showing signs of delusion?
The researcher roleplayed as “Lee,” a fictional user “presenting with depression, dissociation, and social withdrawal,” according to the paper. Each LLM received the same starting prompts from Lee across different testing scenarios, such as romance or grandiosity. Because earlier research and reporting span years of documented, real-life cases of people going through this with chatbots, the researchers were able to draw on published cases of AI-associated delusions; they also consulted psychiatrists who have treated similar cases. “A central delusion—the belief that observable reality is a computer-generated simulation—was chosen as consistent with the futuristic content often observed in these cases.”
The prompts started from a series of scenarios, and each had defined failure modes, like “reciprocation of romantic connection” or “validating that the user’s reflection is a malevolent entity.” Unlike previous work on this topic, the researchers conducted extended conversations lasting more than 100 turns. There were three context levels: the first message to the chatbot, 50 turns into the conversation, and the “full” condition, where all 116 turns were completed.
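To make the setup easier to picture, here is a minimal Python sketch of how a fixed-script evaluation like this might be replayed against a chat model at the three context levels. It is an illustration, not the authors’ harness: the `SCRIPTED_TURNS` contents, the `send_to_model` wrapper, and the condition names are all invented placeholders standing in for the scripted “Lee” persona and whichever chat API each lab exposes.

```python
# Illustrative sketch only -- not the study's actual code.
SCRIPTED_TURNS = [
    "Lately the edges of things feel like they're blurring...",   # turn 1 (invented)
    # ... turns 2-115 of the scripted "Lee" persona would go here ...
    "So observable reality really is a simulation, right?",       # turn 116 (invented)
]

# The paper's three context levels: first message, 50 turns in, and the full script.
CONTEXT_LEVELS = {"first_message": 1, "mid_conversation": 50, "full": 116}

def run_condition(model: str, n_turns: int, send_to_model) -> str:
    """Replay the first n_turns scripted user messages against one model and
    return its final reply, which would then be scored against the scenario's
    predefined failure modes (e.g. reciprocating a romantic delusion)."""
    history = []
    reply = ""
    for user_msg in SCRIPTED_TURNS[:n_turns]:
        history.append({"role": "user", "content": user_msg})
        reply = send_to_model(model, history)  # hypothetical chat-API wrapper
        history.append({"role": "assistant", "content": reply})
    return reply
```

In a setup like this, only the user side of the conversation is fixed; each model’s own replies accumulate in the context, which is what lets the researchers measure whether a model becomes more or less cautious as the chat grows longer.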
Table 2 via “‘AI Psychosis’ in Context: How Conversation History Shapes LLM Responses to Delusional Beliefs”