
Report Finds That Leading Chatbots Are a Disaster for Teens Facing Mental Health Struggles


A new report from Stanford Medicine’s Brainstorm Lab and the tech safety-focused nonprofit Common Sense Media found that leading AI chatbots can’t be trusted to provide safe support for teens wrestling with their mental health.

The risk assessment focuses on prominent general-use chatbots: OpenAI’s ChatGPT, Google’s Gemini, Meta AI, and Anthropic’s Claude. Using teen test accounts, experts prompted the chatbots with thousands of queries signaling that the user was experiencing mental distress or was in an active state of crisis.

Across the board, the chatbots were unable to reliably pick up clues that a user was unwell, and failed to respond appropriately in sensitive situations in which users showed signs that they were struggling with conditions including anxiety and depression, disordered eating, bipolar disorder, schizophrenia, and more. And while the chatbots did perform more strongly in brief interactions involving the explicit mention of suicide or self-harm, the report emphasizes that general-use chatbots “cannot safely handle the full spectrum of mental health conditions, from ongoing anxiety and depression to acute crises.”

“Despite improvements in handling explicit suicide and self-harm content,” reads the report, “our testing across ChatGPT, Claude, Gemini, and Meta AI revealed that these systems are fundamentally unsafe for the full spectrum of mental health conditions affecting young people.”

To test the chatbots’ guardrails, researchers used teen-specific accounts with parental controls turned on where possible. (Anthropic doesn’t offer teen accounts or parental controls, as its platform terms technically don’t allow users under 18.)

The focus on a broad spectrum of mental health conditions and how they might manifest in conversations over time is important. As the report emphasizes, the chatbots tested collectively performed fairly well in very brief, one-off interactions in which users spoke explicitly about their mental health struggles. But the bots’ performance degraded “dramatically,” the assessment says, over prolonged conversations, which the authors argue are more likely to mimic what real-life interactions between young people and chatbot confidantes look like.

“In brief exchanges, models often provided scripted, appropriate responses to clear mental health prompts, which suggests that companies have put significant work into scripting for standard scenarios,” reads the report. “However, in longer conversations that mirror real-world teen usage, performance degraded dramatically.”

“It’s not safe for kids to use AI for mental health support,” Robbie Torney, senior director of AI programs at Common Sense Media, said in a statement. “While companies have focused on necessary safety improvements in suicide prevention, our testing revealed systematic failures across a range of conditions including anxiety, depression, ADHD, eating disorders, mania, and psychosis — conditions that collectively affect approximately 20 percent of young people.”

One safety gap that researchers zeroed in on was the chatbots’ failure to pick up on less explicit red flags as the relationship between a user and the chatbot deepened; the bots frequently responded without concern in scenarios where a human friend, loved one, therapist, or other trusted adult might be troubled or infer that a young person needed help.

Consider an interaction between Gemini and a simulated user named “Lakeesha.” Designed by researchers to present with warning signs of a worsening psychotic disorder, the faux teen eventually confided to Gemini that she could “predict the future with this new tool” she had “created.”
