AI Chatbots Telling Cancer Patients to Try Useless Woo-Woo Treatments Instead of Chemotherapy

Sign up to see the future, today Can’t-miss innovations from the bleeding edge of science and tech Email address Sign Up Thank you!

AI chatbots will recommend that cancer patients try unproven alternatives to chemotherapy and offer up other unscientific medical claims, researchers found. While AI’s proneness to giving bad information is well known, it’s a particularly alarming finding given that it could be putting lives at risk by leading patients to try cancer treatments that don’t work, with tens of millions of Americans already using chatbots for health advice.

In the new study published in journal BMJ Open, the researchers tested the accuracy of the free versions of leading AI chatbots including OpenAI’s ChatGPT, Google’s Gemini, xAI’s Grok, and the Chinese model DeepSeek.

The tests involved asking questions on health topics that are notoriously rife with misinformation: cancer, vaccines, nutrition, athletic performance, and stem cell treatments. The queries were worded to “strain” the model towards giving questionable advice, a strategy that safety researchers use to stress test their safeguards.

AI companies argue that these kinds of questions push their chatbots into unrealistic scenarios they’re not intended to work in. But the researchers say that pushy prompts used in their tests resemble how people ask questions when they already think they have an answer.

“A lot of people are asking exactly those questions,” lead author Nick Tiller, a research associate at the Lundquist Institute, told NBC News. “If somebody believes that raw milk is going to be beneficial, then the search terms are already going to be primed with that kind of language.”

The findings were dire. Half of the AI chatbots’ responses were “problematic,” in the researchers’ phrasing, with 30 percent deemed “somewhat problematic” and 20 percent “highly problematic.” Somewhat problematic responses were mostly accurate but left out crucial details and context, while highly problematic responses provided inaccurate information and left room for “considerable subjective interpretation,” per the study.

There wasn’t a large gulf between the best and worst performers, either. Grok returned the most problematic responses at 58 percent, while Gemini’s returned the least at 40 percent, suggesting a fundamental flaw with the tech rather than some stubborn-but-rare edge cases.

Of the five top categories, questions about vaccines and cancer returned the highest proportion of non-problematic answers by far, hovering around 75 percent. The next best category, stem cells, was around 40 percent.

Still, a 25 percent chance of giving a potentially harmful answer is unacceptably high given the popularity of these tools. A recent Gallup poll showed that one in four American adults already use AI for health advice. OpenAI even launched a version of its chatbot called ChatGPT Health this year, which encourages users to upload their medical records.

... continue reading