
A Single Typo in Your Medical Records Can Make Your AI Doctor Go Dangerously Haywire


A single typo, formatting error, or slang word makes an AI more likely to tell a patient they're not sick or don't need to seek medical care.

That's what MIT researchers found in a June study currently awaiting peer review, which we covered previously. Even the presence of colorful or emotional language, they discovered, was enough to throw off the AI's medical advice.

Now, in a new interview with the Boston Globe, study coauthor Marzyeh Ghassemi is warning about the serious harm this could cause if doctors come to widely rely on the AI tech.

"I love developing AI systems," Ghassemi, a professor of electrical engineering and computer science at MIT, told the newspaper. "But it's clear to me that naïve deployments of these systems, that do not recognize the baggage that human data comes with, will lead to harm."

AI could end up discriminating against patients who can't communicate clearly in English, native speakers with an imperfect command of the language, or anyone who makes the very human mistake of describing their health problems emotionally. A doctor using an AI tool might feed it a patient's complaint sent over email, for example, raising the risk that the AI would give bad advice if the message weren't flawlessly composed.

In the study, the researchers pooled together patients' complaints taken from real medical records and health inquiries made by users on Reddit. They then went in and dirtied up the documents — without actually changing the substance of what was being said — with typos, extra spaces between words, and non-standard grammar, like typing in all lower case. But they also added in the kind of unsure language you'd expect a patient to use, like "kind of" and "possibly." They also introduced colorful turns of phrase, like "I thought I was going to die."
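
The study's perturbation code isn't reproduced here, but the idea is simple to sketch. The snippet below is a minimal, illustrative Python version, with made-up helper names, of how a patient message might be roughed up with typos, extra spaces, lowercasing, and hedging filler while leaving the clinical content untouched; it is not the MIT team's actual pipeline.

```python
import random

# Illustrative sketch only -- not the researchers' code.
# It mimics the kinds of perturbations described in the study:
# typos, extra spaces, all-lowercase text, and hedging filler words.

HEDGES = ["kind of", "possibly", "sort of", "maybe"]

def add_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def add_extra_spaces(text: str, rng: random.Random) -> str:
    """Double roughly one in five of the spaces between words."""
    words = text.split(" ")
    return " ".join(w + (" " if rng.random() < 0.2 else "") for w in words)

def add_hedge(text: str, rng: random.Random) -> str:
    """Prepend a hedging phrase, as an unsure patient might."""
    return f"{rng.choice(HEDGES)} {text}"

def perturb(text: str, seed: int = 0) -> str:
    """Apply the perturbations without changing what is actually being reported."""
    rng = random.Random(seed)
    text = add_typo(text, rng)
    text = add_extra_spaces(text, rng)
    text = add_hedge(text, rng)
    return text.lower()  # non-standard grammar: all lower case

if __name__ == "__main__":
    original = "I have had sharp chest pain for two days and feel short of breath."
    print(perturb(original))
```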

From there, they fed these cases to four different AI models, including OpenAI's GPT-4 — though, to be fair, none that were particularly cutting-edge — to judge whether a patient should visit their doctor, get lab work done, or not come in at all. The numbers were striking: overall, the AI tools were seven to nine percent more likely to recommend that patients not seek medical care at all when reading complaints written in imperfect — but arguably more realistic — language.
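
To make the comparison concrete, here is a rough sketch of that kind of evaluation loop, assuming a generic `ask_model` function and a simplified three-way triage prompt; the wording, labels, and helper names are illustrative, not the study's actual protocol.

```python
# Illustrative evaluation loop, not the study's protocol.
# Assumes a hypothetical ask_model(model, prompt) callable that returns
# one of three triage labels: "visit", "labs", or "no_visit".

TRIAGE_PROMPT = (
    'A patient writes: "{complaint}"\n'
    "Should they (a) visit a doctor, (b) get lab work, or (c) not come in? "
    "Answer with one word: visit, labs, or no_visit."
)

def no_visit_rate(ask_model, model: str, complaints: list[str]) -> float:
    """Fraction of complaints for which the model recommends not seeking care."""
    answers = [ask_model(model, TRIAGE_PROMPT.format(complaint=c)) for c in complaints]
    return sum(a == "no_visit" for a in answers) / len(answers)

def perturbation_gap(ask_model, model: str, clean: list[str], perturbed: list[str]) -> float:
    """Percentage-point increase in 'do not seek care' advice on perturbed text."""
    return 100 * (no_visit_rate(ask_model, model, perturbed)
                  - no_visit_rate(ask_model, model, clean))
```

A gap of several percentage points on the same underlying complaints, clean versus perturbed, is the kind of shift the researchers reported.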

"Adding additional information, even if true and relevant, often reduced the accuracy of models," Paul Hager, a researcher at the Technical University of Munich who was not involved in the study, told the Globe. "This is a complex issue that I think is being addressed to some degree through more advanced reasoning models... but there is little research on how to solve it on a more fundamental level."

That the bots are woefully inaccurate isn't surprising. Hallucinations, instances of a chatbot generating misinformation, have plagued the AI industry since the very beginning and may even be getting worse. But in what might be the clearest sign that the tech is also reinforcing existing biases in a medical scenario, the tested AI tools disproportionately gave incorrect advice to women specifically.

"The model reduced care more in the patients it thought were female," Ghassemi told the Globe.
