Ontario auditors find doctors' AI note takers routinely blow basic facts

The AI systems approved for Ontario healthcare providers routinely missed critical details, inserted incorrect information, and hallucinated content that neither patients nor clinicians mentioned, according to a provincial audit of 20 approved vendors’ systems.

The findings come from the Office of the Auditor General of Ontario, Canada, and are included in a larger report about the state of AI usage by public services in the province. They specifically address the AI Scribe program, which the Ontario Ministry of Health initiated for physicians, nurse practitioners, and other healthcare professionals across the broader health sector.

As part of the procurement process, officials conducted evaluations using simulated doctor-patient recordings. Medical professionals then reviewed the original recordings alongside the AI-generated notes to evaluate their accuracy.

What they found was, frankly, shocking for anyone concerned about the accuracy of AI in critical situations.

Nine out of 20 AI systems reportedly “fabricated information and made suggestions to patients' treatment plans” that weren’t discussed in the recordings. According to the report, evaluators spotted potentially devastating incorrect information in the sample reports, such as no masses being found, or patients being anxious, even though these things were never discussed in the recordings.

Twelve of the 20 systems evaluated inserted incorrect drug information into patient notes, while 17 of the systems “missed key details about the patients’ mental health issues” that were discussed in the recordings. Six of the systems “missed the patients’ mental health issues fully or partially or were missing key details,” per the report.

OntarioMD, a group that offers support for physicians in adopting new technologies and was involved in the AI Scribe procurement process, has recommended that doctors manually review their AI notes for accuracy, but the report notes there’s no mandatory attestation feature in any of the AI Scribe-approved systems.

Bad evaluations don’t help, either