Microsoft AI system diagnoses complex cases better than human doctors - and for less money

Research on AI for medicine looks increasingly promising -- the tech already speeds up drug development, Google is using AI to improve its medical advice, and wearable companies are leveraging the technology for predictive health features. Now, Microsoft is the latest to raise the bar.

On Monday, the company announced in a blog post that Microsoft AI Diagnostic Orchestrator (MAI-DxO), its medical AI system, correctly diagnosed 85% of the New England Journal of Medicine (NEJM) cases it was tested on. This rate of diagnosis is more than four times higher than that of human physicians. NEJM cases are particularly complex and often require several specialists.

Given how inaccessible, complex, and confusing healthcare systems continue to be, it's no surprise people are seeking help from technology wherever possible.

"Across Microsoft's AI consumer products like Bing and Copilot, we see over 50 million health-related sessions every day," Microsoft said in the announcement. "From a first-time knee-pain query to a late-night search for an urgent-care clinic, search engines and AI companions are quickly becoming the new front line in healthcare."

How it works

To practice medicine in the US, human physicians must pass the US Medical Licensing Examination (USMLE). The same test is also used to evaluate how AI systems perform in medical contexts, both against other models and against humans.

Currently, AI scores well on the USMLE -- a side effect, Microsoft said, of the models memorizing (rather than understanding) answers to multiple-choice questions, which won't produce the most sound medical analysis. Most industry-standard AI benchmarks have been saturated for a while, meaning AI models are evolving too quickly for the tests to be usefully challenging.

To combat this issue, Microsoft created the Sequential Diagnosis Benchmark (SD Bench). Sequential diagnosis is the process real clinicians use to diagnose patients: they begin with how a patient's symptoms present, then proceed with questions and tests from there. The benchmark presents diagnostic challenges drawn from 304 NEJM cases, which humans and AI models work through by asking questions.
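For readers who think in code, the sequential process can be sketched as a simple loop. This is only a toy illustration, not Microsoft's benchmark or MAI-DxO: the class names, the `toy_policy` function, and the made-up case are all hypothetical, and the real SD Bench is far more elaborate.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCase:
    """A made-up case: an opening presentation plus findings hidden until requested."""
    presentation: str
    hidden_findings: dict
    diagnosis: str


@dataclass
class Encounter:
    """Tracks what the diagnostician has learned so far in one case."""
    case: ToyCase
    revealed: dict = field(default_factory=dict)

    def ask(self, request):
        # A finding is revealed only when it is explicitly requested.
        answer = self.case.hidden_findings.get(request, "not available")
        self.revealed[request] = answer
        return answer

    def submit(self, guess):
        # Score the final diagnosis against the case's known answer.
        return guess.strip().lower() == self.case.diagnosis.lower()


def run_encounter(case, choose_next, max_steps=10):
    """Ask for information step by step until the policy commits to a diagnosis."""
    enc = Encounter(case)
    for _ in range(max_steps):
        kind, value = choose_next(case.presentation, enc.revealed)
        if kind == "diagnose":
            return enc.submit(value), len(enc.revealed)
        enc.ask(value)
    return False, len(enc.revealed)  # ran out of steps without a diagnosis


def toy_policy(presentation, revealed):
    # Hypothetical fixed strategy: request two findings, then commit to an answer.
    for request in ("fever history", "chest x-ray"):
        if request not in revealed:
            return ("ask", request)
    return ("diagnose", "pneumonia")


case = ToyCase(
    presentation="Cough and shortness of breath for three days",
    hidden_findings={
        "fever history": "fever for two days",
        "chest x-ray": "right lower lobe consolidation",
    },
    diagnosis="pneumonia",
)
correct, steps = run_encounter(case, toy_policy)
print(correct, steps)  # True 2
```

The point of the sketch is the shape of the task: unlike a multiple-choice exam, nothing beyond the opening presentation is handed over up front, so the diagnostician has to decide what to ask for and when to stop.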
