LLMs greatly improved physicians' diagnostic accuracy. Credit: Rizwan Tabassum/AFP via Getty
Large language models (LLMs) can pass postgraduate medical examinations and help clinicians to make diagnoses, at least in controlled benchmarking tests. But are they useful in real-world settings, where physicians are too few to check the answers, patient lists are long and resources are limited?
Two studies published in Nature Health on 6 February suggest that they are up to the task. The work reveals that cheap-to-use LLMs can boost diagnostic success rates, even outperforming trained clinicians, in health-care settings in Rwanda1 and Pakistan2.
In Rwanda, chatbot answers outscored those of local clinicians across every metric assessed. And in Pakistan, physicians using LLMs to aid their diagnosis achieved a mean diagnostic reasoning score of 71%, versus 43% for those using conventional resources.
“The papers highlight how LLMs might be able to support clinicians in lower- and middle-income countries to improve the level of care,” says Caroline Green, director of research at the Institute for Ethics in AI at the University of Oxford, UK.
Real-world complexity
In the Rwanda study, researchers tested whether LLMs could give accurate clinical information to patients in low-resource health systems across four districts. A common problem there is that there are too few doctors and nurses to see all patients, so most people are seen and triaged by community workers with little training, says study co-author Bilal Mateen, chief AI officer at PATH, a global non-profit organization in London that is dedicated to health equity.
Mateen’s team asked about 100 community health workers to compile a list of more than 5,600 clinical questions they tend to receive from patients.
The researchers took roughly 500 of these questions and compared the responses generated by five LLMs against answers from trained local clinicians. Grading the responses on a 5-point scale revealed that all the LLMs outperformed local clinicians across all 11 metrics, which included alignment with established medical consensus, understanding of the question and the likelihood of the response leading to harm. The team also demonstrated that the LLMs could answer roughly 100 questions in Kinyarwanda, the national language of Rwanda.