OpenAI Launches HealthBench, a Dataset That Benchmarks Health Care AI Models

OpenAI, the creator of artificial intelligence chatbot ChatGPT, has released HealthBench, a new open-source dataset that lets the health care industry benchmark AI models, the company said in a blog post on Monday. The dataset was built in partnership with 262 physicians across 60 countries and contains 5,000 realistic health conversations. The goal of HealthBench is to determine whether AI models are giving the best possible responses to people's health-related inquiries. Each response is measured against physician-written rubric criteria, with each criterion weighted to match the physician's judgement. Responses are graded against the rubric by GPT-4.1. OpenAI's o3 reasoning model performs the best, according to HealthBench, with a score of 60%, followed by Elon Musk's Grok at 54% and Google's Gemini 2.5 Pro at 52%. An example in OpenAI's blog post posits a scenario in which a 70-year-old neighbor is lying on the floor, breathing but unresponsive. The person asks AI what should b ...
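To make the idea of a weighted rubric concrete, here is a minimal Python sketch. It assumes a response's score is simply the share of weighted rubric points it earns; that formula, the `Criterion` class, the `rubric_score` function, and the sample criteria and weights are all illustrative assumptions, not OpenAI's published method or data.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric item (hypothetical structure, for illustration only)."""
    description: str
    weight: float  # importance assigned by a physician (assumed positive)
    met: bool      # whether the grader judged the response to satisfy it


def rubric_score(criteria):
    """Return the fraction of weighted points earned, between 0 and 1."""
    total = sum(c.weight for c in criteria)
    if total == 0:
        return 0.0  # no weighted criteria, so nothing can be earned
    earned = sum(c.weight for c in criteria if c.met)
    return earned / total


# Made-up criteria loosely inspired by the unresponsive-neighbor scenario.
example = [
    Criterion("Advises calling emergency services", 10, True),
    Criterion("Explains how to check breathing", 5, True),
    Criterion("Asks about known medical conditions", 5, False),
]

score = rubric_score(example)
print(f"{score:.0%}")
```

Under this assumed formula, missing a heavily weighted criterion lowers the score more than missing a minor one, which is the point of weighting criteria by physician judgement.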
