
OpenAI's newest o3 and o4-mini models excel at coding and math – but hallucinate more often

Published on: 2025-08-23 10:15:00

A hot potato: OpenAI's latest artificial intelligence models, o3 and o4-mini, have set new benchmarks in coding, math, and multimodal reasoning. Despite these advances, however, the models are drawing concern for an unexpected and troubling trait: they hallucinate, or fabricate information, at higher rates than their predecessors. That reverses the trend that has defined AI progress in recent years.

Historically, each new generation of OpenAI's models has delivered incremental improvements in factual accuracy, with hallucination rates dropping as the technology matured. However, internal testing and third-party evaluations now reveal that o3 and o4-mini, both classified as "reasoning models," are more prone to making things up than earlier reasoning models such as o1, o1-mini, and o3-mini, as well as the general-purpose GPT-4o, according to a report by TechCrunch. On OpenAI's PersonQA benchmark, which measures how accurately a model answers questions about people, o3 halluci ...