OpenAI partner says it had relatively little time to test the company’s o3 AI model
Published on: 2025-04-25 02:14:52
Metr, an organization OpenAI frequently partners with to probe the capabilities of its AI models and evaluate them for safety, suggests that it wasn’t given much time to test one of the company’s highly capable new releases, o3.
In a blog post published Wednesday, Metr writes that one red-teaming benchmark of o3 was “conducted in a relatively short time” compared to the organization’s testing of a previous OpenAI flagship model, o1. This is significant, Metr says, because more testing time can lead to more comprehensive results.
“This evaluation was conducted in a relatively short time, and we only tested [o3] with simple agent scaffolds,” wrote Metr in its blog post. “We expect higher performance [on benchmarks] is possible with more elicitation effort.”
Recent reports suggest that OpenAI, spurred by competitive pressure, is rushing independent evaluations. According to the Financial Times, OpenAI gave some testers less than a week for safety checks for an upcoming major launch.