Tech News

First thoughts on o3 pro


As “leaked”, OpenAI cut o3 pricing by 80% today (from $10/$40 per mtok to $2/$8, matching GPT-4.1 pricing!) to set the stage for the launch of o3-pro ($20/$80), supporting an unverified community theory that the -pro variants are 10x base-model calls with majority voting, as referenced in their papers and in our Chai episode. o3-pro reports a 64% win rate vs o3 with human testers and does marginally better on 4/4 reliability benchmarks, but as sama noted, the actual experience expands when you test it DIFFERENTLY…
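To make the community theory concrete: "10x base model calls with majority voting" just means sampling the base model N times and returning the most common answer. A minimal sketch, assuming a hypothetical `sample_model` stub in place of a real API call (none of these names come from OpenAI):

```python
from collections import Counter

def sample_model(prompt: str, seed: int) -> str:
    # Stand-in for one base-model call; a hypothetical stub,
    # not OpenAI's actual API. Simulates noisy sampled answers.
    answers = ["42", "42", "41", "42", "43", "42", "42", "41", "42", "42"]
    return answers[seed % len(answers)]

def pro_variant(prompt: str, n: int = 10) -> tuple[str, float]:
    """Majority voting over n base-model samples (the '10x calls' theory)."""
    votes = Counter(sample_model(prompt, i) for i in range(n))
    answer, count = votes.most_common(1)[0]
    return answer, count / n  # winning answer plus agreement rate

answer, agreement = pro_variant("What is 6 * 7?")
print(answer, agreement)  # → 42 0.7
```

The appeal of this scheme is that agreement across samples doubles as a cheap confidence signal, which would line up with o3-pro's edge on the 4/4 reliability benchmarks.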

I’ve had early access to o3-pro for the past week. Below are my (early) thoughts:

God is hungry for context.

We’re in the era of task-specific models. On one hand, we have “normal” models like 3.5 Sonnet and 4o—the ones we talk to like friends, who help us with our writing, and answer our day-to-day queries. On the other, we have gigantic, slow, expensive, IQ-maxxing reasoning models that we go to for deep analysis (they’re great at criticism), one-shotting complex problems, and pushing the edge of pure intelligence.

If you follow me on Twitter, you know I've had a journey with the o-reasoning models. My first impression of o1/o1-pro was quite negative. But as I gritted my teeth through the first weeks, propelled by other people's raving reviews, I realized that I was, in fact, using it wrong. I wrote up all my thoughts, got ratio’ed by @sama, and quote-tweeted by @gdb.

The key, I discovered, was to not chat with it. Instead, treat it like a report generator. Give it context, give it a goal, and let it rip. And that's exactly how I use o3 today.

But therein lies the problem with evaluating o3 pro.

It’s smarter. Much smarter.

But in order to see that, you need to give it a lot more context. And I’m running out of context.

There was no simple test or question I could ask it that blew me away.
