GPT-5.3-Codex

We’re introducing a new model that unlocks even more of what Codex can do: GPT‑5.3-Codex, the most capable agentic coding model to date. The model advances both the frontier coding performance of GPT‑5.2-Codex and the reasoning and professional knowledge capabilities of GPT‑5.2, together in one model, which is also 25% faster. This enables it to take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with GPT‑5.3-Codex while it’s working, without losing context.

GPT‑5.3‑Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results and evaluations—our team was blown away by how much Codex was able to accelerate its own development.

With GPT‑5.3-Codex, Codex goes from an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer.

Frontier agentic capabilities

GPT‑5.3-Codex sets a new industry high on SWE-Bench Pro and Terminal-Bench, and shows strong performance on OSWorld and GDPval, four benchmarks we use to measure coding, agentic and real-world capabilities.

Coding

GPT‑5.3-Codex achieves state-of-the-art performance on SWE-Bench Pro, a rigorous evaluation of real-world software engineering. Where SWE‑bench Verified only tests Python, SWE‑Bench Pro spans four languages and is more contamination‑resistant, challenging, diverse and industry-relevant. It also far exceeds the previous state-of-the-art performance on Terminal-Bench 2.0, which measures the terminal skills a coding agent like Codex needs. Notably, GPT‑5.3‑Codex does so with fewer tokens than any prior model, letting users build more.

Web development

Combining frontier coding capabilities, improvements in aesthetics, and compaction results in a model that can do striking work, building highly functional complex games and apps from scratch over the course of days. To test the model’s web development and long-running agentic capabilities, we asked GPT‑5.3‑Codex to build us two games: version two of the racing game from the Codex app launch ⁠, and a diving game. Using the develop web game skill and preselected, generic follow-up prompts like "fix the bug" or "improve the game", GPT‑5.3-Codex iterated on the games autonomously over millions of tokens. Watch the trailers and play the games for yourself to see what Codex can do.

A racing game, complete with different racers, eight maps, and even items to use with the space bar. Play it for yourself here⁠(opens in a new window)! A diving game where you explore various reefs, collect them all to complete your fish codex, all the while managing oxygen, pressure, and hazards. Play it for yourself here⁠(opens in a new window)!

... continue reading