
Olmo 3: Charting a path through the model flow to lead open-source AI


Language models are often treated as snapshots—brief captures of a long and carefully curated development process. But sharing only the end result obscures the rich context needed to modify, adapt, and extend a model's capabilities. Many meaningful adjustments require integrating domain-specific knowledge deep within the development pipeline, not merely at the final stage. To truly advance open AI development and research, the entire model flow – not just its endpoint – should be accessible and customizable. The model flow is the full lifecycle of an LM: every stage, checkpoint, dataset, and dependency required to create and modify it. By exposing this complete process, the goal is to engender greater trust and enable more effective adaptation, collaboration, and innovation.

With today's release of Olmo 3, we're empowering the open-source community with not only state-of-the-art open models, but also the entire model flow and full traceability back to training data.

At its center is Olmo 3-Think (32B), the best fully open 32B-scale thinking model, which for the first time lets you inspect intermediate reasoning traces and trace those behaviors back to the data and training decisions that produced them. Olmo 3 is a family of compact, dense models at 7 billion and 32 billion parameters that can run on everything from laptops to research clusters.

Olmo 3-Base (7B, 32B) is our most powerful base model yet. When evaluated on our expanded, diverse evaluation suite, Olmo 3-Base delivers the strongest performance among fully open base models – where training data, code, and weights are all publicly available, like Stanford's Marin and Swiss AI's Apertus – and achieves competitive performance with some of the best open-weights base models of comparable size and architecture, including Qwen 2.5 and Gemma 3. Achieving strong results in programming, reading comprehension, and math problem solving, Olmo 3-Base maintains performance at extended context lengths (up to ~65K tokens)—providing a versatile foundation for continued pretraining, targeted fine-tuning, and reinforcement learning, and making it easy to build in specialized capabilities like reasoning, tool use (function calling), and instruction following through post-training.
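
As a rough illustration of that starting point, the sketch below loads a base checkpoint with Hugging Face Transformers and generates a continuation. The repo id shown is an assumed placeholder for illustration; check the official release for the exact model names.

```python
# A minimal sketch of loading an Olmo 3-Base checkpoint for inference or as a
# starting point for fine-tuning. The repo id "allenai/Olmo-3-7B" is a
# hypothetical placeholder; consult the release page for the real identifiers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/Olmo-3-7B"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

prompt = "The key stages of a language model's development pipeline are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```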

Olmo 3-Think (7B, 32B) is our flagship post-trained reasoning set built on Olmo 3-Base. At a time when few organizations are releasing truly open models at this scale, Olmo 3-Think (32B) serves as a workhorse for RL research, long-horizon reasoning, and other advanced experiments that require substantial compute. On our suite of reasoning benchmarks (discussed below), it's the strongest fully open thinking model we're aware of, narrowing the gap to the best open-weight models of similar scale – such as Qwen 3 32B – while training on roughly 6x fewer tokens. Olmo 3-Think (7B) brings the same design and training approach to an even more efficient form factor, surfacing intermediate thinking steps for complex prompts while making open, inspectable reasoning accessible on more modest hardware.
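
For readers who want to poke at those intermediate thinking steps, here's a small sketch of separating a reasoning trace from the final answer. It assumes the trace is wrapped in <think>...</think> tags, which is a common convention but may not match Olmo 3-Think's exact output format.

```python
# A sketch of splitting a generated completion into its reasoning trace and
# final answer, assuming the trace is delimited by <think>...</think> tags.
import re

def split_reasoning(text: str):
    """Return (reasoning_trace, final_answer) from a generated completion."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    trace = match.group(1).strip()
    answer = text[match.end():].strip()
    return trace, answer

completion = "<think>First compute 12 * 7 = 84, then add 5.</think> The answer is 89."
trace, answer = split_reasoning(completion)
print("Trace:", trace)
print("Answer:", answer)
```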

Olmo 3-Instruct (7B) is a chat- and quick-response-focused post-train of Olmo 3-Base that handles multi-turn dialogue, instruction following, tool use, and more. In our evaluations, it matches or outperforms open-weight models including Qwen 2.5, Gemma 3, and Llama 3.1, and narrows the gap with the Qwen 3 model families at a similar scale—delivering a strong, fully open alternative for high-quality conversational and tool-using agents.
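
As a quick illustration of the chat use case, the following sketch runs a single-turn conversation through a hypothetical Olmo 3-Instruct checkpoint using the standard Transformers chat-template API. The repo id is an assumption; the released chat template handles the actual conversation formatting.

```python
# A minimal chat sketch against an Olmo 3-Instruct checkpoint. The repo id
# "allenai/Olmo-3-7B-Instruct" is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/Olmo-3-7B-Instruct"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize what a 'model flow' release includes."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the formatted prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```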

Olmo 3-RL Zero (7B) is a fully open reinforcement learning pathway built on Olmo 3-Base, designed to bootstrap complex reasoning behaviors and enable clear benchmarking of RL algorithms. We release four series of checkpoints from domain-focused training on math, code, instruction following, and general chat, enabling careful study of reinforcement learning with verifiable rewards (RLVR).
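
To make the RLVR idea concrete, here's a toy sketch of a verifiable reward for math-style prompts: the reward is 1.0 when the extracted final answer matches the ground truth and 0.0 otherwise. The answer-extraction heuristic is an illustrative assumption, not the exact scoring used in Olmo 3's training.

```python
# A toy verifiable reward for RLVR experiments: exact-match checking of a
# model's final numeric answer against the ground truth.
import re

def extract_final_answer(completion: str) -> str:
    """Take the last number in the completion as the candidate answer (a simplifying assumption)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else ""

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: correct answer -> 1.0, anything else -> 0.0."""
    return 1.0 if extract_final_answer(completion) == ground_truth.strip() else 0.0

print(verifiable_reward("12 * 7 = 84, plus 5 gives 89", "89"))  # 1.0
print(verifiable_reward("I think the answer is 90", "89"))      # 0.0
```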

Instead of a single set of frozen weights, Olmo 3 offers multiple, fully documented paths through development: the Instruct path for everyday chat and tool use, the RL Zero path for RL experimentation from base models, and the Think/reasoning path for models that leverage inference-time scaling to unlock complex reasoning and agentic behaviors. Each path is a concrete example of how to shape behavior from the same base model, and you’re free to fork or remix them—start with Olmo 3-Base, explore your own supervised fine-tuning (SFT) or direct preference optimization (DPO) recipe for instruct-style use cases, or plug in a new RL objective to probe different tradeoffs. The flow itself becomes a rich, reusable object—not just a record of how we built Olmo 3, but a scaffold for how you can build your own systems.
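
If you do want to roll your own DPO recipe, the core objective is compact enough to sketch directly. The snippet below shows the standard DPO loss in plain PyTorch, assuming you have already computed summed log-probabilities of the chosen and rejected responses under both the policy being trained and a frozen reference copy of the base model. It's a conceptual sketch, not our training code.

```python
# A minimal sketch of the DPO objective: push the policy to prefer chosen
# responses over rejected ones by more than the frozen reference model does.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct preference optimization loss over a batch of preference pairs."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example with dummy summed log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -8.0]), torch.tensor([-15.0, -9.5]),
                torch.tensor([-13.0, -8.5]), torch.tensor([-14.0, -9.0]))
print(loss.item())
```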

Explore the model flow: pretraining → midtraining → long-context extension produces Olmo 3-Base; from there, the Instruct path (SFT → DPO → RL) yields Olmo 3-Instruct, the Thinking path (SFT → DPO → RL) yields Olmo 3-Think, and the RL Zero path yields Olmo 3-RL Zero. Click on any stage in the interactive diagram to learn more about it and download artifacts.

The Olmo 3 checkpoints we're releasing represent our initial paths targeting our goals around reasoning, tool use, and general capabilities – we have exciting plans for other ways to leverage Olmo 3-Base 32B. But because we're releasing the entire flow, you can intervene at any point: swap in domain-specific data during mid-training, adjust post-training for your use case, or build on an earlier checkpoint that better suits your needs.
