OpenAI just released its new open-weight LLMs this week: gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2 in 2019. And yes, thanks to some clever optimizations, they can run locally (more on this later).
This is the first time since GPT-2 that OpenAI has shared large, fully open-weight models. The earlier GPT models showed how the transformer architecture scales, and the 2022 ChatGPT release then made these models mainstream by demonstrating their concrete usefulness for writing and knowledge tasks (and later coding). Now OpenAI has finally shared the long-awaited open-weight models, and their architecture has some interesting details.
I spent the past few days reading through the code and technical reports to summarize the most interesting details. (Just days after, OpenAI also announced GPT-5, which I will briefly discuss in the context of the gpt-oss models at the end of this article.)
Below is a quick preview of what this article covers. For easier navigation, I recommend using the Table of Contents on the left of the article page.
Model architecture comparisons with GPT-2
MXFP4 optimization to fit gpt-oss models onto single GPUs
Width versus depth trade-offs (gpt-oss vs Qwen3)
Attention bias and sinks
Benchmarks and comparisons with GPT-5
I hope you find it informative!