Strengths and limitations of diffusion language models
Published on: 2025-06-26 20:10:09
Google recently released Gemini Diffusion, which is impressing everyone with its speed. Supposedly they even had to slow down the demo so people could see what was happening. What’s special about diffusion models that makes text generation so much faster? Should every text model be a diffusion model, going forward?
I previously wrote a simple explainer of diffusion models here. If you don’t have any intuitions about how diffusion models are different, I suggest starting with that. This post will go into more detail about how those differences affect performance and quality in model outputs.
Why diffusion models are fast
The biggest difference between diffusion models and traditional autoregressive models (like GPT-4o, Claude, and all current transformer-based models) is that diffusion models generate the entire output at each step. For an output like “abcd”, an autoregressive architecture will generate token-by-token: “a”, “ab”, “abc”, and finally “abcd”. A diffusion model will generate
...
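To make the contrast concrete, here is a toy sketch of the two generation patterns for the “abcd” example. It is an illustration only: there is no model here, the function names are made up for this post, and the "reveal a few positions per step" schedule in `diffusion_like_steps` is an assumption standing in for real denoising, not a description of how Gemini Diffusion works. The only point is the shape of the loop: the autoregressive version emits a longer prefix each step, while the diffusion-style version emits a full-length draft every step.

```python
import random

TARGET = "abcd"  # the example output used in the text


def autoregressive_steps(target):
    """Yield the growing prefix: exactly one new token per step."""
    for i in range(1, len(target) + 1):
        yield target[:i]


def diffusion_like_steps(target, num_steps=2, seed=0):
    """Yield a full-length draft at every step.

    Toy stand-in for denoising (an assumption for illustration): start from
    an all-placeholder draft and fill in a share of positions per step.
    A real model would predict every position with a neural network; here
    the "prediction" is just a lookup into `target`.
    """
    rng = random.Random(seed)
    order = list(range(len(target)))
    rng.shuffle(order)  # positions are filled in arbitrary order
    draft = ["_"] * len(target)
    per_step = -(-len(target) // num_steps)  # ceiling division
    for step in range(num_steps):
        for pos in order[step * per_step:(step + 1) * per_step]:
            draft[pos] = target[pos]
        yield "".join(draft)


ar = list(autoregressive_steps(TARGET))
diff = list(diffusion_like_steps(TARGET))

print("autoregressive:", ar)   # 4 steps, one per token
print("diffusion-like:", diff)  # 2 steps, each a full-length draft
```

In this sketch the autoregressive loop needs one step per token, while the diffusion-style loop needs only `num_steps` passes regardless of output length. Each of those passes does more work, so the step count alone does not settle the real-world speed comparison.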