TL;DR
DiffusionGemma writes a whole chunk of text in one go and then keeps polishing it, rather than building it word by word.
Google says it can be up to 4x faster, hitting 1,000+ tokens per second on an NVIDIA H100 and around 700 on an RTX 5090, thanks to parallel processing.
Output quality is still inferior to Gemma 4, so it’s more of an experimental tool than a finished product.
Google has released DiffusionGemma, an experimental AI model that takes a very different approach from the way most chatbots generate text today. Instead of writing one word after another in a strict sequence, it generates a whole block of text at once and then keeps refining it until it becomes readable. The idea is to push for speed and hardware efficiency, even if it means giving up some polish in the final output.
This new AI model is open-sourced under the Apache 2.0 license and is aimed at developers and researchers rather than everyday users. To understand why this matters, it helps to look at how most large language models work. Systems like Google’s Gemma 4 generate text step by step, one token at a time. Each new word depends on what came before it, which makes the process inherently sequential and harder to speed up.
DiffusionGemma, on the other hand, starts with a full canvas of random tokens, essentially noisy, unreadable text, and then cleans it up over multiple passes. With each pass, the output becomes more structured and coherent until it settles into a final response. A simple way to picture it is that traditional models write, while DiffusionGemma drafts and edits everything at once.
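To make the contrast concrete, here is a toy Python sketch. It is not DiffusionGemma's actual algorithm and does not use any Google API: the function names, the `passes` parameter, and the use of a known `target` sentence as a stand-in for what a real model would predict are all assumptions made purely for illustration. What it shows is the shape of the two loops: the sequential one needs one step per token, while the diffusion-style one starts from a noisy canvas and takes a fixed number of passes regardless of length.

```python
import random


def sequential_generate(target):
    """Autoregressive-style toy: emit one token per step, left to right."""
    out = []
    steps = 0
    for tok in target:
        # Stand-in for a model predicting the next token from the prefix.
        out.append(tok)
        steps += 1
    return out, steps


def diffusion_style_generate(target, vocab, passes=4, seed=0):
    """Diffusion-style toy: start from noise, refine the whole canvas per pass.

    `target` stands in for the model's prediction (an assumption of this
    sketch); a real model would re-predict every position on each pass.
    """
    rng = random.Random(seed)
    n = len(target)
    canvas = [rng.choice(vocab) for _ in range(n)]  # noisy, unreadable start
    fixed = [False] * n
    history = [list(canvas)]
    for p in range(1, passes + 1):
        # Positions that should be settled by the end of this pass (ceiling).
        goal = (n * p + passes - 1) // passes
        pending = [i for i in range(n) if not fixed[i]]
        rng.shuffle(pending)
        # Settle several positions in the same pass, anywhere on the canvas.
        for i in pending[: goal - sum(fixed)]:
            canvas[i] = target[i]
            fixed[i] = True
        history.append(list(canvas))
    return canvas, passes, history


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    _, seq_steps = sequential_generate(sentence)
    _, n_passes, history = diffusion_style_generate(
        sentence, vocab=["foo", "bar", "baz"], passes=3
    )
    print("sequential steps:", seq_steps)
    print("refinement passes:", n_passes)
    for snapshot in history:
        print(" ".join(snapshot))
```

Each pass in the second function touches many positions at once, which is the kind of work that parallel hardware handles well. The toy says nothing about output quality, which is where the article notes the trade-off lies.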