In a new study, Apple researchers present a diffusion model that can write up to 128 times faster than its counterparts. Here’s how it works.
The nerdy bits
Here’s what you need to know for this study: LLMs such as ChatGPT are autoregressive models. They generate text sequentially, one token at a time, taking into account both the user’s prompt and all previously generated tokens.
In contrast to autoregressive models, there are diffusion models. They generate multiple tokens in parallel and refine them over several iterative steps until the full response takes shape.
Finally, one variant of diffusion models is flow-matching models. Rather than refining the output over many small denoising steps, they learn a direct path from noise to the final result, so it can be generated in essentially one go.
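To make the contrast concrete, here is a toy Python sketch (mine, not from the study) of the two generation loops: an autoregressive model commits to one token per step, while a diffusion-style model drafts every position at once and revisits the whole draft on each pass. The tiny vocabulary and random "model" are stand-ins for illustration only.

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]

def autoregressive_generate(length):
    """One token per step, each conditioned on the growing prefix."""
    tokens = []
    for _ in range(length):
        # A real LLM would score the vocabulary given `tokens`;
        # a random pick stands in for that here.
        tokens.append(random.choice(VOCAB))
    return tokens  # `length` sequential steps for `length` tokens

def diffusion_generate(length, steps):
    """Draft every position at once, then refine the whole draft in parallel."""
    draft = ["<mask>"] * length
    for _ in range(steps):
        # Each pass updates all positions simultaneously.
        draft = [random.choice(VOCAB) for _ in draft]
    return draft  # `steps` passes, regardless of `length`
```

The key difference is where the cost lives: the autoregressive loop runs once per token, while the diffusion loop runs once per refinement pass, which is why cutting the number of passes matters so much.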
For a deeper dive into how diffusion models work, check out this post on Apple’s diffusion-based coding model. And to learn more about flow-matching models, check out this post on Apple’s flow-matching model for protein folding.
Apple’s new study
In a study published today, titled “FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models,” researchers from Apple and Ohio State University propose a new model called Few-Step Discrete Flow-Matching, or FS-DFM.
In the study, the researchers demonstrate that FS-DFM was able to write full-length passages with just eight quick refinement rounds, matching the quality of diffusion models that required over a thousand steps to achieve a similar result.
To achieve that, the researchers take an interesting three-step approach: first, the model is trained to handle different budgets of refinement iterations. Then, they use a guiding “teacher” model to help it make larger, more accurate updates at each iteration without “overshooting” the intended text. And finally, they tweak how each iteration works so the model can reach the final result in fewer, steadier steps.
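The "larger updates without overshooting" idea can be illustrated with a toy numeric analogy (my own sketch, not the paper's method): if each refinement step covers an equal share of the remaining gap to the target, the process lands in the same place whether the budget is 8 steps or 1,000, with the per-step jumps simply getting bigger as the budget shrinks.

```python
def refine(draft, target, steps):
    """Move `draft` toward `target` within a fixed budget of steps.

    Toy stand-in for budget-aware refinement: each step covers an equal
    share of the *remaining* gap, so a small budget means larger jumps,
    but the final iterate still lands on the target without overshooting.
    """
    x = draft
    for i in range(steps):
        remaining = steps - i
        x = x + (target - x) / remaining  # bigger jump when fewer steps remain
    return x

print(refine(0.0, 1.0, 8))     # 8 large steps
print(refine(0.0, 1.0, 1024))  # 1,024 small steps, same destination
```

In the actual model the "target" is a distribution over text rather than a number, and the teacher's role is to keep those large per-step jumps pointed in the right direction, but the budget-invariance intuition is the same.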