In a new paper, a team of Apple researchers details a creative framework that improves LLM answers in math reasoning, code generation, and more. Here are the details.
Diffusion and autoregression, united
In a newly revised study titled LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning, Apple researchers, alongside researchers from the University of California, San Diego, detail an interesting way to improve the quality of answers generated by large language models (LLMs) in certain domains.
In the past, we’ve discussed diffusion models, which generate text by refining many tokens in parallel across successive denoising passes, in contrast to autoregressive models, which predict tokens one at a time.
Apple has even looked at diffusion models applied to protein folding prediction and coding, which is endlessly interesting.
What LaDiR does, in a nutshell, is combine both approaches: it adopts diffusion during the reasoning process, and then generates the final output autoregressively.
More than that, it actually runs multiple reasoning paths in parallel, each with its own diffusion process and a mechanism that pushes them to explore different possibilities, producing a diverse set of candidate answers.
They explain that at inference time, as the model works out what it will say and how it will answer the user’s prompt, LaDiR generates a series of hidden reasoning blocks, each starting as a random pattern (or noise) and gradually being refined into a more coherent step.
Once the model determines it has done enough reasoning, it switches to generating the final answer autoregressively, one token at a time.
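The paper's actual architecture isn't reproduced here, but the two-phase flow described above can be sketched in a few lines of toy Python. Everything below is illustrative: the `denoise` function, the fixed target latent, and the trivial decoder are stand-ins for the real learned components.

```python
import random

random.seed(0)

def denoise(block, step, total_steps):
    # Toy stand-in for one diffusion pass: blend the noisy latent
    # toward a fixed "coherent" target as the steps progress.
    # A real model would predict this update with a neural network.
    target = [1.0] * len(block)
    alpha = (step + 1) / total_steps
    return [(1 - alpha) * b + alpha * t for b, t in zip(block, target)]

def diffuse_reasoning(num_blocks=3, dim=4, steps=5):
    # Phase 1: each hidden reasoning block starts as random noise
    # and is iteratively refined over several denoising steps.
    blocks = [[random.gauss(0, 1) for _ in range(dim)]
              for _ in range(num_blocks)]
    for step in range(steps):
        blocks = [denoise(b, step, steps) for b in blocks]
    return blocks

def decode_autoregressively(blocks):
    # Phase 2: the final answer is emitted one token at a time.
    # (Here the "model" just emits a canned sequence; a real decoder
    # would condition each token on the refined latent blocks.)
    tokens = []
    for tok in ("The", "answer", "is", "42"):
        tokens.append(tok)
    return " ".join(tokens)

latents = diffuse_reasoning()
answer = decode_autoregressively(latents)
```

The point of the sketch is the shape of the computation: a parallel, iterative refinement stage over latent blocks, followed by a sequential token-by-token decoding stage.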
The key detail is that LaDiR can run several of these reasoning paths at once, with a mechanism that encourages each to explore different possibilities, so they don’t all converge on the same idea too early, which would defeat the purpose of the whole thing.
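The study doesn't spell out its diversity mechanism in this summary, but the general idea of keeping parallel candidates from collapsing can be illustrated with a simple repulsion step: nudge each path's latent away from the group mean so the candidates stay spread out. The function below is purely a toy illustration, not the paper's method.

```python
def diversity_push(paths, strength=0.1):
    # Toy diversity mechanism: move each path's latent slightly away
    # from the mean of all paths, so parallel candidates don't all
    # converge on the same idea too early.
    dim = len(paths[0])
    mean = [sum(p[i] for p in paths) / len(paths) for i in range(dim)]
    return [[p[i] + strength * (p[i] - mean[i]) for i in range(dim)]
            for p in paths]

# Two candidate latents that start fairly close together:
before = [[0.0, 0.0], [2.0, 2.0]]
after = diversity_push(before)
```

Applied repeatedly between denoising steps, a push like this steadily amplifies whatever differences exist between the candidate paths, yielding a more diverse pool of final answers to pick from.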