From multi-head to latent attention: The evolution of attention mechanisms
What is attention?

In any autoregressive model, the prediction of future tokens is based on some preceding context. However, not all tokens within this context contribute equally to the prediction, because some tokens are more relevant than others. The attention mechanism addresses this by allowing the model to assign each token in the context a weight that reflects its relevance to the current prediction, so the most informative tokens dominate the output.
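To make this concrete, here is a minimal sketch of standard scaled dot-product attention (the building block the rest of this article builds on). This is an illustrative implementation, not code from any particular library; the function name, tensor names, and shapes are assumptions chosen for clarity.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """A minimal sketch: q, k, v have shape (batch, seq_len, d_k)."""
    d_k = q.size(-1)
    # Relevance score between every query position and every key position
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    # Softmax turns scores into weights that sum to 1 for each query
    weights = F.softmax(scores, dim=-1)
    # Each output is a relevance-weighted mix of the value vectors
    return weights @ v

# Illustrative usage with random tensors
q = torch.randn(1, 8, 64)  # 8 tokens, 64-dim heads (hypothetical sizes)
k = torch.randn(1, 8, 64)
v = torch.randn(1, 8, 64)
out = scaled_dot_product_attention(q, k, v)  # shape: (1, 8, 64)
```

The softmax weights are exactly the "relevance" described above: a token whose key aligns strongly with the current query receives a large weight and thus contributes more to the prediction.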