EAGLE 3.1: Collaboration Between the EAGLE Team, vLLM Team, and TorchSpec Team

Why This Matters

EAGLE 3.1 improves the robustness, efficiency, and deployability of speculative decoding by addressing attention drift, a source of fragility in earlier versions. The result is more reliable performance across diverse deployment scenarios, which benefits both researchers and industry practitioners working with large language models.

Key Takeaways

The EAGLE series — including EAGLE 1, EAGLE 2, and EAGLE 3 — has become one of the most widely adopted and practically deployed families of speculative decoding algorithms across both research and production systems.

Today, the EAGLE team, vLLM team, and TorchSpec team are excited to jointly introduce EAGLE 3.1 — a major step forward in speculative decoding robustness, efficiency, and deployability.

EAGLE 3.1 Innovations

While speculative decoding performs well in controlled settings, performance often degrades under different chat templates, long-context inputs, or out-of-distribution system prompts.

The EAGLE team traced this fragility to a phenomenon we call attention drift: as speculation depth increases, the drafter gradually shifts attention away from sink tokens (the early positions that typically absorb a large share of attention) and toward its own generated tokens.

We identified two underlying issues. First, the fused input representation becomes increasingly imbalanced as higher-layer hidden states dominate the drafter input. Second, hidden-state magnitude grows across speculation steps due to the unnormalized residual path. Together, these effects make the drafter progressively less stable at deeper speculation depths.
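The second effect can be illustrated with a toy loop. This is a minimal sketch, not the EAGLE drafter: `toy_drafter_step` is a hypothetical stand-in that adds unit-variance noise to its input through a residual connection, and the dimension, step count, and RMS rescaling are all assumptions chosen for illustration. It tracks the output magnitude per speculation step when the state is fed back raw, and for contrast when it is rescaled to unit RMS first.

```python
import math
import random


def rms(v):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v) / len(v))


def rms_norm(v, eps=1e-6):
    """Rescale a vector to approximately unit RMS (no learned gain)."""
    scale = 1.0 / math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x * scale for x in v]


def toy_drafter_step(h, rng):
    """Hypothetical stand-in for one drafter pass: the residual input plus
    a branch output modeled as unit-variance Gaussian noise."""
    return [x + rng.gauss(0.0, 1.0) for x in h]


def magnitudes(steps, dim=256, normalize_feedback=False, seed=0):
    """Return the RMS of the toy drafter output at each speculation step."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    history = []
    for _ in range(steps):
        out = toy_drafter_step(h, rng)
        history.append(rms(out))
        # Feed the state into the next step, either raw or renormalized.
        h = rms_norm(out) if normalize_feedback else out
    return history


raw = magnitudes(steps=8)
normed = magnitudes(steps=8, normalize_feedback=True)
print("raw feedback:       ", [round(m, 2) for m in raw])
print("normalized feedback:", [round(m, 2) for m in normed])
```

In this toy model the raw-feedback magnitude keeps climbing with depth because each step adds to an ever-larger residual, while the renormalized variant stays in a narrow band. A real drafter's growth rate depends on its trained weights, so only the qualitative trend carries over.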

Figure 1: EAGLE 3 vs. EAGLE 3.1 architecture comparison. EAGLE 3.1 adds FC normalization after each target hidden state and feeds post-norm hidden states into the next decoding step.

To address these issues, EAGLE 3.1 introduces two key architectural improvements:

FC normalization after each target hidden state and before the FC layer

Feeding post-norm hidden states into the next decoding step
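The first change can be sketched as follows. This is a minimal illustration under stated assumptions, not the EAGLE 3.1 implementation: the source does not say which normalization is used, so RMS normalization without a learned gain is assumed here, and the three hidden states, their values, and the averaging FC weight are made up for the example. `energy_shares` is a hypothetical helper that reports how much of the fused input each hidden state contributes.

```python
import math


def rms_norm(v, eps=1e-6):
    """Rescale a vector to approximately unit RMS (assumed norm type)."""
    scale = 1.0 / math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x * scale for x in v]


def energy_shares(hidden_states):
    """Fraction of the fused input's squared norm contributed by each state."""
    energies = [sum(x * x for x in h) for h in hidden_states]
    total = sum(energies)
    return [e / total for e in energies]


def fuse(hidden_states, fc_weight, normalize=True):
    """Optionally normalize each target hidden state, concatenate them,
    and apply a bias-free FC layer (rows of fc_weight are output units)."""
    if normalize:
        hidden_states = [rms_norm(h) for h in hidden_states]
    fused_in = [x for h in hidden_states for x in h]
    return [sum(w * x for w, x in zip(row, fused_in)) for row in fc_weight]


# Made-up hidden states from three target layers; the last one has a much
# larger magnitude, mimicking a higher layer that dominates the input.
low = [0.1, -0.2, 0.1, 0.2]
mid = [1.0, -1.0, 0.5, 0.5]
high = [8.0, -6.0, 7.0, -9.0]
states = [low, mid, high]

# Hypothetical 4 x 12 FC weight that averages the three states elementwise.
fc_weight = [
    [1.0 / 3 if col % 4 == row else 0.0 for col in range(12)]
    for row in range(4)
]

shares_raw = energy_shares(states)
shares_norm = energy_shares([rms_norm(h) for h in states])
fused = fuse(states, fc_weight, normalize=True)

print("input share without norm:", [round(s, 3) for s in shares_raw])
print("input share with norm:   ", [round(s, 3) for s in shares_norm])
```

With these made-up values the largest-magnitude state accounts for nearly all of the unnormalized fused input, while per-state normalization gives each state an equal share before the FC layer mixes them. The second change, feeding post-norm hidden states into the next decoding step, would apply the same kind of rescaling to the state carried between speculation steps.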
