The Economics of Speculative Decoding
(news.ycombinator.com)
1.
2.
Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution
(news.ycombinator.com)
3.
Attention Residuals
(news.ycombinator.com)