How attention sinks keep language models stable
(news.ycombinator.com)