121. DeepSeek tests “sparse attention” to slash AI processing costs (arstechnica.com)
123. TikTok has turned culture into a feedback loop of impulse and machine learning (news.ycombinator.com)
124. TikTok won. Now everything is 60 seconds (news.ycombinator.com)
125. Almost anything you give sustained attention to will begin to loop on itself (news.ycombinator.com)
126. From multi-head to latent attention: The evolution of attention mechanisms (news.ycombinator.com)
127. From Multi-Head to Latent Attention: The Evolution of Attention Mechanisms (news.ycombinator.com)
129. Attention Is the New Big-O: A Systems Design Approach to Prompt Engineering (news.ycombinator.com)
130. How attention sinks keep language models stable (news.ycombinator.com)
131. How Attention Sinks Keep Language Models Stable (news.ycombinator.com)
132. LLM architecture comparison (news.ycombinator.com)
133. The Big LLM Architecture Comparison (news.ycombinator.com)
134. The Tradeoffs of SSMs and Transformers (news.ycombinator.com)
135. VLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention (news.ycombinator.com)
136. I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch (news.ycombinator.com)
137. DeepDive in everything of Llama3: revealing detailed insights and implementation (news.ycombinator.com)