1. KV Cache Is Becoming the Memory Hierarchy of Inference (news.ycombinator.com)
2. How much linear memory access is enough? (news.ycombinator.com)
3. How Much Linear Memory Access Is Enough? (news.ycombinator.com)