GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz
(news.ycombinator.com)
1.
2.
Can I Buy Your KV Cache?
(news.ycombinator.com)
3.
KVarN: Native vLLM backend for KV-cache quantization by Huawei
(news.ycombinator.com)
4.
Users say Gemini starts forgetting long before it’s supposed to
(androidauthority.com)
5.
Autoregressive next token prediction and KV Cache in transformers
(news.ycombinator.com)
6.
KV Cache Is Becoming the Memory Hierarchy of Inference
(news.ycombinator.com)
7.
High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction
(news.ycombinator.com)
8.
From 300KB to 69KB per Token: How LLM Architectures Solve the KV Cache Problem
(news.ycombinator.com)
9.
10.
Nvidia says it can shrink LLM memory 20x without changing model weights
(venturebeat.com)