Tech News
1. GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz (news.ycombinator.com)
2. Can I Buy Your KV Cache? (news.ycombinator.com)
3. KVarN: Native vLLM backend for KV-cache quantization by Huawei (news.ycombinator.com)
4. Users say Gemini starts forgetting long before it’s supposed to (androidauthority.com)
5. Autoregressive next token prediction and KV Cache in transformers (news.ycombinator.com)
6. KV Cache Is Becoming the Memory Hierarchy of Inference (news.ycombinator.com)
7. High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction (news.ycombinator.com)
8. From 300KB to 69KB per Token: How LLM Architectures Solve the KV Cache Problem (news.ycombinator.com)
9. Google's TurboQuant reduces AI LLM cache memory capacity requirements by at least six times — up to 8x performance boost on Nvidia H100 GPUs, compresses KV caches to 3 bits with no accuracy loss (tomshardware.com)
10. Nvidia says it can shrink LLM memory 20x without changing model weights (venturebeat.com)