Tech News
1. Kimi K2.7-Code cuts thinking tokens 30%, but practitioners say the benchmarks don't check out (venturebeat.com)
2. KVarN: Native vLLM KV-cache quantization back end by Huawei (news.ycombinator.com)
3. Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA (news.ycombinator.com)
4. Eagle 3.1: Collaboration Between the EAGLE Team, vLLM Team, and TorchSpec Team (news.ycombinator.com)
5. Boosting multimodal inference performance by >10% with a single Python dict (news.ycombinator.com)
6. Advanced Quantization Algorithm for LLMs (news.ycombinator.com)
7. The team behind continuous batching says your idle GPUs should be running inference, not sitting dark (venturebeat.com)
8. DeepSeek OCR (news.ycombinator.com)
9. Voxtral-Mini-3B-2507 – Open source speech understanding model (news.ycombinator.com)
10. Mistralai/Voxtral-Mini-3B-2507 · Hugging Face (news.ycombinator.com)
11. vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention (news.ycombinator.com)
12. Life of an inference request (vLLM V1): How LLMs are served efficiently at scale (news.ycombinator.com)
13. Lossless LLM 3x Throughput Increase by LMCache (news.ycombinator.com)