1.
Advanced Quantization Algorithm for LLMs
(news.ycombinator.com)
2.
We got 207 tok/s with Qwen3.5-27B on an RTX 3090
(news.ycombinator.com)
3.
MDST Engine: run GGUF models in the browser with WebGPU/WASM
(news.ycombinator.com)
4.
Rust implementation of Mistral's Voxtral Mini 4B Realtime runs in your browser
(news.ycombinator.com)
5.
Show HN: Sweep, Open-weights 1.5B model for next-edit autocomplete
(news.ycombinator.com)
Today's top topics:
openai
apple
google
google health
chatgpt
samsung
nvidia
anthropic
android authority
spacex