Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter
(news.ycombinator.com)
Nvidia DGX Spark and Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0
(news.ycombinator.com)