Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon
(news.ycombinator.com)
Run a 1T-parameter model on a 32 GB Mac by streaming tensors from NVMe
(news.ycombinator.com)