1.
2. Two different tricks for fast LLM inference (news.ycombinator.com)
3. Speed up responses with fast mode (news.ycombinator.com)