Two different tricks for fast LLM inference
(news.ycombinator.com)
Speed up responses with fast mode
(news.ycombinator.com)