Redis for LLMs - Infinite and Ultra-Fast
LMCache is an LLM serving engine extension that reduces time to first token (TTFT) and increases throughput, especially in long-context scenarios. By storing the KV caches of reusable texts across multiple locations, including GPU memory, CPU DRAM, and local disk, LMCache reuses the KV cache of any repeated text (not necessarily a prefix) in any serving engine instance. LMCache thus saves precious GPU cycles and reduces user response delay.
Combined with vLLM, LMCache achieves a 3-10x reduction in response delay and GPU cycles across many LLM use cases, including multi-round QA and RAG.
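As an illustration, the snippet below sketches how LMCache plugs into vLLM through vLLM's KV connector interface, following the pattern of the LMCache V1 examples. The connector name, environment variables, model choice, and file path here are assumptions that may differ across versions; see the documentation linked in the Quickstart below for the authoritative setup.

```python
# A minimal sketch of wiring LMCache into vLLM, based on the LMCache V1
# examples. Names and defaults may differ across versions.
import os

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# LMCache is configured through environment variables: the KV-cache chunk
# size and how much CPU DRAM to use as an additional cache tier.
os.environ["LMCACHE_CHUNK_SIZE"] = "256"          # tokens per KV-cache chunk
os.environ["LMCACHE_LOCAL_CPU"] = "True"          # enable the CPU DRAM tier
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"  # CPU cache budget in GB

# Route vLLM's KV cache through the LMCache connector.
ktc = KVTransferConfig(kv_connector="LMCacheConnectorV1", kv_role="kv_both")

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # any vLLM-supported model
    kv_transfer_config=ktc,
    gpu_memory_utilization=0.8,
)

# Multi-round QA over the same long document: the first call pays the full
# prefill cost and populates LMCache; follow-up calls reuse the stored KV
# cache, cutting TTFT.
context = open("document.txt").read()  # hypothetical long shared context
params = SamplingParams(temperature=0.0, max_tokens=64)
llm.generate(context + "\n\nQ: What is the main topic?", params)
llm.generate(context + "\n\nQ: Summarize the key findings.", params)
```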
Try LMCache with the pre-built vLLM Docker images here.
🚀 Performance snapshot
💻 Installation and Quickstart
Please refer to our detailed documentation for LMCache V1 and LMCache V0.
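For reference, the sketch below extends the configuration above with a local-disk tier, mirroring the GPU / CPU DRAM / local disk hierarchy described at the top of this README. The variable names follow the LMCache V1 docs and should be treated as assumptions for other versions.

```python
# Sketch: configuring a local-disk cache backend in addition to CPU DRAM,
# so KV caches that do not fit in CPU memory can be kept on disk instead
# of being recomputed. Set these before constructing the vLLM engine, as
# in the Quickstart sketch above. (Names are assumptions per LMCache V1.)
import os

os.environ["LMCACHE_LOCAL_CPU"] = "True"                   # keep the DRAM tier
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"           # GB of CPU memory
os.environ["LMCACHE_LOCAL_DISK"] = "file:///tmp/lmcache/"  # disk backend path
os.environ["LMCACHE_MAX_LOCAL_DISK_SIZE"] = "10.0"         # GB of disk space
```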
Interested in Connecting?
Fill out the interest form, sign up for our newsletter, or drop us an email, and our team will reach out to you!
🛣️ News and Milestones