Deploying DeepSeek on 96 H100 GPUs
by: The SGLang Team, May 05, 2025

DeepSeek is a popular open-source large language model (LLM) praised for its strong performance. However, its large size and unique architecture, which uses Multi-head Latent Attention (MLA) and a Mixture of Experts (MoE), require an advanced system to serve it efficiently at scale. In this blog, we explain how we match the performance of DeepSeek's inference system with SGLang. Our implementation, shown in the figure above, runs on 12 nodes in the Atlas Cloud, each equipped with eight H100 GPUs.
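To make concrete why MoE complicates serving, here is a minimal sketch of top-k expert routing. This is illustrative only, not DeepSeek's or SGLang's actual implementation; the function name `moe_forward` and the shapes are assumptions for the example. The key point is that each token activates only a few experts, so a serving system must dispatch tokens to the right experts, which may live on different GPUs.

```python
# Minimal top-k MoE routing sketch (illustrative; not DeepSeek's actual code).
import torch

def moe_forward(x, gate, experts, top_k=2):
    # x: [num_tokens, hidden]; gate: [hidden, num_experts]
    logits = x @ gate                                   # router score per expert
    weights, idx = torch.topk(logits.softmax(-1), top_k, dim=-1)
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        mask = (idx == e)                               # tokens routed to expert e
        tok, slot = mask.nonzero(as_tuple=True)
        if tok.numel():
            # Weighted sum of the outputs of the selected experts.
            out[tok] += weights[tok, slot, None] * expert(x[tok])
    return out

# Tiny usage example with linear layers standing in for expert FFNs.
experts = torch.nn.ModuleList(torch.nn.Linear(64, 64) for _ in range(8))
gate = torch.randn(64, 8)
y = moe_forward(torch.randn(4, 64), gate, experts)
```

In a single-GPU setting the loop above is just bookkeeping, but at DeepSeek's scale the experts cannot fit on one device, so this dispatch becomes cross-GPU communication, which is one of the central challenges the rest of this post addresses.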