DeepSeek Open-Sources Optimized Parallelism Strategies: 3 Repos

Published on: 2025-07-13 14:01:41

Profiling Data in DeepSeek Infra

Here, we publicly share profiling data from our training and inference framework to help the community better understand our communication-computation overlap strategies and low-level implementation details. The profiling data was captured using the PyTorch Profiler. After downloading it, you can visualize it directly by navigating to chrome://tracing in the Chrome browser (or edge://tracing in the Edge browser). Note that we simulate a perfectly balanced MoE (Mixture of Experts) routing strategy for profiling.

Training [profile_data]

The training profile demonstrates our overlapping strategy for a pair of individual forward and backward chunks in DualPipe. Each chunk contains 4 MoE layers. The parallel configuration aligns with the DeepSeek-V3 pretraining settings: EP64 (64-way expert parallelism), TP1 (no tensor parallelism), and a 4K sequence length. For simplicity, pipeline-parallel (PP) communication is not included in the profile.

Inference

Prefilling [profile_data]

For prefilling, the profile employs ...
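Besides viewing the downloaded profiles in chrome://tracing, you can also inspect them programmatically. The sketch below assumes each file is a JSON document in the Chrome Trace Event Format (the format written by PyTorch Profiler's export_chrome_trace), optionally gzip-compressed. The helper names load_trace_events and top_ops_by_time are hypothetical and not part of any DeepSeek repository. The script sums the durations of complete ("X") events by name to show where time is spent.

```python
import gzip
import json
from collections import defaultdict


def load_trace_events(path):
    """Return the list of events from a Chrome trace file (.json or .json.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # The Trace Event Format allows either a bare JSON array of events or an
    # object whose "traceEvents" key holds the array.
    return data["traceEvents"] if isinstance(data, dict) else data


def top_ops_by_time(events, n=10):
    """Sum durations of complete ("X") events per name; return the n largest."""
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev.get("name", "<unnamed>")] += float(ev["dur"])
    # Durations are in microseconds, the Trace Event Format's time unit.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

To use it, pass the events returned by load_trace_events for a downloaded profile to top_ops_by_time. The result lists the event names with the largest accumulated duration. CPU-side operators and GPU kernels appear on separate tracks and can run concurrently, so these per-name totals can add up to more than the wall-clock time.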
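One way to quantify the communication-computation overlap these profiles illustrate is to measure how much communication time is covered by computation running at the same moment. The sketch below is a hypothetical helper, not DeepSeek's tooling. It merges event intervals into disjoint spans and then intersects them. Two inputs are caller-supplied assumptions. The is_comm predicate decides which events count as communication (for example, a name-substring heuristic). The keep predicate restricts the analysis to a subset of events, such as GPU kernels. In PyTorch (Kineto) traces, GPU kernels typically carry "cat": "kernel", but check the category names in the actual files.

```python
def merge_intervals(intervals):
    """Merge possibly overlapping (start, end) intervals into sorted, disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current span
        else:
            merged.append([start, end])  # start a new span
    return merged


def intersection_length(a, b):
    """Total length shared by two sorted, disjoint interval lists (two-pointer sweep)."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def overlap_ratio(events, is_comm, keep=lambda ev: True):
    """Fraction of communication time that runs concurrently with computation.

    is_comm and keep are caller-supplied heuristics (assumptions), e.g.
    keep=lambda ev: ev.get("cat") == "kernel" to consider GPU kernels only.
    """
    comm, comp = [], []
    for ev in events:
        if ev.get("ph") != "X" or "dur" not in ev or not keep(ev):
            continue
        start = float(ev["ts"])
        span = (start, start + float(ev["dur"]))
        (comm if is_comm(ev.get("name", "")) else comp).append(span)
    comm_spans, comp_spans = merge_intervals(comm), merge_intervals(comp)
    comm_time = sum(end - start for start, end in comm_spans)
    if comm_time == 0:
        return 0.0
    return intersection_length(comm_spans, comp_spans) / comm_time
```

A ratio close to 1.0 would mean that nearly all communication is hidden behind computation, which is what overlapping forward and backward chunks in DualPipe aims for. Leaving keep at its default mixes CPU-side operator spans with GPU kernels and can inflate the ratio.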
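A small sketch makes the note about simulating perfectly balanced MoE routing more concrete. The round-robin assignment below is one simple way to give every expert exactly the same number of tokens. It is an illustration and not necessarily how DeepSeek simulated routing. The example sizes combine DeepSeek-V3's published MoE configuration (256 routed experts, 8 activated per token) with the EP64 setting mentioned above. The 4,096-token batch and the contiguous placement of experts on EP ranks are hypothetical assumptions.

```python
from collections import Counter


def balanced_routing(num_tokens, num_experts, top_k):
    """Give each token top_k distinct experts so all experts get equal load.

    Round-robin is an illustrative choice, not DeepSeek's documented method.
    """
    if top_k > num_experts:
        raise ValueError("top_k cannot exceed num_experts")
    if (num_tokens * top_k) % num_experts:
        raise ValueError("num_tokens * top_k must be divisible by num_experts")
    # Routing slot i = t * top_k + j goes to expert i % num_experts, so the
    # slots cycle through every expert the same number of times.
    return [[(t * top_k + j) % num_experts for j in range(top_k)]
            for t in range(num_tokens)]


def tokens_per_ep_rank(routing, num_experts, ep_size):
    """Count token copies each expert-parallel rank receives.

    Assumption: experts are placed contiguously, num_experts // ep_size per rank.
    """
    if num_experts % ep_size:
        raise ValueError("num_experts must be divisible by ep_size")
    experts_per_rank = num_experts // ep_size
    counts = Counter(e // experts_per_rank for row in routing for e in row)
    return [counts[r] for r in range(ep_size)]
```

Consider 4,096 tokens, 256 experts, and top-8 routing. Every expert receives 4096 * 8 / 256 = 128 token copies. Every one of the 64 EP ranks hosts 4 experts, so each rank receives 512 token copies.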