Huawei Ascend NPU roadmap examined — company targets 4 ZettaFLOPS FP4 performance by 2028, amid manufacturing constraints

At the Huawei Connect 2025 event, in addition to announcing its first AI cluster with 1 ZettaFLOPS of FP4 performance, Huawei revealed a detailed roadmap for its upcoming Ascend neural processing units (NPUs), the chips it uses to accelerate AI workloads.

The company does not have access to TSMC’s leading-edge process technologies or to high-end HBM4 and GDDR7 memory from the global leaders. So, to boost the performance of its Ascend processors, it will have to rely on a new architecture and new types of memory, starting with the Ascend 950 series. Huawei expects its new NPUs to enable multi-ZettaFLOPS performance toward the end of the decade.

When it comes to features, Huawei’s Ascend 910-series AI accelerators have barely changed in years: the latest dual-chiplet Ascend 910C offers higher performance and better manufacturability than the original Ascend 910 from 2019. The unit uses a SIMD architecture and supports conventional data formats (FP32, HF32, FP16, BF16, and INT8), which are adequate for AI training but ‘heavy’ for AI inference by modern standards, where lower-precision formats such as FP8 and FP4 are increasingly used.
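
To put those format choices in perspective, here is a rough back-of-the-envelope sketch (ours, not Huawei's) of how much memory the weights of a hypothetical 70-billion-parameter model occupy in different formats; the model size is purely illustrative:

```python
# Back-of-the-envelope weight footprint for a hypothetical 70B-parameter
# model in different numeric formats. The model size is an arbitrary
# example, not a figure from Huawei.

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16 / BF16": 2.0,
    "INT8 / FP8": 1.0,
    "FP4 (e.g., MXFP4)": 0.5,
}

PARAMS = 70e9  # 70 billion parameters (hypothetical)

for fmt, nbytes in BYTES_PER_PARAM.items():
    footprint_gb = PARAMS * nbytes / 1e9
    chips = footprint_gb / 128  # Ascend 910C-class accelerators with 128 GB each
    print(f"{fmt:18} {footprint_gb:6.0f} GB of weights  (~{chips:.2f}x 128 GB of HBM)")
```

Halving the bits per weight also halves the bytes that must stream from memory for every generated token, which matters as much for inference throughput as the raw capacity savings.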

The Ascend 910C delivers up to 800 TFLOPS of FP16 performance (roughly on par with an Nvidia H100), carries 128 GB of HBM, and offers 3.2 TB/s of memory bandwidth. That would have been competitive in 2023, but by today's standards it trails Nvidia’s Blackwell-based GPUs significantly. Hence, Huawei needs both a new architecture and new processors.

So, the company is cooking up a lineup of NPUs — the Ascend 950PR and 950DT, Ascend 960, and Ascend 970 — which use an all-new instruction set architecture and support the modern data formats required for next-generation AI workloads. The new AI accelerators will also use Huawei's proprietary HBM-like memory technologies: the cheaper HiBL 1.0 and the higher-performance HiZQ 2.0.

Huawei Ascend roadmap

| NPU | Targeted Release | Architecture | FP8 Performance | FP4 Performance | Memory | Memory Bandwidth | Interconnect Bandwidth | Supported Formats |
|---|---|---|---|---|---|---|---|---|
| Ascend 910C | 2025 Q1 | SIMD | – | – | 128 GB | 3.2 TB/s | 784 GB/s | FP32, HF32, FP16, BF16, INT8 |
| Ascend 950PR | 2026 Q1 | SIMD + SIMT | 1 PFLOPS | 2 PFLOPS | 128 GB | 1.6 TB/s | 2.0 TB/s | FP32, HF32, FP16, BF16, FP8, MXFP8, HiF8, MXFP4 |
| Ascend 950DT | 2026 Q4 | SIMD + SIMT | 1 PFLOPS | 2 PFLOPS | 144 GB | 4.0 TB/s | 2.0 TB/s | FP32, HF32, FP16, BF16, FP8, MXFP8, HiF8, MXFP4 |
| Ascend 960 | 2027 Q4 | SIMD + SIMT | 2 PFLOPS | 4 PFLOPS | 288 GB | 9.6 TB/s | 2.2 TB/s | FP32, HF32, FP16, BF16, FP8, MXFP8, HiF8, MXFP4, HiF4 |
| Ascend 970 | 2028 Q4 | SIMD + SIMT | 4 PFLOPS | 8 PFLOPS | 288 GB | 14.4 TB/s | 4.0 TB/s | FP32, HF32, FP16, BF16, FP8, MXFP8, HiF8, MXFP4, HiF4 |

The Ascend 950

The next major step in the Huawei Ascend roadmap is the Ascend 950 series, comprising two variants: the Ascend 950PR, optimized for prefill and recommendation stages, and the Ascend 950DT, optimized for decoding and training.
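
Why split the lineup this way? Prefill crunches an entire prompt in one pass and is largely compute-bound, while decode generates one token at a time and leans on memory bandwidth instead. A rough roofline sketch, using the 950DT figures from the table above and purely illustrative workload assumptions (the prompt length and FLOPs-per-weight values are ours, not Huawei's), shows the difference:

```python
# Rough roofline sketch of why decode needs more memory bandwidth than
# prefill. Hardware figures come from the roadmap table above (Ascend
# 950DT: 1 PFLOPS FP8, 4.0 TB/s); the workload numbers are illustrative
# assumptions, not Huawei data.

peak_flops = 1e15   # 1 PFLOPS of FP8 compute
mem_bw = 4.0e12     # 4.0 TB/s of memory bandwidth

# Arithmetic intensity (FLOPs per byte moved) above which the chip is compute-bound.
break_even = peak_flops / mem_bw
print(f"Compute-bound above ~{break_even:.0f} FLOPs per byte")

# Prefill: the whole prompt is processed at once, so every weight fetched
# from memory is reused across many tokens -> high arithmetic intensity.
prompt_tokens = 2048                          # hypothetical prompt length
prefill_intensity = 2 * prompt_tokens / 1.0   # 2 FLOPs per weight, 1 byte per FP8 weight
print(f"Prefill: ~{prefill_intensity:.0f} FLOPs/byte -> compute-bound")

# Decode: tokens are generated one at a time, so every weight is re-read
# from memory for each new token -> very low arithmetic intensity.
decode_intensity = 2 / 1.0
print(f"Decode:  ~{decode_intensity:.0f} FLOPs/byte -> memory-bandwidth-bound")
```

Under these assumptions, a prefill-oriented part can tolerate slower memory, which is consistent with the roadmap table: the prefill-focused 950PR is listed at 1.6 TB/s, while the decode- and training-focused 950DT gets 4.0 TB/s.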

Both Ascend 950-series products use the same silicon, based on the company's new SIMD+SIMT architecture, which weds vector-based processing with thread-level parallelism to maximize performance. They also feature a GPU-like memory subsystem whose DRAM access granularity has been reduced from 512 bytes to 128 bytes, which cuts wasted bandwidth on small or irregular accesses and improves memory efficiency. All Ascend 950-series processors add support for the FP8, MXFP8, HiF8, and MXFP4 data formats (on top of what the Ascend 910C already offers) to strike the right balance between performance and precision.
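
The granularity argument is easy to illustrate. The model below is a simplified sketch of scattered memory accesses (not Huawei's published methodology), with a hypothetical 64-byte useful payload per access:

```python
# Simplified illustration (not Huawei's methodology) of why smaller DRAM
# access granularity wastes less bandwidth on scattered reads: each
# request transfers a full access unit, even if only part of it is needed.

def useful_bandwidth(peak_tbs: float, useful_bytes: int, granularity: int) -> float:
    """Fraction of peak bandwidth that carries data the kernel actually uses."""
    utilization = min(useful_bytes / granularity, 1.0)
    return peak_tbs * utilization

USEFUL = 64  # hypothetical useful payload per scattered access, in bytes

for granularity in (512, 128):
    bw = useful_bandwidth(4.0, USEFUL, granularity)  # 4.0 TB/s peak, e.g. the Ascend 950DT
    print(f"{granularity:3d}-byte granularity: {bw:.1f} TB/s useful "
          f"({USEFUL / granularity:.0%} utilization)")
```

Real-world efficiency also depends on access patterns, caching, and the memory controller, but the direction of the effect is the same: finer granularity means fewer wasted bytes per request.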
