Huawei’s in-house Ascend processors and their surrounding supplier network are being positioned as the foundation of a national effort to build an independent, fully domestic semiconductor ecosystem in China. That includes everything from high-end AI chips and custom optical networks, through to packaging materials, photoresists, and gas delivery systems.
More than 60 semiconductor companies are now backed by Huawei’s investment arm Hubble, while local partners like Empyrean are advancing design toolchains to support a parallel AI software ecosystem independent of Nvidia and other U.S. vendors, according to reporting by Nikkei Asia and TrendForce.
This growing web of suppliers was showcased at the China Hi-Tech Fair in Shenzhen, where Huawei’s CloudMatrix 384 system, an AI server rack integrating 384 Ascend 910C processors, was positioned as a direct alternative to Nvidia’s GB200 platform. Though obvious performance and efficiency trade-offs remain, the system highlights how far Huawei has come since the U.S. first restricted its access to foundry services and IP in 2019.
Competing with Blackwell by scale
The foundation of Huawei’s server strategy is the Ascend 910C, a dual-chiplet accelerator built with stacked HBM2E memory and a DaVinci NPU architecture tailored for AI workloads. The chip delivers up to 780 TFLOPS of dense BF16 compute, with the entire package consuming 350 watts.
That trails Nvidia’s Hopper-based H100 or Blackwell-based B200 in both peak throughput and power efficiency, but Huawei offsets the difference by scaling up. The CloudMatrix 384 system, for example, combines twelve racks of Ascend modules with four optical interconnect racks, creating a 384-processor fabric that delivers around 300 PFLOPS in total. The network is entirely optical, with 6,912 pluggable transceivers forming a high-bandwidth, all-to-all topology.
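The scale-up arithmetic behind that "around 300 PFLOPS" figure can be checked directly from the per-chip numbers quoted above; this minimal sketch uses only the article's own figures:

```python
# Back-of-the-envelope check of the CloudMatrix 384 scale-up figures:
# 384 Ascend 910C processors, each at 780 dense BF16 TFLOPS per the article.
chips = 384
tflops_per_chip = 780  # dense BF16, per chip

total_tflops = chips * tflops_per_chip        # 299,520 TFLOPS
total_pflops = total_tflops / 1_000           # 299.52 PFLOPS
print(f"Aggregate throughput: {total_pflops:.1f} PFLOPS")
# prints "Aggregate throughput: 299.5 PFLOPS" -- i.e. "around 300 PFLOPS"
```

The headline number is simply the per-chip peak multiplied out across the fabric; the all-optical interconnect is what makes aggregating that many accelerators into one training domain practical.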
The system draws around 559 kilowatts at peak load, which is nearly four times the power draw of Nvidia’s GB200-based DGX system. But Chinese data centers face fewer regulatory constraints on energy use, and local power costs remain significantly lower than in the U.S. That trade-off, paired with large-scale domestic chip availability, makes the Ascend stack a viable foundation for training large-scale AI models in-country. Huawei’s internal tests claim CloudMatrix outperforms Nvidia H100 platforms on specific model classes, although public benchmarks remain scarce.
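The efficiency trade-off described above can be made concrete with a rough per-watt comparison. The CloudMatrix figures below are the article's; the roughly 145 kW rack draw and roughly 180 PFLOPS dense BF16 throughput for a GB200 NVL72 system are outside assumptions used only for illustration, not figures from the article:

```python
# Rough perf-per-watt comparison implied by the figures above.
cloudmatrix_pflops = 300   # dense BF16, per the article
cloudmatrix_kw = 559       # peak draw, per the article
gb200_pflops = 180         # assumed GB200 NVL72 dense BF16 throughput
gb200_kw = 145             # assumed GB200 NVL72 rack power

cm_eff = cloudmatrix_pflops / cloudmatrix_kw  # ~0.54 PFLOPS per kW
gb_eff = gb200_pflops / gb200_kw              # ~1.24 PFLOPS per kW
print(f"CloudMatrix: {cm_eff:.2f} PFLOPS/kW vs GB200 rack: {gb_eff:.2f} PFLOPS/kW")
print(f"Power ratio: {cloudmatrix_kw / gb200_kw:.1f}x")
# prints "Power ratio: 3.9x" under these assumptions -- "nearly four times"
```

Under these assumed Nvidia figures, Huawei trades roughly half the energy efficiency per unit of compute for a system it can actually build and deploy at scale domestically, which is the bargain the paragraph above describes.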
The software stack around Ascend is also maturing. Huawei’s CANN programming environment and MindSpore framework support common model architectures through a translation layer that can ingest PyTorch or TensorFlow graphs. While CUDA remains dominant globally, Huawei is planning to open-source more of its toolchain to accelerate local development and draw interest from non-domestic partners where export controls permit.
Building the supply chain from the bottom up