
Huawei’s Ascend and Kunpeng progress shows how China is rebuilding an AI compute stack under sanctions

Huawei used its New Year message to highlight progress across its Ascend AI and Kunpeng CPU ecosystems, pointing to the rollout of Atlas 900 supernodes and rapid growth in domestic developer adoption as "a solid foundation for computing." The message arrives as China continues to accelerate efforts to replace Western hardware in critical AI workloads, and as Huawei positions itself as the closest thing the country has to a vertically integrated AI compute vendor.

Huawei’s message offers a snapshot of a strategy that has been unfolding for several years, shaped by U.S. export controls, constrained access to leading-edge manufacturing, and a domestic market increasingly mandated to adopt local silicon. Under those conditions, Huawei’s Ascend and Kunpeng platforms have evolved into something distinct from their Western counterparts: less focused on single-chip supremacy and more on building large, tightly coupled systems that compensate for weaker nodes with scale, networking, and software control.

Ascend’s architecture and the limits of the node

At the center of Huawei’s AI effort is Ascend, built around its proprietary Da Vinci architecture. The original Ascend 910, introduced in 2019, was manufactured on TSMC’s 7nm process and delivered roughly 256 TFLOPS of FP16 performance at a quoted 350W. That put it in the same broad class as Nvidia’s Volta-era accelerators, though without the same software ecosystem or interconnect maturity.
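
For a rough sense of where that placed the original 910, the quoted figures support a simple performance-per-watt comparison. The sketch below uses Nvidia's published V100 SXM2 numbers (125 TFLOPS FP16 tensor, 300W) as an assumed reference point; those Nvidia figures come from public spec sheets, not from Huawei's message.

```python
# Back-of-envelope FP16 efficiency comparison.
# Ascend 910 figures are as quoted in the article; the V100 SXM2
# figures are assumptions taken from Nvidia's public specifications.

chips = {
    "Ascend 910 (2019)": {"fp16_tflops": 256, "tdp_w": 350},
    "V100 SXM2 (2017)":  {"fp16_tflops": 125, "tdp_w": 300},
}

for name, spec in chips.items():
    eff = spec["fp16_tflops"] / spec["tdp_w"]  # TFLOPS per watt
    print(f"{name}: {spec['fp16_tflops']} TFLOPS FP16 at "
          f"{spec['tdp_w']} W -> {eff:.2f} TFLOPS/W")
```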

Sanctions imposed in the years following Ascend's launch significantly changed the playing field, forcing subsequent Ascend generations onto SMIC's N+1 and N+2 processes, which are roughly comparable to older 7nm-class nodes without EUV. The Ascend 910C, now the backbone of Huawei's latest clusters, is a dual-die package with two large chiplets combined into a single accelerator card. On paper, Huawei claims up to 780 TFLOPS of BF16 compute, but die area and power efficiency tell a more complicated story.

Huawei suggests the 910C’s combined silicon footprint is around 60% larger than Nvidia’s H100, with lower performance per square millimeter and per watt. In isolation, that would be a losing proposition, but Huawei has leaned hard on interconnects and clustering. The company uses a proprietary high-speed fabric alongside standard PCIe and RoCE networking to bind hundreds or thousands of Ascend accelerators into a single logical training or inference system.
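
Taking those claims at face value, a back-of-envelope calculation shows how the density gap plays out. The H100 figures below (roughly 814 mm² of die, ~989 dense BF16 TFLOPS, 700W) and the 910C's board power are assumptions drawn from public spec sheets and reporting, not numbers confirmed in Huawei's message.

```python
# Rough perf-per-area and perf-per-watt comparison, assuming:
#  - H100 SXM: ~814 mm^2 die, ~989 dense BF16 TFLOPS, 700 W (public specs)
#  - Ascend 910C: 780 TFLOPS BF16 (Huawei's claim), silicon footprint
#    ~60% larger than H100 (per the article), and ~750 W assumed board
#    power -- Huawei has not published a definitive figure.

h100_area_mm2 = 814
accelerators = {
    "H100 SXM":    {"bf16_tflops": 989, "area_mm2": h100_area_mm2,       "power_w": 700},
    "Ascend 910C": {"bf16_tflops": 780, "area_mm2": h100_area_mm2 * 1.6, "power_w": 750},
}

for name, a in accelerators.items():
    print(f"{name}: {a['bf16_tflops'] / a['area_mm2']:.2f} TFLOPS/mm^2, "
          f"{a['bf16_tflops'] / a['power_w']:.2f} TFLOPS/W")
```

Under those assumptions, the 910C delivers roughly half the H100's compute per square millimeter, which is exactly the gap the clustering strategy is built to offset.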

This approach is evident in Huawei's claims around Atlas 900 and CloudMatrix systems. Rather than competing card-for-card with Nvidia's H100 or AMD's MI300X, Huawei emphasizes aggregate throughput. A CloudMatrix 384 system, linking 384 Ascend 910C accelerators, has been positioned as competitive with Nvidia's large NVLink-based pods on selected workloads, particularly inference. The trade-off is physical scale: where Nvidia can deliver multi-exaflop-class FP4 performance in a handful of racks, Huawei requires an order of magnitude more floor space, power delivery, and cooling.
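
The aggregate arithmetic is straightforward, at least on paper. Multiplying Huawei's quoted per-card peak by the pod size gives the headline figure, before any interconnect or software efficiency losses:

```python
# Aggregate dense BF16 throughput of a CloudMatrix 384 pod,
# taking Huawei's per-card claim at face value.

cards = 384
per_card_bf16_tflops = 780          # Huawei's quoted peak for the 910C

total_pflops = cards * per_card_bf16_tflops / 1000
print(f"CloudMatrix 384 peak: ~{total_pflops:.0f} PFLOPS dense BF16")
# -> ~300 PFLOPS, a theoretical ceiling rather than sustained throughput
```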

Inference is where Ascend looks strongest: reports out of China indicate that the 910C delivers roughly 60% of H100-class performance on inference tasks. Training, which leans far more heavily on interconnect bandwidth and software maturity, remains more challenging.

Scaling out as a design philosophy

As for the Atlas 900 supernode, highlighted in Huawei's New Year message, it is probably best viewed as a piece of architectural showmanship rather than a product that's likely to come to the Chinese market any time soon. It reflects Huawei's belief that AI compute can be industrialized through standardized clusters built from domestically controlled components, even if each component lags the global leading edge.
