China tops the list of fastest supercomputers with a CPU-only behemoth, ending US champion El Capitan's reign — 2.198 exaflops of performance without a single GPU

China's LineShine supercomputer has taken the top spot on the 67th edition of the TOP500 list, posting 2.198 exaflops on the High Performance Linpack (HPL) benchmark and beating the AMD-powered El Capitan, now in second place, by more than 20%. The system is installed at the National Supercomputing Centre in Shenzhen (NSCS) and was built by the Shenzhen Cloud Computing Center. It uses no GPUs or accelerators of any kind, reaching that figure with 13,789,440 cores of domestically designed silicon, which makes it the first machine on the list to clear two exaflops of double-precision performance on CPUs alone. It’s also the first China-based system to lead the TOP500 since Sunway TaihuLight in 2017.

The fact that a sanctioned country has managed to build an exascale flagship without a single Western accelerator is one thing, but what’s more telling is that China has decided to put it on the list. For years, its fastest machines have stayed off the rankings entirely, and the decision to submit a chart-topper now is a deliberate change of posture.

A domestic stack from core to OS

LineShine is built on what NSCS calls the LingKun platform. Each of its 20,480 compute nodes carries two LX2 processors, Armv9-based parts with 304 cores running at 1.55 GHz, organized as eight clusters of 38 cores. Every core includes Arm's Scalable Vector Extension and Scalable Matrix Extension units covering FP64, FP32, BF16, FP16, and INT8.
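The per-node figures multiply out as a quick sanity check. The sketch below is illustrative arithmetic on the numbers quoted in this article and nothing more: the variable names are made up here, and it assumes nothing about how the TOP500 submission counts cores. The product lands at roughly 12.45 million cores across the compute nodes, somewhat below the 13,789,440 reported on the list, and the published figures do not explain the difference.

```python
# Figures quoted in the article; the names are illustrative only.
NODES = 20_480
SOCKETS_PER_NODE = 2
CLUSTERS_PER_SOCKET = 8
CORES_PER_CLUSTER = 38
REPORTED_TOTAL_CORES = 13_789_440  # core count quoted from the TOP500 entry

# Eight clusters of 38 cores should reproduce the 304-core figure.
cores_per_socket = CLUSTERS_PER_SOCKET * CORES_PER_CLUSTER

# Cores implied by the compute-node description alone.
compute_cores = NODES * SOCKETS_PER_NODE * cores_per_socket

# Difference between the list figure and the node-level arithmetic.
# The article does not say what accounts for it.
gap = REPORTED_TOTAL_CORES - compute_cores

print(f"cores per LX2:              {cores_per_socket}")
print(f"cores across compute nodes: {compute_cores:,}")
print(f"reported minus computed:    {gap:,}")
```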

Each of those LX2s pairs 32 GB of on-package HBM rated at up to 4 TB/s with as much as 256 GB of off-package DDR5, an arrangement that’s closer to Fujitsu's A64FX in Japan's Fugaku than to a conventional server CPU. Nodes are tied together by the proprietary LingQi interconnect, and the machine runs the homegrown Kylin OS.
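To put the memory figures in per-core terms, here is a rough sketch using only the numbers quoted above. It assumes decimal units (1 GB = 1,000 MB, 1 TB/s = 1,000 GB/s) and treats the "up to" bandwidth as an upper bound rather than a measured rate; the variable names are hypothetical.

```python
# Per-socket figures quoted in the article; names are illustrative only.
HBM_GB_PER_SOCKET = 32
HBM_TBPS_PER_SOCKET = 4.0   # "up to" figure, so an upper bound
CORES_PER_SOCKET = 304
SOCKETS = 20_480 * 2        # two LX2 processors per compute node

# Assumption: decimal units (1 GB = 1,000 MB, 1 TB/s = 1,000 GB/s).
hbm_mb_per_core = HBM_GB_PER_SOCKET * 1000 / CORES_PER_SOCKET
hbm_gbps_per_core = HBM_TBPS_PER_SOCKET * 1000 / CORES_PER_SOCKET

# Total on-package HBM across all compute sockets, in petabytes.
system_hbm_pb = HBM_GB_PER_SOCKET * SOCKETS / 1e6

print(f"HBM per core:           ~{hbm_mb_per_core:.0f} MB")
print(f"HBM bandwidth per core: up to ~{hbm_gbps_per_core:.1f} GB/s")
print(f"HBM across the system:  ~{system_hbm_pb:.2f} PB")
```

On these numbers each core gets on the order of 100 MB of on-package memory, which is why the DDR5 tier matters for larger working sets.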

NSCS names no vendor for the LX2, so its designer is not officially known, but Jon Peddie Research has attributed the chip to Huawei, and the project's pilot phase reportedly ran on Huawei Kunpeng servers. The fabrication node and foundry are likewise unconfirmed. SMIC's 7nm-class process is the obvious domestic candidate by elimination, given that EUV tooling and TSMC capacity are both off the table, but no one has publicly documented the part to date.

Not an AI crown

LineShine also took first on HPCG, the test that rewards memory- and communication-bound workloads closer to real scientific code, at 22.00 petaflops. But on HPL-MxP, the mixed-precision benchmark that approximates AI training math, it came in only fourth at 7.92 exaflops, a 3.6 times uplift over its FP64 score.

In other words, the accelerator-based machines it beat on Linpack pull far ahead the moment precision drops. Per the TOP500 announcement, El Capitan posts 16.7 exaflops on HPL-MxP, a 9.2 times jump over its standard HPL result, with Aurora and Frontier showing similar multipliers. Reduced-precision throughput is exactly where GPUs and APUs separate from CPUs, and LineShine has no way to hide that gap.
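The multipliers are easy to reproduce from the quoted scores. The sketch below uses only figures from this article. El Capitan's standard HPL score is not stated here, so it is backed out from the quoted 9.2 times multiplier and should be read as approximate; the variable names are illustrative.

```python
# Exaflop figures quoted in the article; names are illustrative only.
LINESHINE_HPL = 2.198
LINESHINE_MXP = 7.92
EL_CAPITAN_MXP = 16.7
EL_CAPITAN_UPLIFT = 9.2  # quoted multiplier over its standard HPL result

# LineShine's mixed-precision gain over its own FP64 score.
lineshine_uplift = LINESHINE_MXP / LINESHINE_HPL

# Assumption: El Capitan's HPL score is implied by the quoted multiplier,
# since the article does not give it directly.
el_capitan_hpl_implied = EL_CAPITAN_MXP / EL_CAPITAN_UPLIFT

# LineShine's lead on HPL versus El Capitan's lead on HPL-MxP.
hpl_lead = LINESHINE_HPL / el_capitan_hpl_implied - 1
mxp_ratio = EL_CAPITAN_MXP / LINESHINE_MXP

print(f"LineShine MxP uplift:         {lineshine_uplift:.1f}x")
print(f"El Capitan HPL (implied):     ~{el_capitan_hpl_implied:.2f} exaflops")
print(f"LineShine lead on HPL:        ~{hpl_lead:.0%}")
print(f"El Capitan vs LineShine, MxP: {mxp_ratio:.1f}x")
```

Put side by side, a lead of roughly a fifth at full precision turns into a deficit of about two times once the benchmark allows lower precision.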
