
China claims domestically-designed 14nm logic chips can rival 4nm Nvidia silicon — architecture leverages 3D hybrid bonding techniques for claimed 120 TFLOPS of power


At the ICC Global CEO Summit in Beijing, China Semiconductor Industry Association vice chairman Wei Shaojun claimed that a new domestically designed AI processor using mature 14nm logic and 18nm DRAM nodes can match the performance of Nvidia’s current 4nm chips. The architecture, which leverages 3D hybrid bonding and software-defined near-memory computing, is intended to counter China’s reliance on the Nvidia CUDA ecosystem.

Wei pitched the design as a potentially disruptive shift away from U.S. dependency, calling it central to China's AI strategy, but stopped short of disclosing any specific technical details, hinting that he would “leave some suspense” for now, DigiTimes reports.

What he did describe was 14nm logic bonded directly to 18nm DRAM to drastically increase memory bandwidth and reduce compute latency. He said the system's power efficiency reaches 2 TFLOPS per watt, with a claimed total throughput of 120 TFLOPS; the efficiency figure, if accurate, would put the design well ahead of Nvidia’s A100 GPUs, which deliver roughly 0.8 FP16 TFLOPS per watt. He argued that by placing memory and logic in the same package, the chip avoids the “memory wall” that hinders large-scale GPU deployments.

Wei added that the chip is part of a fully domestic supply chain effort and will be formally disclosed in detail later this year. The ultimate goal, he said, is to sidestep Western supply chain constraints, cut costs, and decouple China’s AI development from U.S. vendors at both the hardware and software levels.

Memory wall and node stagnation

Wei's comments pose a direct challenge to the dominant logic of semiconductor development over the last decade. Where U.S. and Taiwanese chipmakers have focused on smaller transistors — Apple's M3 at 3nm and Nvidia’s Hopper at 4nm — China’s researchers are now pitching advanced packaging and system architecture as a way to restore competitiveness using older manufacturing nodes.

The architecture described involves stacking a logic chip built on a 14nm process directly onto or alongside 18nm DRAM, using 3D hybrid bonding. This technique differs from traditional package-on-package or chiplet interconnects, in that it allows extremely dense, low-latency, high-bandwidth connections between dies. Unlike wire bonding or organic interposers, hybrid bonding directly fuses copper-to-copper contacts between wafers or die surfaces at the micron scale, supporting much higher interconnect density and thermal performance.


According to Wei, this layout enables near-memory compute, with logic operations executed in close proximity to memory blocks. That reduces the energy and latency cost of frequent memory fetches, often the limiting factor in AI workloads. He said software-defined logic further boosts efficiency by allowing compute units to be dynamically mapped and configured for AI-specific workloads.
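The "memory wall" argument can be illustrated with a simple roofline estimate: attainable throughput is capped by the lower of peak compute and memory bandwidth times arithmetic intensity. The figures below are illustrative assumptions for the sake of the sketch, not numbers from Wei's talk:

```python
def attainable_tflops(peak_tflops, bandwidth_tbs, flops_per_byte):
    """Roofline model: throughput is bound by either compute or memory traffic."""
    return min(peak_tflops, bandwidth_tbs * flops_per_byte)

# Hypothetical 120 TFLOPS chip fed by slow off-package memory vs. a
# high-bandwidth bonded DRAM stack (both bandwidth figures assumed).
for bw_tbs in (0.5, 4.0):
    # Assume a kernel that performs ~50 floating-point ops per byte moved.
    t = attainable_tflops(120.0, bw_tbs, 50.0)
    print(f"{bw_tbs} TB/s -> {t:.0f} TFLOPS attainable")
# 0.5 TB/s -> 25 TFLOPS attainable  (memory-bound)
# 4.0 TB/s -> 120 TFLOPS attainable (compute-bound)
```

With the assumed numbers, the low-bandwidth configuration leaves most of the compute idle, which is the bottleneck that placing DRAM directly on the logic die is meant to remove.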

He also suggested that the theoretical performance of 120 TFLOPS could be reached at the claimed efficiency of 2 TFLOPS per watt, implying a total power draw of roughly 60 W, which would place the architecture well above the energy efficiency of Nvidia’s A100 and in the territory of Hopper-class or Blackwell-class chips. It also implies a significant advantage over CPU-bound systems such as Intel Xeon, which Wei said remain less efficient for large model training.
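As a back-of-envelope check, the two claimed figures together pin down the implied power budget. The A100 comparison numbers below (about 312 FP16 Tensor TFLOPS at a 400 W TDP for the SXM variant) are Nvidia's published specifications, not figures from Wei's presentation:

```python
# Claimed figures from the presentation.
claimed_tflops = 120.0   # peak throughput, TFLOPS
claimed_eff = 2.0        # efficiency, TFLOPS per watt

implied_power_w = claimed_tflops / claimed_eff
print(f"Implied power draw: {implied_power_w:.0f} W")        # 60 W

# Nvidia A100 SXM for comparison: ~312 FP16 Tensor TFLOPS at 400 W.
a100_eff = 312.0 / 400.0
print(f"A100 FP16 efficiency: {a100_eff:.2f} TFLOPS/W")       # 0.78 TFLOPS/W
print(f"Claimed efficiency advantage: {claimed_eff / a100_eff:.1f}x")  # 2.6x
```

A 60 W accelerator delivering 120 TFLOPS would be remarkable; the arithmetic only shows the claims are internally consistent, not that the silicon achieves them.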
