AI is everywhere at CES 2026, and Nvidia GPUs are at the center of the expanding AI universe. Today, during his CES keynote, CEO Jensen Huang shared his plans for how the company will remain at the forefront of the AI revolution as the technology reaches far beyond chatbots into robotics, autonomous vehicles, and the broader physical world.
First up, Huang officially launched Vera Rubin, Nvidia's next-gen AI data center rack-scale architecture. Rubin is the result of what the company calls "extreme co-design" across six types of chips: the Vera CPU, the Rubin GPU, the NVLink 6 switch, the ConnectX-9 SuperNIC, the BlueField-4 data processing unit, and the Spectrum-6 Ethernet switch. Those building blocks all come together to create the Vera Rubin NVL72 rack.
Demand for AI compute is insatiable, and each Rubin GPU promises a generational leap in supply: 50 PFLOPS of inference performance with the NVFP4 data type, 5x that of Blackwell GB200, and 35 PFLOPS of NVFP4 training performance, 3.5x that of Blackwell. To feed those compute resources, each Rubin GPU package carries eight stacks of HBM4 memory delivering 288GB of capacity and 22 TB/s of bandwidth.
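For a sense of scale, a quick back-of-envelope in Python puts those numbers side by side. Only the figures quoted above are used; the Blackwell baselines simply fall out of the stated multipliers:

```python
# Back-of-envelope math using only the figures Nvidia quoted.
RUBIN_INFER_PFLOPS = 50.0   # NVFP4 inference, per GPU
RUBIN_TRAIN_PFLOPS = 35.0   # NVFP4 training, per GPU
HBM4_BANDWIDTH_TBS = 22.0   # per-GPU HBM4 bandwidth

# Blackwell baselines implied by the 5x and 3.5x multipliers
print(f"Implied GB200 inference: {RUBIN_INFER_PFLOPS / 5:.0f} PFLOPS")
print(f"Implied GB200 training:  {RUBIN_TRAIN_PFLOPS / 3.5:.0f} PFLOPS")

# How many NVFP4 FLOPs must a kernel perform per byte of HBM traffic
# before compute, rather than memory bandwidth, becomes the limit?
intensity = (RUBIN_INFER_PFLOPS * 1e15) / (HBM4_BANDWIDTH_TBS * 1e12)
print(f"Break-even arithmetic intensity: ~{intensity:.0f} FLOPs/byte")
```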
Per-GPU compute is just one building block of the AI data center. Leading large language models have shifted from dense architectures, which activate every parameter to produce each output token, to mixture-of-experts (MoE) architectures, which activate only a fraction of the available parameters per token. That shift makes it possible to scale up models relatively efficiently, but routing tokens among the experts demands vast amounts of inter-node bandwidth, as the sketch below illustrates.
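To make the dense-versus-MoE distinction concrete, here is a toy routing sketch in Python. The dimensions and the plain matrix multiplies standing in for expert FFNs are illustrative assumptions, not any real model's:

```python
import numpy as np

# Toy mixture-of-experts layer: each token is routed to only top_k of
# n_experts experts, so per-token compute scales with top_k while
# total parameter count scales with n_experts.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    """x: (tokens, d_model) -> (tokens, d_model)."""
    scores = x @ router_w                                  # router logits
    top = np.argsort(scores, axis=-1)[:, -top_k:]          # chosen experts
    chosen = np.take_along_axis(scores, top, axis=-1)
    chosen -= chosen.max(axis=-1, keepdims=True)           # stable softmax
    gates = np.exp(chosen) / np.exp(chosen).sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                            # per-token dispatch
        for gate, e in zip(gates[t], top[t]):
            out[t] += gate * (x[t] @ experts[e])           # expert stand-in
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 64): each token touched 2 of 8 experts
```

In a real deployment the experts are sharded across GPUs and nodes, so that per-token dispatch becomes all-to-all traffic over the fabric, which is exactly the load the scale-up network has to absorb.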
Vera Rubin introduces NVLink 6 for scale-up networking, boosting per-GPU fabric bandwidth to 3.6 TB/s (bi-directional). Each NVLink 6 switch delivers 28.8 TB/s of bandwidth, and each Vera Rubin NVL72 rack houses nine of them, for roughly 260 TB/s of total scale-up bandwidth.
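Those figures are consistent from both ends of the fabric, as a quick check shows:

```python
# Cross-check the NVLink 6 scale-up numbers quoted above.
GPUS, PER_GPU_TBS = 72, 3.6        # bi-directional bandwidth per GPU
SWITCHES, PER_SWITCH_TBS = 9, 28.8

print(f"GPU side:    {GPUS * PER_GPU_TBS:.1f} TB/s")         # 259.2 TB/s
print(f"Switch side: {SWITCHES * PER_SWITCH_TBS:.1f} TB/s")  # 259.2 TB/s
```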
The Nvidia Vera CPU implements 88 custom Olympus Arm cores with what Nvidia calls "spatial multi-threading," for up to 176 threads in flight. The NVLink C2C interconnect used to coherently connect the Vera CPU to the Rubin GPUs has doubled in bandwidth, to 1.8 TB/s. Each Vera CPU can address up to 1.5 TB of SOCAMM LPDDR5X memory with up to 1.2 TB/s of memory bandwidth.
To scale out Vera Rubin NVL72 racks into DGX SuperPods of eight racks each, Nvidia is introducing a pair of Spectrum-X Ethernet switches with co-packaged optics, both built around its Spectrum-6 chip, which offers 102.4 Tb/s of bandwidth.
The SN6800 boasts 409.6 Tb/s of bandwidth, good for 512 ports of 800G Ethernet or 2,048 ports of 200G. The SN6810 offers 102.4 Tb/s that can be channeled into 128 ports of 800G or 512 ports of 200G Ethernet. Both switches are liquid-cooled, and Nvidia claims they are more power-efficient and more reliable, with better uptime, presumably compared to hardware that lacks silicon photonics.
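The port counts follow directly from aggregate bandwidth divided by line rate; a quick check, assuming every port runs at its nominal speed:

```python
# Port math for the two Spectrum-6 switches described above.
def ports(total_tbps: float, line_rate_gbps: int) -> int:
    return int(total_tbps * 1000 / line_rate_gbps)

for name, tbps in [("SN6800", 409.6), ("SN6810", 102.4)]:
    print(f"{name}: {ports(tbps, 800)} x 800G or {ports(tbps, 200)} x 200G")
# SN6800: 512 x 800G or 2048 x 200G
# SN6810: 128 x 800G or 512 x 200G
```

Notably, the SN6800's 409.6 Tb/s is exactly four times a single Spectrum-6 chip's 102.4 Tb/s, suggesting a four-chip design.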
As context windows grow to millions of tokens, Nvidia says that operations on the key-value (KV) cache, which holds the history of interactions with an AI model, become the bottleneck for inference performance. To break through that bottleneck, Nvidia is using its next-gen BlueField-4 DPUs to create what it calls a new tier of memory: the Inference Context Memory Storage Platform.
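A rough sizing exercise shows why the KV cache dominates at that scale. The model shape below (layer count, KV heads, head dimension) is a hypothetical stand-in for a large production model, assumed here purely for illustration:

```python
# Estimate KV-cache size for a long-context inference session.
layers    = 80         # hypothetical transformer depth
kv_heads  = 8          # grouped-query attention KV heads (assumed)
head_dim  = 128
elem_size = 2          # bytes per element (FP16/BF16)
context   = 1_000_000  # one-million-token context window

# 2x accounts for storing both keys and values at every layer.
bytes_per_token = 2 * layers * kv_heads * head_dim * elem_size
total_gb = bytes_per_token * context / 1e9
print(f"{bytes_per_token / 1024:.0f} KiB per token, "
      f"{total_gb:.0f} GB per million-token session")
```

Under those assumptions, a single million-token session's cache runs to roughly 328 GB, already more than the 288GB of HBM4 on a Rubin GPU, which helps explain why Nvidia wants to offload it to a dedicated tier behind the DPU.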