
Jensen Huang discusses the economics of inference, power delivery, and more at CES 2026 press Q&A session — 'You sell a chip one time, but when you build software, you maintain it forever'


Nvidia CEO Jensen Huang's CES 2026 keynote stretched across Rubin, power delivery, inference economics, open models, and more. Afterward, Tom's Hardware had the opportunity to attend a press Q&A session with Huang himself in Las Vegas, Nevada.

While we can't share the transcript in its entirety, we've highlighted some of Huang's most important statements from the session, with some parts edited for flow and clarity. We recommend first familiarizing yourself with the announcements from Nvidia's CES 2026 keynote before diving into the highlights below, as the Q&A references parts of what Huang discussed onstage.

Video: NVIDIA Live with CEO Jensen Huang (YouTube)

Huang's answers during the Q&A helped clarify how Nvidia is approaching the next phase of large-scale AI deployment, with a consistent emphasis on keeping systems productive once they are installed. Those who remember our previous Jensen Huang Computex Q&A will recall the CEO's grand vision of a 50-year plan for deploying AI infrastructure.

Nvidia is designing for continuous inference workloads, constrained power environments, and platforms that must remain useful as models and deployment patterns change. Huang also touched on SRAM versus HBM trade-offs at a separate Q&A for analysts. These priorities explain recent architectural choices around serviceability, power smoothing, unified software stacks, and support for open models, and they paint a picture of how Nvidia is thinking about scaling AI infrastructure beyond the initial buildout phase unfolding right now.

Designing around uptime and failure

The most consistent theme running through the Q&A was Nvidia’s focus on keeping systems productive under real-world conditions, with Huang spending the lion’s share of the time discussing downtime, serviceability, and maintenance. That’s quite evident in how Nvidia is positioning its upcoming Vera Rubin platform.

“Imagine today we have a Grace Blackwell 200 or 300. It has 18 nodes, 72 GPUs, and an NVLink 72 with nine switch trays. If there’s ever an issue with any of the cables or switches, or if the links aren’t as robust as we want them to be, or even if semiconductor fatigue happens over time, you eventually want to replace something,” Huang said.
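To make that failure surface concrete, here's a minimal sketch in Python. The component counts (18 nodes, 72 GPUs, nine NVLink switch trays) come from Huang's description above; the class name and the all-or-nothing failure logic are our illustrative assumptions, not Nvidia's actual service model.

```python
from dataclasses import dataclass

@dataclass
class Nvl72Rack:
    # Composition per Huang's description of a Grace Blackwell 200/300 rack.
    compute_nodes: int = 18
    gpus: int = 72
    nvlink_switch_trays: int = 9
    failed_components: int = 0

    def utilization(self) -> float:
        # Assumption for illustration: today, servicing any single failed
        # component (cable, switch tray, or node) takes the whole rack down.
        return 0.0 if self.failed_components > 0 else 1.0

rack = Nvl72Rack()
rack.failed_components = 1  # e.g., one degraded NVLink switch tray
print(f"Rack utilization during service: {rack.utilization():.0%}")  # 0%
```

With roughly 99 serviceable components per rack in this toy model, the odds that at least one needs attention over a multi-year deployment are high, which is the failure math Huang is describing.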

Rather than reciting performance metrics, Huang repeatedly described the economic impact of racks going offline and the importance of minimizing disruption when components fail. At the scale Nvidia's customers now operate, failures are inevitable. GPUs fail, interconnects degrade, and power delivery fluctuates. The question Nvidia is trying to answer is how to prevent those failures from cascading into prolonged outages.

“When we replace something today, we literally take the entire rack down. It goes to zero. That one rack, which costs about $3 million, goes from full utilization to zero and stays down until you replace the NVLink or any of the nodes and bring it back up.”
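Huang's $3 million figure makes the stakes easy to quantify. Here's a back-of-the-envelope sketch: the rack price is from his quote, but the four-year amortization window is our assumption, and the result counts only stranded capital, not lost inference revenue.

```python
# Back-of-the-envelope cost of a full-rack outage. The $3M rack price is
# from Huang's quote; the 4-year amortization window is an illustrative
# assumption, not a figure from the Q&A.

RACK_COST_USD = 3_000_000
AMORTIZATION_YEARS = 4          # assumed depreciation schedule
HOURS_PER_YEAR = 365 * 24

capital_cost_per_hour = RACK_COST_USD / (AMORTIZATION_YEARS * HOURS_PER_YEAR)
print(f"Stranded capital per hour of downtime: ${capital_cost_per_hour:,.0f}")
# ~$86/hour in idle capital alone, before lost inference revenue, power,
# cooling, or the operational cost of the service event itself.
```

Multiply that across thousands of racks and multi-hour service windows, and it becomes clear why serviceability, rather than raw performance, dominated Huang's answers.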
