Demand for data center CPUs has surged, and AI agents are responsible – why the CPU to GPU ratio is more important than ever for hyperscalers

The AI revolution, which shows no signs of stopping, at times has echoes of the gold rush. Word of newly scarce commodities spreads quickly through whisper networks, and suddenly there’s a surge of interest as people snap up resources. For most of the ChatGPT era, it has been hard to get hold of a GPU for love or money, with Nvidia practically able to manage its own waitlist, so great is the demand.

Much of the media’s attention – and plenty of investment – has been focused on the dash to grab as many GPUs as possible; most recently, memory has become a focal point.

But in recent weeks and months, there’s been a focus on ensuring that people have CPUs to match. For decades, the CPU has been the anonymous workhorse of the hardware stack, running operating systems, scheduling workloads, and keeping everything ticking over, rarely grabbing headlines unless there’s a supply crunch or a generational leap in performance.


Suddenly, it’s being talked about in the same breath as scarce-as-gold GPUs. What’s going on?

“AI deployment at scale has forced organizations to look at the infrastructure underneath the hype,” said Jason Beckett, chief technology officer in Europe, the Middle East and Africa at Hitachi Vantara, in comments to Tom’s Hardware Premium. As Beckett points out, while most of the attention is focused on GPUs because they run the AI models, the CPUs are vital because they handle “everything else”.

And as agentic AI becomes the norm, there’s a greater need for that CPU backbone to keep things running properly. “Always-on, multi-step reasoning systems don’t create brief orchestration bursts around GPU workloads,” said Beckett. “They demand high-core-count CPUs running at sustained loads, continuously. The infrastructure requirement was always structural. It’s just now unavoidable.”

Readjusting ratios

When data centers were specced for AI training and inference in the early days of the generative AI revolution, those building them worked with a gargantuan bias in favor of GPUs. Chatbot workloads called for between four and eight GPUs for every CPU, because the parallel computations needed to meet user requests were GPU-inference heavy.
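To put that ratio into numbers, here is a minimal back-of-the-envelope sketch. The fleet size of 1,024 GPUs and the function name `cpus_needed` are hypothetical and chosen purely for illustration; only the four-to-eight GPUs per CPU range comes from the figures above.

```python
import math


def cpus_needed(gpu_count: int, gpus_per_cpu: int) -> int:
    """Return how many CPUs pair with gpu_count GPUs at a given GPU:CPU ratio."""
    if gpu_count < 0 or gpus_per_cpu <= 0:
        raise ValueError("gpu_count must be >= 0 and gpus_per_cpu must be > 0")
    # Round up: a partly used CPU still has to be provisioned.
    return math.ceil(gpu_count / gpus_per_cpu)


# Hypothetical fleet size, for illustration only.
HYPOTHETICAL_GPUS = 1024

# The chatbot-era range cited above: eight down to four GPUs per CPU.
for ratio in (8, 4):
    cpus = cpus_needed(HYPOTHETICAL_GPUS, ratio)
    print(f"{ratio} GPUs per CPU -> {cpus} CPUs for {HYPOTHETICAL_GPUS} GPUs")
```

On those assumptions, moving from eight GPUs per CPU to four doubles the CPU count for the same GPU fleet, which is why even a modest change in the ratio shows up as a large change in CPU demand.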

But as the main use case of AI shifts from chatbots to agents, the requirements have changed too. In a chatbot, a slight delay for in-depth inference while an AI model ‘thinks’ was seen as an acceptable interface choice. Agentic AI, by contrast, requires rapid responses and the smooth coordination of tool calls and much more, so latency can be a killer. Bolstering CPU counts can help avoid small problems that quickly spin out into something more significant, breaking the entire agentic stack.
