
Meta's multibillion-dollar Graviton deal highlights intensifying CPU demand in AI infrastructure as the industry shifts toward agentic inference workloads

Why This Matters

Meta's multibillion-dollar deal with AWS to deploy millions of Graviton5 CPUs underscores a significant industry shift toward CPU-intensive agentic AI workloads and the growing importance of general-purpose CPUs in AI infrastructure. As inference tasks evolve, demand for efficient, scalable CPU capacity is rising, and data center hardware strategies may shift with it.

Key Takeaways

Meta signed a multibillion-dollar, multi-year deal with Amazon Web Services last week to deploy tens of millions of Graviton5 CPU cores across AWS data centers, making Meta one of the five largest Graviton customers worldwide. The deal focuses explicitly on CPU-intensive agentic AI workloads, not GPU training, with Amazon CEO Andy Jassy saying in a post accompanying the announcement that agentic AI is “becoming almost as big a CPU story as a GPU story.”

Meta already has GPU and accelerator contracts worth hundreds of billions across Nvidia, AMD, Broadcom, Google, CoreWeave, and Nebius, and it went to AWS specifically for general-purpose CPUs. Santosh Janardhan, Meta's head of infrastructure, said in the joint announcement that "diversifying our compute sources is a strategic imperative," and that Graviton allows the company to "run the CPU-intensive workloads behind agentic AI with the performance and efficiency we need at our scale."

Graviton5, which AWS unveiled at re:Invent in December, packs 192 Arm Neoverse V3 cores on a 3nm process with roughly 180 MB of L3 cache, a fivefold increase over Graviton4. AWS claims a 25% performance lift over its predecessor and 33% lower inter-core latency. AWS vice president Nafea Bshara confirmed that the contract runs for at least three years and that the majority of capacity will be deployed in the U.S.


The CPU-to-GPU ratio

The meteoric rise of agentic AI is driving notable shifts in CPU-to-GPU ratios. While training LLMs relies on large deployments of GPUs, agentic inference is fundamentally different, involving processes like branching control flow, tool invocation, sandbox execution, validation loops, and orchestration across many concurrent sub-agents. All that work falls on CPUs.
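To see why that work lands on CPUs, consider a minimal orchestration loop. This is an illustrative sketch only; the function names, tool protocol, and structure are assumptions for the example, not Meta's or AWS's actual stack. The only GPU-bound step is token generation (stubbed out here); everything else in the loop, parsing, branching, tool dispatch, validation, and fanning out to concurrent sub-agents, is ordinary CPU work.

```python
import concurrent.futures
import json

def generate(prompt: str) -> str:
    # Stub for the GPU-bound step: in production this would call a model
    # server. Here the "model" always asks for a hypothetical tool call.
    return json.dumps({"tool": "calculator", "args": {"expr": "2 + 2"}})

def run_tool(call: dict) -> str:
    # CPU: tool invocation. A real system would run this in a sandbox.
    if call["tool"] == "calculator":
        return str(eval(call["args"]["expr"], {"__builtins__": {}}))
    raise ValueError(f"unknown tool: {call['tool']}")

def agent_step(task: str) -> str:
    raw = generate(task)        # GPU-bound token generation (stubbed)
    call = json.loads(raw)      # CPU: parse the model's structured output
    result = run_tool(call)     # CPU: branching control flow + tool call
    if not result:              # CPU: validation loop would retry here
        raise RuntimeError("empty tool result")
    return result

def orchestrate(tasks: list[str]) -> list[str]:
    # CPU: coordinate many concurrent sub-agents.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return list(pool.map(agent_step, tasks))

print(orchestrate(["task-a", "task-b", "task-c"]))  # ['4', '4', '4']
```

Scale this pattern to millions of simultaneous agents, each looping through parse-dispatch-validate cycles between model calls, and the CPU footprint grows independently of GPU capacity.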

In its recent earnings call, Intel’s CFO David Zinsner said that the ratios of CPUs to GPUs in data centers have already moved from 1:8 to 1:4, adding that as workloads continue migrating towards inference and agentic AI, ratios could converge to 1:1 or even tilt further in favor of CPUs. “As you think about the growth rate now going forward, it’s [CPU demand] going to become a significant part of the AI [total addressable market],” Zinsner said.

Arm has also quantified the rising demand for agentic AI in terms of core counts. At the company’s Arm Everywhere event in March, Arm launched its first in-house silicon product, the 136-core AGI CPU, with Meta as lead partner and customer. Arm CEO Rene Haas told the audience that a typical AI data center today requires around 30 million CPU cores per gigawatt of capacity. With agentic workloads, however, that figure rises to roughly 120 million cores per gigawatt, a fourfold increase driven by agents that run continuously, spawn sub-agents, and generate queries at more than 15 times the rate of human chatbot users.
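Haas's figures imply the following back-of-envelope math (the 2 GW campus size below is an assumed example, not from the article):

```python
# Core-count figures cited by Arm's CEO, per gigawatt of data center capacity.
CORES_PER_GW_TODAY = 30_000_000       # typical AI data center today
CORES_PER_GW_AGENTIC = 120_000_000    # with agentic workloads

increase = CORES_PER_GW_AGENTIC / CORES_PER_GW_TODAY
print(increase)  # 4.0 -- the fourfold increase quoted in the article

# For a hypothetical 2 GW campus (assumed size for illustration):
print(2 * CORES_PER_GW_AGENTIC)  # 240000000 cores
```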

Meanwhile, AMD CEO Lisa Su said at the Morgan Stanley TMT Conference in March that "we're seeing a significant CPU demand, frankly, as a result of the inference demand picking up." She added that "the CPU portion of the business has actually far exceeded my expectations in terms of demand."

Supply constraints and rising lead times
