
Nvidia’s ‘AI Factory’ narrative faces reality check as inference wars expose 70% margins



The gloves came off on Tuesday at VB Transform 2025 as alternative chip makers directly challenged Nvidia’s dominance narrative during a panel about inference, exposing a fundamental contradiction: How can AI inference be a commoditized “factory” and still command 70% gross margins?

Jonathan Ross, CEO of Groq, didn’t mince words when discussing Nvidia’s carefully crafted messaging. “AI factory is just a marketing way to make AI sound less scary,” Ross said during the panel. Sean Lie, CTO of Cerebras, a competitor, was equally direct: “I don’t think Nvidia minds having all of the service providers fighting it out for every last penny while they’re sitting there comfortable with 70 points.”

Hundreds of billions in infrastructure investment and the future architecture of enterprise AI are at stake. For CISOs and AI leaders currently locked in weekly negotiations with OpenAI and other providers for more capacity, the panel exposed uncomfortable truths about why their AI initiatives keep hitting roadblocks.

The capacity crisis no one talks about

“Anyone who’s actually a big user of these gen AI models knows that you can go to OpenAI, or whoever it is, and they won’t actually be able to serve you enough tokens,” explained Dylan Patel, founder of SemiAnalysis. “There are weekly meetings between some of the biggest AI users and their model providers to try to persuade them to allocate more capacity. Then there’s weekly meetings between those model providers and their hardware providers.”

Panel participants also pointed to the token shortage as exposing a fundamental flaw in the factory analogy. Traditional manufacturing responds to demand signals by adding capacity. However, when enterprises require 10 times more inference capacity, they discover that the supply chain can’t flex. GPUs require two-year lead times. Data centers need permits and power agreements. The infrastructure wasn’t built for exponential scaling, forcing providers to ration access through API limits.

According to Patel, Anthropic jumped from $2 billion to $3 billion in ARR in just six months. Cursor went from essentially zero to $500 million in ARR. OpenAI crossed $10 billion. Yet enterprises still can’t get the tokens they need.

Why ‘Factory’ thinking breaks AI economics

Jensen Huang’s “AI factory” concept implies standardization, commoditization and efficiency gains that drive down costs. But the panel revealed three fundamental ways this metaphor breaks down:
