A new spending discipline is taking hold inside corporate America, as CFOs and boards start cracking down on inefficient artificial intelligence spending. The change has the potential to reshape the AI trade. For the past two years, the playbook has been to default to the most powerful AI model and direct all queries through it, regardless of complexity. Now, with AI bills running far ahead of budgets, companies are starting to ask whether every task actually needs the frontier. Two leaders at the center of the AI buildout told CNBC this week that a solution is emerging: model routing.
What is model routing?
Routing is a tool that matches the job to the model, sending hard problems to the expensive frontier models and easy ones to cheaper, faster alternatives. Scott Wu, CEO of Cognition, which makes the coding agent Devin, said the gains on routine work are enormous. For a lot of the boilerplate work, he said, companies can get five to 10 times better cost efficiency using models that are still good enough for the task. Most companies today aren't routing at all. Glean CEO Arvind Jain has estimated that roughly 95% of enterprise AI usage is still running on the most expensive frontier models, even for tasks that cheaper alternatives could easily handle. Wu gave the example of asking a model to name the third U.S. president. Every model, no matter how expensive, will tell you it was Thomas Jefferson.
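To make the idea concrete, here is a minimal sketch of a router in Python. It is an illustration only, not how Cognition, Glean or any vendor implements routing: the model names, the prices, the keyword list and the threshold are all hypothetical, and a production router would likely use a trained classifier rather than a keyword heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # illustrative price, not a real quote


# Hypothetical models; the 10x price gap is an assumption for illustration.
CHEAP = Model("small-model", 0.10)
FRONTIER = Model("frontier-model", 1.00)

# Hypothetical keywords that hint a request needs deeper reasoning.
HARD_HINTS = ("refactor", "prove", "design", "debug", "architecture")


def estimate_difficulty(prompt: str) -> float:
    """Return a rough difficulty score between 0.0 and 1.0."""
    text = prompt.lower()
    # Longer prompts score higher, capped at 1.0 once they reach 200 words.
    length_score = min(len(text.split()) / 200, 1.0)
    # Any hard-task keyword pushes the score to the maximum.
    hint_score = 1.0 if any(hint in text for hint in HARD_HINTS) else 0.0
    return max(length_score, hint_score)


def route(prompt: str, threshold: float = 0.5) -> Model:
    """Send hard prompts to the frontier model and easy ones to the cheap one."""
    if estimate_difficulty(prompt) >= threshold:
        return FRONTIER
    return CHEAP


if __name__ == "__main__":
    for prompt in (
        "Who was the third U.S. president?",
        "Refactor the billing service to support multi-currency invoices",
    ):
        print(f"{route(prompt).name}: {prompt}")
```

Under these assumptions, the presidential trivia question goes to the cheaper model, while the refactoring request goes to the frontier model.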
Arvind Jain, CEO of Glean, on SaaS Monster stage during day one of Web Summit 2022 at the Altice Arena in Lisbon, Portugal, on Nov. 2, 2022. Harry Murphy | Sportsfile | Getty Images
The pressure behind the shift is a cost curve that has surprised even the biggest tech companies. Jeetu Patel, chief product officer at Cisco, laid out the math. Token usage of roughly $200 per employee per week comes to about $10,000 a year per person. With 90,000 employees, a company is looking at $900 million annually. Patel said Cisco came in well over its own budget and has had to adjust, with 30,000 engineers now building products written largely with AI. Cisco has reallocated resources, prioritizing tokens over other spending.
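Patel's arithmetic can be checked in a few lines of Python. The sketch below uses only the figures quoted above; the 52-week year and the function names are assumptions added for illustration. The exact product lands slightly above the rounded figures he cited.

```python
WEEKS_PER_YEAR = 52  # assumption: spend continues every week of the year


def annual_spend_per_employee(weekly_spend: float) -> float:
    """Annual token spend for one employee, in dollars."""
    return weekly_spend * WEEKS_PER_YEAR


def annual_spend_total(per_employee: float, employees: int) -> float:
    """Annual token spend across the whole workforce, in dollars."""
    return per_employee * employees


if __name__ == "__main__":
    per_person = annual_spend_per_employee(200)  # $200 a week, as quoted
    exact_total = annual_spend_total(per_person, 90_000)
    rounded_total = annual_spend_total(10_000, 90_000)  # the rounded figure

    print(f"Per employee per year: ${per_person:,.0f}")
    print(f"Company-wide, exact: ${exact_total:,.0f}")
    print(f"Company-wide, rounded: ${rounded_total:,.0f}")
```

At 52 weeks, $200 a week is $10,400 a year, or $936 million across 90,000 employees. Rounding to $10,000 per person gives the $900 million figure.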
Vendors under pressure