Has the hunt for AI compute uncovered the next Cerebras?

Why This Matters

The article highlights the ongoing search for specialized AI chips that can efficiently handle inference tasks, a critical phase where models generate responses. As demand for AI processing surges, companies like General Compute are exploring alternatives to traditional GPUs, investing in innovative chips from firms like SambaNova to meet performance needs and accelerate deployment in data centers. This shift signifies a pivotal moment in the AI hardware industry, potentially transforming how AI services are delivered to consumers and businesses alike.

Key Takeaways

The raging demand for computers to run AI models has only accelerated, but there are two major obstacles that anyone in the business needs to overcome: getting the right chips, and getting them into data centers where they can start generating revenue.

General Compute, a new inference neocloud — a company that rents out AI processing power, specializing in the phase when models are running and responding to users rather than being trained — has answers to those questions that illuminate where the AI ecosystem is headed. Those answers helped it raise a $15 million seed round at a $60 million post-money valuation, led by FUSE VC with participation from Carya Venture Partners and Village Global Ventures.

First, what is the right chip? The demand for GPUs has gone through the roof, but it’s becoming conventional wisdom that they aren’t the best-suited chips for running AI models once they have been trained. The phase of AI where a model is actively generating responses has different computational requirements than training, and a new class of chips is being designed specifically for it. Nvidia’s $20 billion Groq transaction in December and Cerebras’ $57 billion IPO last week point the way.

With capacity strained at both those companies, the co-founders of General Compute, CEO Finn Puklowski and CTO Jason Goodison, found another option. They’re turning to specialized chips built by SambaNova, an Intel-backed chipmaker focused on inference that has fallen a bit out of the Silicon Valley conversation.

That may change when SambaNova releases its new chips this year. The architecture is more flexible and uses more memory to store context during inference calculations, and SambaNova claims that it outperforms not just GPUs but also other specialized chips built by the likes of Groq or Cerebras. Puklowski says the new chips will generate 600 to 700 tokens per second, versus about 250 tokens per second for GPUs.
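To put those throughput figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the tokens-per-second numbers quoted by Puklowski, which are claims rather than independent benchmarks. The 1,000-token response length and the function names are assumptions made for illustration, and the calculation ignores batching, network latency, and time to first token.

```python
def speedup(new_tps: float, baseline_tps: float) -> float:
    """Ratio of the new generation rate to the baseline rate."""
    return new_tps / baseline_tps


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response of the given length at a steady rate."""
    return tokens / tokens_per_second


# Figures quoted in the article (claimed, not independently measured).
GPU_TPS = 250
NEW_CHIP_TPS_LOW = 600
NEW_CHIP_TPS_HIGH = 700

# Hypothetical response length chosen only for illustration.
RESPONSE_TOKENS = 1000

low = speedup(NEW_CHIP_TPS_LOW, GPU_TPS)
high = speedup(NEW_CHIP_TPS_HIGH, GPU_TPS)
print(f"Claimed speedup over GPUs: {low:.1f}x to {high:.1f}x")

for label, tps in [("GPU", GPU_TPS),
                   ("new chip, low end", NEW_CHIP_TPS_LOW),
                   ("new chip, high end", NEW_CHIP_TPS_HIGH)]:
    secs = seconds_to_generate(RESPONSE_TOKENS, tps)
    print(f"{label}: {secs:.2f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions, the claimed rates work out to roughly 2.4 to 2.8 times the GPU figure, which would cut a 1,000-token response from about 4 seconds to under 2.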

General Compute has $300 million of the company’s SN50 chips on order and says it will be the first neocloud deploying them.

These chips also help General Compute solve the second big problem, where to put them: They are air-cooled, not water-cooled, and consume less power, so they can be installed in existing data center facilities without new infrastructure investments.

Puklowski is pursuing colocation deals — arrangements where General Compute installs its hardware in someone else’s facility — not just with data center providers, but also with crypto miners looking to repurpose their infrastructure as the cost of producing a bitcoin has often exceeded its price.

General Compute launched its cloud offering last week, claiming it is already the fastest at running MiniMax 2.7, a powerful open-source LLM.

Joe Hasselmann is a venture investor who got in on the ground floor of the inference boom when he invested in Groq in 2021. This year, he launched a new fund, Evercrest Capital Partners, focused on the AI space, and made General Compute his first investment. Hasselmann sees in SambaNova’s partnership with General Compute parallels to CoreWeave’s relationship with Nvidia, and to the pairing of Groq’s chip-making with its former cloud offering.
