
Microsoft introduces newest in-house AI chip — Maia 200 is faster than other bespoke Nvidia competitors, built on TSMC 3nm with 216GB of HBM3e


Microsoft has introduced its newest AI accelerator, the Microsoft Azure Maia 200. The new in-house AI chip is the next generation of Microsoft's Maia GPU line, a server chip designed for AI inference, with the speeds and feeds to outperform the custom offerings from hyperscaler competitors Amazon and Google.

Maia 200 is labeled Microsoft's "most efficient inference system" ever deployed, with its press releases splitting time between praising its big performance numbers and stressing Microsoft's lip service to environmentalism. Microsoft claims the Maia 200 delivers 30% more performance per dollar than the first-gen Maia 100, an impressive feat considering the new chip also technically advertises a 50% higher TDP than its predecessor.

Maia 200 is built on TSMC's 3nm process node, containing 140 billion transistors. The chip can hit up to a claimed 10 petaflops of FP4 compute, three times higher than Amazon's Trainium3 competition. The Maia 200 also carries 216 GB of HBM3e memory onboard with 7 TB/s of HBM bandwidth, joined by 272MB of on-die SRAM.
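Dividing the quoted peak compute by the quoted HBM bandwidth gives a rough sense of how much on-chip data reuse a workload needs before the chip stops being memory-bound. The back-of-envelope calculation below uses only the figures above; it is not a Microsoft-published metric.

```python
# Back-of-envelope compute-to-bandwidth ratio for Maia 200,
# using only the figures quoted above (not an official metric).
fp4_flops = 10.14e15      # ~10 petaFLOPS of peak FP4 compute
hbm_bw = 7e12             # 7 TB/s of HBM3e bandwidth, in bytes/s

flops_per_byte = fp4_flops / hbm_bw
print(f"Peak FP4 FLOPs per byte of HBM traffic: {flops_per_byte:.0f}")
# ~1449 FLOPs/byte: kernels need heavy on-chip reuse (hence the large
# SRAM discussed below) to keep the compute units fed during inference.
```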

Maia 200 vs Amazon Trainium3 vs Nvidia Blackwell B300 Ultra

| Spec | Azure Maia 200 | AWS Trainium3 | Nvidia Blackwell B300 Ultra |
| --- | --- | --- | --- |
| Process technology | N3P | N3P | 4NP |
| FP4 petaFLOPS | 10.14 | 2.517 | 15 |
| FP8 petaFLOPS | 5.072 | 2.517 | 5 |
| BF16 petaFLOPS | 1.268 | 0.671 | 2.5 |
| HBM memory size | 216 GB HBM3e | 144 GB HBM3e | 288 GB HBM3e |
| HBM memory bandwidth | 7 TB/s | 4.9 TB/s | 8 TB/s |
| TDP | 750 W | ??? | 1400 W |
| Bi-directional bandwidth | 2.8 TB/s | 2.56 TB/s | 1.8 TB/s |

As can be seen above, the Maia 200 offers a clear lead in raw compute power over Amazon's in-house competition, and makes for an interesting comparison against Nvidia's top dog GPU. Obviously, treating the two as direct competitors is a fool's errand; no outside customers can purchase the Maia 200 directly, the Blackwell B300 Ultra is tuned for much higher-powered use cases than the Microsoft chip, and Nvidia's software stack puts it miles ahead of any contemporaries.

However, the Maia 200 does beat the B300 in efficiency, a big win at a time when public concern over AI's environmental impact is steadily mounting. The Maia 200 operates at nearly half the B300's TDP (750W vs 1400W), and if it's anything like the Maia 100, it will operate beneath its theoretical maximum TDP; Maia 100 was designed as a 700W chip, but Microsoft claims it was limited to 500W in operation.
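To put that TDP gap in perspective, the sketch below divides each chip's peak FP4 throughput from the table by its listed TDP. These are spec-sheet peaks rather than measured efficiency under real inference loads, so treat the result as a rough ceiling comparison only.

```python
# Rough FP4 performance-per-watt comparison from the table's peak figures.
# Theoretical peaks divided by rated TDP, not measured efficiency under load.
chips = {
    "Azure Maia 200":              (10.14e15, 750),   # peak FP4 FLOPS, TDP (W)
    "Nvidia Blackwell B300 Ultra": (15e15,    1400),
}

for name, (fp4_flops, tdp_w) in chips.items():
    tflops_per_watt = fp4_flops / tdp_w / 1e12
    print(f"{name}: {tflops_per_watt:.1f} FP4 TFLOPS per watt")
# Maia 200: ~13.5 TFLOPS/W vs. B300 Ultra: ~10.7 TFLOPS/W at spec-sheet TDP.
```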

Maia 200 is tuned for FP4 and FP8 performance, focusing on customers inferencing AI models hungry for low-precision throughput rather than more complex operations. A lot of Microsoft's R&D budget for the chip appears to have gone into the memory hierarchy built around its 272MB of high-efficiency SRAM, which is partitioned into "multi‑tier Cluster‑level SRAM (CSRAM) and Tile‑level SRAM (TSRAM)", enabling higher operating efficiency and a philosophy of spreading workloads intelligently and evenly across all HBM and SRAM.
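Microsoft hasn't published a programming model for CSRAM and TSRAM, but the general pattern of a multi-tier scratchpad is straightforward: stage a block of data from HBM into a shared cluster-level buffer, then carve it into smaller tiles that sit next to the compute units. The Python sketch below illustrates that staging pattern only; the tier sizes, function names, and data layout are invented for the example and do not describe Maia 200's actual hardware or software.

```python
# Illustrative sketch of a two-tier scratchpad staging pattern
# (cluster-level SRAM feeding tile-level SRAM). All names and sizes
# are hypothetical; this is not Maia 200's real programming model.
import numpy as np

CSRAM_BYTES = 8 * 1024 * 1024    # hypothetical cluster-level SRAM budget
TSRAM_BYTES = 256 * 1024         # hypothetical tile-level SRAM budget

def stage_weights(weights_in_hbm: np.ndarray):
    """Stream weights HBM -> cluster SRAM -> tile-sized chunks for compute."""
    cs_rows = max(1, CSRAM_BYTES // weights_in_hbm.strides[0])
    for cs_start in range(0, weights_in_hbm.shape[0], cs_rows):
        csram_block = weights_in_hbm[cs_start:cs_start + cs_rows]   # HBM -> CSRAM
        ts_rows = max(1, TSRAM_BYTES // csram_block.strides[0])
        for ts_start in range(0, csram_block.shape[0], ts_rows):
            tile = csram_block[ts_start:ts_start + ts_rows]         # CSRAM -> TSRAM
            yield tile  # a compute tile would consume this from its local SRAM

# Example: a (pretend) low-precision weight matrix resident in HBM.
weights = np.zeros((4096, 4096), dtype=np.uint8)
print(sum(1 for _ in stage_weights(weights)), "tile-level transfers")
```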

It's difficult to measure Maia 200's improvements over its predecessor Maia 100, as Microsoft's official stat sheets for both chips have nearly zero overlap or shared measurements. All we can say this early is that Maia 200 will run hotter than Maia 100 did, and that it is apparently 30% better on a performance-per-dollar metric.


Maia 200 has already been deployed in Microsoft's US Central Azure data center, with future deployments announced for US West 3 in Phoenix, AZ, and more to come as Microsoft receives more chips. The chip will be part of Microsoft's heterogeneous deployment strategy, operating in tandem with other AI accelerators.
