AMD has launched a new member of the MI350 series that comes in a PCIe form factor. The new Instinct MI350P pairs 128 Compute Units (CUs) with 144GB of HBM3E memory and is designed as a drop-in upgrade for existing air-cooled servers.
The MI350P comes as a 10.5-inch, dual-slot card with a passive cooling solution designed around a 600W power envelope (the card relies on chassis fans in a rack-mounted server for airflow). However, it can be configured to run at a lower 450W power target to maintain compatibility with more thermally or power-constrained chassis.
AMD Instinct MI350-series specifications

| Specifications (peak theoretical) | AMD Instinct MI350P GPU | AMD Instinct MI325X GPU | AMD Instinct MI350X GPU | AMD Instinct MI350X Platform | AMD Instinct MI355X GPU | AMD Instinct MI355X Platform |
| --- | --- | --- | --- | --- | --- | --- |
| GPUs | Instinct MI350P PCIe | Instinct MI325X OAM | Instinct MI350X OAM | 8 x Instinct MI350X OAM | Instinct MI355X OAM | 8 x Instinct MI355X OAM |
| GPU Architecture | CDNA 4 | CDNA 3 | CDNA 4 | CDNA 4 | CDNA 4 | CDNA 4 |
| Dedicated Memory Size | 144 GB HBM3E | 256 GB HBM3E | 288 GB HBM3E | 2.3 TB HBM3E | 288 GB HBM3E | 2.3 TB HBM3E |
| Memory Bandwidth | 4 TB/s | 6 TB/s | 8 TB/s | 8 TB/s per OAM | 8 TB/s | 8 TB/s per OAM |
| FP64 Performance | 36 TFLOPS | - | 72 TFLOPS | 577 TFLOPS | 78.6 TFLOPS | 628.8 TFLOPS |
| FP16 Performance | 2.3 PFLOPS | 2.61 PFLOPS | 4.6 PFLOPS | 36.8 PFLOPS | 5 PFLOPS | 40.2 PFLOPS |
| FP8 Performance | 4.6 PFLOPS | 5.22 PFLOPS | 9.2 PFLOPS | 73.82 PFLOPS | 10.1 PFLOPS | 80.5 PFLOPS |
| FP6 Performance | - | - | 18.45 PFLOPS | 147.6 PFLOPS | 20.1 PFLOPS | 161 PFLOPS |
| FP4 Performance* | - | - | 18.45 PFLOPS | 147.6 PFLOPS | 20.1 PFLOPS | 161 PFLOPS |
The card's specs are exactly half of what AMD's high-end MI350X offers, and roughly half of the MI355X. The MI350P is based on AMD's CDNA 4 architecture and is built on TSMC's 3nm and 6nm FinFET processes. The GPU comes with 8,192 cores across 128 CUs, 512 Matrix Cores, and a 2.2GHz maximum clock speed. It is paired with 144GB of HBM3E memory offering 4TB/s of bandwidth, along with a 128MB last-level cache.
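As a quick sanity check, the headline MI350P figures above can be compared against the flagship MI350X OAM column of the spec table; each works out to exactly half:

```python
# Verify the "half of MI350X" relationship using the figures from
# AMD's spec table above.
mi350p = {"memory_gb": 144, "bandwidth_tbs": 4, "fp64_tflops": 36,
          "fp16_pflops": 2.3, "fp8_pflops": 4.6}
mi350x = {"memory_gb": 288, "bandwidth_tbs": 8, "fp64_tflops": 72,
          "fp16_pflops": 4.6, "fp8_pflops": 9.2}

for key in mi350p:
    ratio = mi350p[key] / mi350x[key]
    print(f"{key}: {ratio:.2f}x of MI350X")  # each prints 0.50x
```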
Just like the MI350X and MI355X, the MI350P offers native support for the lower-precision MXFP6 and MXFP4 data types to accelerate LLMs. Up to eight MI350P cards can be paired in a single system, allowing data centers to scale performance with the number of cards installed. The MI350P is geared toward small, medium, and large AI workloads, particularly inference and retrieval-augmented generation (RAG) pipelines. AMD claims the GPU is the fastest enterprise PCIe card, with an estimated 2,299 TFLOPS of compute and a peak of 4,600 TFLOPS using MXFP4.
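Assuming the performance and capacity scale linearly across cards (an assumption for illustration, not an AMD-published system figure), an eight-card MI350P box would pencil out as follows:

```python
# Back-of-envelope aggregate for an eight-card MI350P system, assuming
# linear scaling (illustrative only; not an official AMD platform spec).
cards = 8
per_card_hbm3e_gb = 144        # memory capacity per card
per_card_mxfp4_tflops = 4600   # peak MXFP4 throughput claimed by AMD

total_memory_gb = cards * per_card_hbm3e_gb       # 1,152 GB of HBM3E
total_mxfp4_tflops = cards * per_card_mxfp4_tflops  # 36,800 TFLOPS (36.8 PFLOPS)

print(f"{total_memory_gb} GB HBM3E, {total_mxfp4_tflops / 1000:.1f} PFLOPS MXFP4")
```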
The introduction of the MI350P finally gives AMD a proper competitor to Nvidia's fastest PCIe AI accelerator, currently the H200 NVL. The MI350P is based on a newer architecture and edges out the H200 NVL in performance, featuring 20% better FP64, 43% better FP16, and 39% better FP8 theoretical compute performance.
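Working backwards from those stated percentage advantages and the MI350P's own spec-sheet numbers gives a rough idea of the H200 NVL figures being compared against (derived here for illustration; these are not official Nvidia numbers):

```python
# Implied H200 NVL figures, derived from AMD's "x% better" claims and
# the MI350P's published specs. Illustrative only, not Nvidia data.
mi350p_tflops = {"fp64": 36, "fp16": 2299, "fp8": 4600}
advantage = {"fp64": 0.20, "fp16": 0.43, "fp8": 0.39}

for metric, tflops in mi350p_tflops.items():
    implied = tflops / (1 + advantage[metric])
    print(f"implied H200 NVL {metric}: ~{implied:.0f} TFLOPS")
```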
Nvidia has not announced a PCIe version of its latest B200 Blackwell GPUs with HBM memory, so for now, AMD will have the most bleeding-edge AI accelerator that fits in a PCIe form factor. It remains to be seen how widely AMD's new card will be adopted, given Nvidia's hold on the market with CUDA. But AMD is working to improve its competing ROCm software stack, as the GPU maker explained to us at CES 2026.