
4× RTX Pro 6000 Blackwell on Water, and the One Card That Wouldn't Behave

Why This Matters

This article highlights the challenges and solutions involved in cooling high-performance GPUs used for long-duration AI training. Converting RTX PRO 6000 Blackwell cards to water cooling proved essential for maintaining stable operation during extended workloads, emphasizing the importance of effective thermal management in demanding computational tasks.

Key Takeaways

This rig exists to train models, not serve them. Four RTX PRO 6000 Blackwell cards in one chassis at 600 W each is 2.4 kW of heat to evict, and training runs last hours to days with every card pinned at full TDP. Air coolers can handle an inference burst; they cannot handle a multi-day training job. The fans get loud, the cards dump their exhaust into each other, and the first one to thermal-throttle stalls the whole synchronous step.
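That last point is worth making concrete. In synchronous data-parallel training, every card has to finish its share of the step before gradients are exchanged, so the step time is set by the slowest card. The sketch below is a toy model, not a measurement from this rig: the 1.0 s baseline and the 20% slowdown are made-up numbers, and `step_time` and `throughput_loss` are hypothetical helper names.

```python
def step_time(per_gpu_seconds):
    """A synchronous step ends only when the slowest GPU finishes."""
    return max(per_gpu_seconds)


def throughput_loss(per_gpu_seconds, baseline_seconds):
    """Fraction of throughput lost relative to every GPU running at baseline."""
    return 1.0 - baseline_seconds / step_time(per_gpu_seconds)


# Hypothetical numbers: three healthy cards at 1.0 s per step, one throttled to 1.2 s.
healthy = [1.0, 1.0, 1.0, 1.0]
one_throttled = [1.0, 1.0, 1.0, 1.2]

print(step_time(healthy))
print(step_time(one_throttled))
print(f"{throughput_loss(one_throttled, 1.0):.1%}")
```

Under these assumed numbers, one card running 20% slower costs the whole job roughly 17% of its throughput, because the three healthy cards sit idle waiting for it every step.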

So we converted the cards to waterblocks. We did one card first as a pilot, ran it for about a week, and only after that did we touch the other three. That sequencing matters — it’s why we have a story to tell. The pilot card failed, taught us a lesson, and the lesson is the reason the other three went on without incident.

This post is the short version: what we did, what broke, what we learned, and where we landed.

The rig

4× RTX PRO 6000 Blackwell Workstation (GB202, 96 GB GDDR7, 600 W)

Threadripper Pro 7995WX on WRX90

4× Bykski waterblocks (full-cover, GPU + VRM + memory front-side)

Custom loop: single distro/reservoir, two pumps, distilled water, two Alphacool NexXxoS XT45 Full Copper 1260 mm Super Nova radiators (9× 140 mm fans each), four GPUs plumbed in parallel

2× 1500 W PSUs (3 kW total budget) to feed the ~2.4 kW sustained draw; AC circuit got upgraded mid-build after an earlier all-cards-down event under load
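The numbers in that list can be sanity-checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope estimate under stated assumptions, not a measurement: the 8 L/min total flow rate is a placeholder (the actual flow is not given here), the flow is assumed to split evenly across the four parallel blocks, and only GPU heat is counted, so the CPU, pumps, and fans are ignored. `coolant_rise_kelvin` is a hypothetical helper name.

```python
GPU_COUNT = 4
GPU_WATTS = 600.0          # per-card TDP, from the parts list
PSU_WATTS = 2 * 1500.0     # two 1500 W supplies, from the parts list

WATER_CP = 4186.0          # J/(kg*K), specific heat of water near room temperature
WATER_DENSITY = 0.997      # kg/L near room temperature


def coolant_rise_kelvin(heat_watts, flow_l_per_min):
    """Steady-state temperature rise of water absorbing heat_watts at a given flow."""
    mass_flow = flow_l_per_min / 60.0 * WATER_DENSITY  # kg/s
    return heat_watts / (mass_flow * WATER_CP)


gpu_heat = GPU_COUNT * GPU_WATTS
psu_headroom = PSU_WATTS - gpu_heat

# ASSUMPTION: 8 L/min total loop flow, split evenly across the four parallel blocks.
total_flow = 8.0
per_block_flow = total_flow / GPU_COUNT

loop_rise = coolant_rise_kelvin(gpu_heat, total_flow)
block_rise = coolant_rise_kelvin(GPU_WATTS, per_block_flow)

print(f"GPU heat load: {gpu_heat:.0f} W")
print(f"PSU headroom over GPU draw: {psu_headroom:.0f} W ({psu_headroom / PSU_WATTS:.0%})")
print(f"Loop-level coolant rise: {loop_rise:.2f} K")
print(f"Per-block coolant rise:  {block_rise:.2f} K")
```

With an even split, each block gets a quarter of the flow and a quarter of the heat, so the rise across every block equals the loop-level rise and all four cards see the same inlet temperature. In a series loop, by contrast, each card would be fed water already warmed by the cards upstream of it. The headroom figure counts GPUs only, so the real margin on the 3 kW budget is smaller once the rest of the system is included.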

[Photo of one radiator] That’s one radiator. There are two of them.
