I Put a Datacenter GPU in My Gaming PC for £200

Why This Matters

This article highlights how repurposing datacenter GPUs like the Tesla V100 can significantly boost AI and machine learning capabilities in consumer gaming PCs at a fraction of the cost. It demonstrates the potential for enthusiasts to access high-performance hardware traditionally reserved for servers, opening new avenues for affordable AI experimentation and development. This approach could democratize access to advanced computing power, impacting both the tech industry and individual users interested in AI research.

Key Takeaways

I already had an RTX 4080. 16GB of VRAM. Good enough for gaming, not good enough for the models I wanted to run locally. The next step up in GPU land is either to spend a fortune on a card with more VRAM or to find another way.

I found another way.

I bought a datacenter GPU that doesn’t even have a normal PCIe connector, stuck it in my gaming PC with an adapter, and now I have 32GB of VRAM across two GPUs running a 27-billion-parameter model at 32 tokens per second. The whole thing cost me £200.
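To see why 16GB falls short for a model of this size, here is a rough back-of-the-envelope sketch. It counts weight storage only, and the bit widths are illustrative assumptions (the quantization actually used is not stated here); a real run also needs room for the KV cache and activations.

```python
# Rough weight-only memory estimate for a 27-billion-parameter model.
# The bit widths below are illustrative assumptions, not the settings used in this build.
PARAMS = 27e9          # parameter count quoted in the article
SINGLE_GPU_GB = 16     # one 16GB card
COMBINED_GB = 32       # both cards together


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in GB (1 GB = 1e9 bytes), ignoring KV cache and activations."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    gb = weight_gb(PARAMS, bits)
    print(
        f"{bits:>2}-bit weights: {gb:5.1f} GB | "
        f"fits in 16GB: {gb <= SINGLE_GPU_GB} | fits in 32GB: {gb <= COMBINED_GB}"
    )
```

Under these assumptions, 8-bit weights alone need 27 GB, which only fits across both cards, while 4-bit weights take 13.5 GB and leave little of a single 16GB card for context.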

The GPU

This is a Tesla V100 SXM2 16GB. It was designed for NVIDIA’s DGX servers and hyperscaler racks. The SXM2 form factor means it does not have a PCIe edge connector. It does not have display outputs. It does not have a normal power connector. It sits on a proprietary board inside a server rack and communicates over NVLink.

You cannot plug this into a motherboard. Not without help.

But here is the thing: this is a Volta GPU with 16GB of HBM2 memory and 5120 CUDA cores, and I picked it up for about £150 on eBay. The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.

HBM2 is a different class of memory. The V100 has a 4096-bit memory bus delivering 900 GB/s of bandwidth. To put that in perspective, my RTX 4080 with its fancy GDDR6X manages 736 GB/s. The V100 from 2017 has 22% more memory bandwidth than a GPU that launched in 2022.

And it is not just NVIDIA’s consumer cards that lose. Apple’s M3 Max does 400 GB/s. The M4 Max does 546 GB/s. The brand new M5 Max, which will set you back over £3,000 for a laptop, manages 614 GB/s. A GPU from 2017 beats every Mac on the market.
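The comparisons above are simple ratios, and a minimal sketch makes them easy to check. The figures are the ones quoted in this article; the helper name `advantage_pct` is just an illustrative choice.

```python
# Memory bandwidth figures (GB/s) as quoted in the article.
V100_GBPS = 900
OTHERS_GBPS = {
    "RTX 4080": 736,
    "Apple M3 Max": 400,
    "Apple M4 Max": 546,
    "Apple M5 Max": 614,
}


def advantage_pct(baseline: float, other: float) -> float:
    """Percentage by which `baseline` exceeds `other`."""
    return (baseline / other - 1) * 100


for name, gbps in OTHERS_GBPS.items():
    print(f"V100 vs {name}: {advantage_pct(V100_GBPS, gbps):+.0f}%")
```

The first line of output reproduces the 22% lead over the RTX 4080; the rest show how much wider the gap is against the Apple chips.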

The closest AMD competition to my 4080 is the RX 7900 XTX, which does 960 GB/s on its 24GB of GDDR6. Technically that edges out the V100, but the 7900 XTX costs £700+ and ROCm support for LLM inference is still rough compared to CUDA. The V100 gives you 94% of that bandwidth for less than a quarter of the price, and it just works with llama.cpp.
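The same kind of arithmetic covers the value comparison. This sketch uses the £150 card price and treats the "£700+" figure as exactly £700, which is an assumption that flatters the 7900 XTX slightly.

```python
# Figures quoted in the article; the 7900 XTX price is the "£700+" floor (assumption).
cards = {
    "Tesla V100 SXM2": {"gbps": 900, "price_gbp": 150},
    "RX 7900 XTX": {"gbps": 960, "price_gbp": 700},
}

v100, xtx = cards["Tesla V100 SXM2"], cards["RX 7900 XTX"]

bandwidth_share = v100["gbps"] / xtx["gbps"]        # fraction of the 7900 XTX's bandwidth
price_share = v100["price_gbp"] / xtx["price_gbp"]  # fraction of the 7900 XTX's price

print(f"Bandwidth: {bandwidth_share:.0%} of the 7900 XTX")
print(f"Price:     {price_share:.0%} of the 7900 XTX")

# Bandwidth bought per pound spent on each card.
for name, card in cards.items():
    print(f"{name}: {card['gbps'] / card['price_gbp']:.1f} GB/s per pound")
```

Note that the "less than a quarter" claim holds for the card alone. Counting the full £200 build cost, the share rises to roughly 29% of the 7900 XTX's price.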
