Running LLMs locally on your GPU requires a lot of VRAM, which can drive a rig's cost up quickly these days. Amid the ongoing AI boom, some of the best value lies in older, often forgotten silicon that's still capable, which is exactly what YouTuber Hardware Haven found. He took an Nvidia V100 server GPU with an SXM interface, which works much like a socketed processor, and converted it to a standard PCIe card that plugs into a consumer motherboard. It ended up performing quite well for its age (and cost), even against modern SKUs.
The contraption begins with an Nvidia Tesla V100 AI GPU that uses the SXM2 socket and is designed for rack-scale deployments. SXM is a mezzanine-style connector that mounts the GPU flat against a specialized baseboard, much like a CPU socket, with the module then screwed down to the board. The YouTuber was able to acquire this GPU for just $100, and the accompanying SXM2-to-PCIe x16 adapter was also around $100, bringing the total cost of the setup to $200. The V100 comes with either 16 or 32GB of HBM2 (we're working with 16GB here, sporting 900 GB/s of bandwidth), and it's based on the Volta architecture.
The PCIe adapter card didn't come with any cooling of its own, and since the V100 module is literally just a heatsink on a PCB, the YouTuber designed and 3D-printed a duct for it, attaching an 80mm Noctua fan on the end to draw fresh air across the heatsink. The adapter also has 2x 8-pin PCIe power connectors for, well, power, along with 3x 4-pin PWM fan headers. It does not feature a second SXM2 socket for NVLink; adapters with such sockets are much more expensive.
Once the GPU was ready and slotted into a standard Ryzen system, it was time to test just how artificially intelligent a 2017 card can be. Keep in mind that the V100 has no display output, so you need a CPU with integrated graphics to actually use the computer. In Ollama, running gpt-oss:20b, the V100 cranked out roughly 130 tokens per second, while the Radeon RX 7800 XT in the YouTuber's daily-driver system managed only about 90 tokens per second.
Video: "This Ridiculous $200 AI GPU Shouldn't Be This Good" (Hardware Haven, YouTube)
Both cards have 16GB of VRAM, and the RX 7800 XT is newer with supposedly more efficient silicon, but then again, Nvidia is the gold standard for software support in these workloads. So, the host switched to an RTX 3060 12GB (the best Nvidia GPU he had on hand), which is built on the newer Ampere architecture, to compare against the V100.
Running Google's Gemma 3n (gemma3n:e4b), the V100 topped out at 108 tokens per second, while the 3060 12GB only managed about 76 tokens per second, albeit while drawing less power: 293W for the V100 setup versus 235W with the RTX 3060. If we calculate tokens per watt, that comes out to around 0.37 tokens/s per watt for the V100, slightly more efficient than the 0.33 of the 3060.
Power-limiting the V100 to 100W (its stock limit is 300W) dropped the measured draw to 170W in the same test while still producing 95 tokens per second. To keep the comparison fair, the YouTuber also limited the 3060 to 100W; its setup consumed 171W and produced just 68 tokens per second. With both new results, the V100 achieves an efficiency score of 0.55 tokens/s per watt, while the 3060 12GB was stuck at 0.39.