Tech News

RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

Why This Matters

This article highlights the potential of combining high-end GPUs like the RTX 5080 and RTX 3090 for enhanced AI and gaming performance, emphasizing the importance of proper hardware setup and BIOS configuration. Such multi-GPU setups can significantly boost processing speeds for large language models, offering consumers and the industry more powerful and flexible AI experimentation tools.

Key Takeaways

A year ago, I bought an RTX 5080 for both gaming and AI experiments. Little did I know back then that I would be giving in to the joys of local LLM setups.

Fast forward to 2026: with Qwen 3.5, Gemma, and Qwen 3.6, I needed more than 16GB. So I got myself a refurbished RTX 3090 with 24GB. I could then run Qwen 3.6 Q4 quants, first at ~30 tok/s, then at 50-60 tok/s with MTP. Not bad. But it still felt limited while my 5080 sat barely used.
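To make the memory pressure concrete, here is a rough back-of-the-envelope sketch of how much VRAM the weights of a 27B model take at different quantization levels. The bits-per-weight figures (about 4.5 for a Q4 quant and about 8.5 for a Q8 quant) are assumptions, not numbers from this setup, and the estimate ignores the KV cache and runtime overhead, so real usage will be higher.

```python
# Rough VRAM estimate for quantized weights only (no KV cache, no runtime overhead).
# The bits-per-weight values used below are assumptions, not figures from the article.

VRAM_GIB = {"RTX 5080": 16, "RTX 3090": 24}  # capacities quoted in the article


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(size_gib: float, budget_gib: float) -> bool:
    """True if the weights alone are smaller than the given VRAM budget."""
    return size_gib < budget_gib


if __name__ == "__main__":
    combined = sum(VRAM_GIB.values())
    for label, bpw in (("Q4", 4.5), ("Q8", 8.5)):
        size = weights_gib(27, bpw)
        print(
            f"{label}: ~{size:.1f} GiB of weights | "
            f"fits in 24 GiB: {fits(size, 24)} | "
            f"fits in {combined} GiB: {fits(size, combined)}"
        )
```

Under these assumptions, the Q8 weights alone exceed what a single 24GB card can hold but fit within the two cards combined, which is the motivation for pairing them.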

So I began digging into what kind of setup could take advantage of both cards together. I already had DDR4 sticks and SSDs ready; I only needed a motherboard capable of handling the two cards.

Enter the Asus Prime X570-Pro. The “Pro” is important: it is what ensures the x16 PCIe link can be split into two x8 links.

The 5080 being the monster it is, I bought a good-quality PCIe 4.0 riser to plug it into the second slot.
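Once both cards are seated, it can be worth confirming that each one actually negotiated the expected x8 link, since a marginal riser may cause a fallback to a lower width or generation. The sketch below assumes the NVIDIA driver and `nvidia-smi` are installed; `parse_links` and `query_links` are hypothetical helper names chosen for illustration. Note that the reported generation may read lower while a card is idle because of power saving, so checking under load is likely more informative.

```python
import csv
import io
import subprocess

# nvidia-smi query fields for the currently negotiated PCIe link.
QUERY = "name,pcie.link.gen.current,pcie.link.width.current"


def parse_links(csv_text: str) -> list[tuple[str, int, int]]:
    """Parse 'name, gen, width' CSV rows into (name, gen, width) tuples."""
    links = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        name, gen, width = (cell.strip() for cell in row)
        if gen.isdigit() and width.isdigit():  # skip values such as [N/A]
            links.append((name, int(gen), int(width)))
    return links


def query_links() -> list[tuple[str, int, int]]:
    """Ask nvidia-smi for the current PCIe generation and width of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_links(result.stdout)


if __name__ == "__main__":
    try:
        for name, gen, width in query_links():
            print(f"{name}: PCIe gen {gen}, x{width}")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Could not query nvidia-smi: {exc}")
```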

BIOS

The BIOS part was more complex than I anticipated. First and foremost: you CAN’T boot the OS in legacy BIOS/MBR mode. Doing so prevents the use of both cards and requires needless kernel-parameter trickery even for one of them.
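Before touching the firmware settings, it helps to know how the current installation was booted. A minimal sketch, assuming a Linux system: the kernel exposes the `/sys/firmware/efi` directory only when it was started through UEFI, so its absence suggests a legacy BIOS/MBR boot. The function name `booted_in_uefi` is hypothetical.

```python
from pathlib import Path


def booted_in_uefi(efi_dir: str = "/sys/firmware/efi") -> bool:
    """Return True if the EFI sysfs directory exists (Linux-only heuristic)."""
    return Path(efi_dir).is_dir()


if __name__ == "__main__":
    if booted_in_uefi():
        print("Booted in UEFI mode.")
    else:
        print("No EFI directory found: likely a legacy BIOS/MBR boot.")
```

If the check reports a legacy boot, the installation would likely need to be converted or reinstalled in UEFI mode before CSM is disabled, since a legacy-only install may no longer boot afterwards.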

The parameters that should be set:

Go to the Boot tab and set CSM (Compatibility Support Module) to Disabled

Go to the Advanced tab -> PCI Subsystem Settings

... continue reading