AMD Strix Halo RDMA Cluster Setup Guide

Why This Matters

This guide shows how to set up an AMD Strix Halo RDMA cluster for distributed AI inference, combining RDMA networking with tensor parallelism. Getting the hardware and software configuration right is what lets a model scale across nodes and respond quickly, which matters for both industry applications and consumer AI tools.

This guide details how to configure a two-node AMD Strix Halo cluster, linked via Intel E810 NICs using RoCE v2, for distributed vLLM inference with tensor parallelism.

On Both Nodes:

Preparation: Install/update Fedora 43 and install the E810 NICs (check firmware: ethtool -i <iface>).

BIOS/Kernel: Set iGPU to 512MB and apply kernel params (iommu=pt, pci=realloc, etc.).

SSH: Configure passwordless SSH between nodes.

Networking: Assign static IPs (192.168.100.1 & .2), set MTU 9000, and trust the interface in the firewall.

Install Toolbox: Run ./refresh_toolbox.sh (this automatically installs the container with RDMA support and the custom librccl.so patch).

Run Cluster: Run start-vllm-cluster.

Select "2. Start Ray Cluster" (follow the prompts in the TUI).

Select "4. Launch VLLM Serve" and choose your model. (Export HF_TOKEN first for gated models!)
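The networking step can be sketched as a small dry-run helper that prints the commands for each node without executing anything. This is an illustration under stated assumptions, not the guide's own tooling: the interface name enp1s0f0, the /24 prefix, and the use of firewalld's trusted zone are all hypothetical choices, and the function name network_commands is invented for this example. Addresses set with ip do not survive a reboot, so a persistent setup would more likely go through NetworkManager.

```python
import shlex


def network_commands(iface: str, address: str, prefix: int = 24, mtu: int = 9000) -> list[str]:
    """Build the per-node shell commands as strings; nothing is executed here."""
    q = shlex.quote(iface)  # guard against odd characters in the interface name
    return [
        f"ip link set dev {q} mtu {mtu}",           # jumbo frames, as in the guide
        f"ip addr add {address}/{prefix} dev {q}",  # static IP (prefix length is an assumption)
        f"ip link set dev {q} up",
        # Assumption: "trust the interface" means adding it to firewalld's trusted zone.
        f"firewall-cmd --permanent --zone=trusted --add-interface={q}",
        "firewall-cmd --reload",
    ]


# The two static addresses named in the guide; the node labels are hypothetical.
NODES = {"node1": "192.168.100.1", "node2": "192.168.100.2"}

if __name__ == "__main__":
    for name, addr in NODES.items():
        print(f"# {name}")
        # enp1s0f0 is a placeholder; substitute the real E810 interface name.
        for cmd in network_commands("enp1s0f0", addr):
            print(cmd)
```

Review the printed commands before running them as root on the matching node. Generating them first makes it easier to confirm that both nodes end up with the same MTU and the intended addresses.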

Key Note: The refresh_toolbox.sh script detects your InfiniBand/RDMA devices and automatically configures the container to expose them.
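Before running the script, it can help to confirm that the kernel has registered any RDMA devices at all. The sketch below lists the entries under /sys/class/infiniband, the sysfs directory where Linux exposes RDMA devices (RoCE NICs included). It is not the detection logic of refresh_toolbox.sh, which is not shown here, and the function name list_rdma_devices is hypothetical.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list[str]:
    """Return the names of RDMA devices registered with the kernel, sorted."""
    root = Path(sysfs_root)
    if not root.is_dir():
        # Directory is absent when no RDMA driver is loaded.
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices:", ", ".join(devices))
    else:
        print("No RDMA devices found under /sys/class/infiniband")
```

An empty result on either node suggests the NIC's RDMA driver is not loaded, which is worth resolving before building the container.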