AMD Strix Halo RDMA Cluster Setup Guide

This guide details how to configure a two-node AMD Strix Halo cluster linked via Intel E810 NICs (RoCE v2) for distributed vLLM inference using Tensor Parallelism.

On Both Nodes:

Preparation: Install or update Fedora 43 and install the E810 NICs (check firmware with ethtool -i <iface>).

BIOS/Kernel: Set iGPU to 512MB and apply kernel params (iommu=pt, pci=realloc, etc.).

SSH: Configure passwordless SSH between nodes.

Networking: Assign static IPs (192.168.100.1 & .2), set MTU 9000, and trust the interface in the firewall.

Install Toolbox: Run ./refresh_toolbox.sh (this automatically installs the container with RDMA support and the custom librccl.so patch).

Run Cluster: Run start-vllm-cluster, then:

Select "2. Start Ray Cluster" (follow the prompts in the TUI).

Select "4. Launch VLLM Serve" and choose your model. (Export HF_TOKEN first for gated models!)
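The host-side steps above (firmware check, kernel params, networking, firewall, SSH) can be sketched as a dry-run script that only prints the commands for one node, so they can be reviewed before anything is changed. This is an illustration under stated assumptions, not the guide's own tooling: the interface name enp1s0f0, the /24 prefix, and the use of grubby and the firewalld "trusted" zone are assumptions, and only the two kernel params named above are included (the guide's "etc." is left unfilled).

```shell
#!/bin/sh
# Dry-run sketch: prints host-prep commands for one node; it runs none of them.
# Assumptions (not from the guide): interface name, /24 prefix, grubby, firewalld.
set -eu

IFACE="${IFACE:-enp1s0f0}"  # hypothetical E810 interface name; check with `ip link`
NODE="${NODE:-1}"           # 1 -> 192.168.100.1, 2 -> 192.168.100.2

host_prep_cmds() {
  node="$1"
  iface="$2"
  peer=$(( 3 - node ))      # the other node: 1 <-> 2
  cat <<EOF
ethtool -i ${iface}
sudo grubby --update-kernel=ALL --args="iommu=pt pci=realloc"
sudo ip addr add 192.168.100.${node}/24 dev ${iface}
sudo ip link set dev ${iface} mtu 9000
sudo firewall-cmd --permanent --zone=trusted --add-interface=${iface}
sudo firewall-cmd --reload
ssh-keygen -t ed25519
ssh-copy-id 192.168.100.${peer}
EOF
}

host_prep_cmds "$NODE" "$IFACE"
```

Run it as `NODE=2 IFACE=<iface> sh host_prep.sh` on the second node (the script filename is arbitrary). The `ip` commands do not survive a reboot, so a persistent setup would need the same address and MTU configured in the system's network manager.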

Key Note: The refresh_toolbox.sh script detects your InfiniBand/RDMA devices and automatically configures the container to expose them.
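Before running the script, it can help to confirm that the host itself sees the RDMA devices. The sketch below is not how refresh_toolbox.sh performs its detection (the source does not show that); it simply lists the entries the Linux kernel publishes under /sys/class/infiniband, which is where RoCE devices also appear. The function name list_rdma_devices is hypothetical.

```shell
#!/bin/sh
# Sketch: list RDMA devices registered with the kernel (RoCE NICs included).
# The directory argument is optional and defaults to the standard sysfs path.
list_rdma_devices() {
  dir="${1:-/sys/class/infiniband}"
  [ -d "$dir" ] || return 0        # no RDMA stack loaded: print nothing
  for dev in "$dir"/*; do
    [ -e "$dev" ] || continue      # unmatched glob means the directory is empty
    basename "$dev"
  done
}

list_rdma_devices
```

An empty result usually means the RDMA driver for the NIC is not loaded. The same function can be pointed at /dev/infiniband to see which device nodes exist on the host.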
