Strix Halo's Memory Subsystem: Tackling iGPU Challenges

Editor’s Note (11/2/2025): Due to an error in moving the article over from Google Docs to Substack, the “Balancing CPU and GPU Bandwidth Demands” section was missing some Cyberpunk 2077 data. Apologizes for the mistake!

AMD’s Strix Halo aspires to deliver high CPU and GPU performance within a mobile device. Doing so presents the memory subsystem with a complicated set of demands. CPU applications are often latency sensitive with low bandwidth demands. GPU workloads are often latency tolerant and bandwidth hungry. Then, multitasking requires high memory capacity. Mobile devices need low power draw. Finally, the whole package has to fit within a price tag acceptable to consumers. Investigating how AMD dealt with those challenges should make for a good time.

Credit

ASUS has kindly sampled the ROG Flow Z13, which implements Strix Halo in a tablet form factor with 32 GB of LPDDR5X. They’ve made deep dives like this possible, and we greatly appreciate their support.

RX 7600 results were provided by Azralee from the Chips and Cheese Discord.

GPU

Strix Halo’s GPU uses a similar cache setup to AMD’s older and smaller mobile chips. As on Strix Point and Hawk Point (Zen 4 mobile), Strix Halo’s GPU is split into two Shader Arrays. Each Shader Array has 256 KB of L1 mid-level cache, and a 2 MB L2 services the entire GPU. Latencies to those GPU-private caches are in line with other RDNA3 and RDNA3.5 implementations. AMD likely kept L2 capacity at 2 MB because a 32 MB memory side cache (Infinity Cache, or MALL) takes over as the GPU’s last level cache.The L2 only has to catch enough traffic to prevent the Infinity Cache from getting overwhelmed. The resulting cache setup is similar to the one in the RX 7600, a lower midrange RDNA3 discrete card.

The Infinity Cache on Strix Halo has slightly higher latency compared to implementations in AMD’s discrete cards. DRAM latency from the GPU is higher as well. Compared to AMD’s other mobile CPUs with iGPUs though, the 32 MB Infinity Cache offers a large cache capacity increase.

Nemes’s Vulkan bandwidth test achieves just under 1 TB/s from Infinity Cache. The figures align well with performance counter data. Taken together with the chip’s 2 GHz FCLK, bandwidth test results suggest the GPU has a 512B/cycle path to the interconnect. If so, each of the GPU’s eight Infinity Fabric endpoints has a 64B/cycle link.

As a memory side cache, Infinity Cache can theoretically handle any access to physical addresses backed by DRAM. In an earlier interview with Cheese (George), AMD indicated that Infinity Cache was focused on the GPU, and that its behavior could change with firmware releases. Some of that change has happened already. When I first started testing Strix Halo just after Hot Chips 2025, results from my OpenCL microbenchmarks reflected Infinity Cache’s presence. I used that OpenCL code to figure out Data Fabric performance events. But PMU data collected from games suggested Infinity Cache wasn’t used once a game went into the background. Hardware doesn’t know whether a process is running in the foreground or background. That’s something the operating system knows, and that info would have to be communicated to hardware via drivers. Therefore, Infinity Cache policy can change on the fly from software control.

... continue reading