NVIDIA is warning users to enable System-Level Error-Correcting Code mitigation to protect against Rowhammer attacks on graphics processors with GDDR6 memory.
The company is reinforcing the recommendation as new research demonstrates a Rowhammer attack against an NVIDIA A6000 GPU (graphics processing unit).
Rowhammer is a hardware fault that can be triggered through software processes and stems from memory cells being too close to each other. The attack was first demonstrated on CPU DRAM, but it can affect GPU memory, too.
It works by hammering a memory row with enough read-write operations to make bits in adjacent rows flip from one to zero or vice versa, changing the information stored in memory.
The effect could be a denial-of-service condition, data corruption, or even privilege escalation.
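To make the idea of a bit flip concrete, the following is a minimal Python sketch of what a single flipped bit does to stored data. It is purely illustrative and does not perform Rowhammer: the `flip_bit` helper and the two-byte `memory` buffer are hypothetical names introduced here, and the flip is applied directly in software rather than induced in hardware.

```python
def flip_bit(buf: bytearray, bit_index: int) -> None:
    """Invert one bit of buf in place, counting bits from byte 0, bit 0."""
    byte, offset = divmod(bit_index, 8)
    buf[byte] ^= 1 << offset  # XOR with a one-bit mask toggles that bit


# Hypothetical two-byte "memory" region holding the values 0 and 1.
memory = bytearray(b"\x00\x01")

flip_bit(memory, 3)  # byte 0, bit 3: zero becomes one, so 0 becomes 8
flip_bit(memory, 8)  # byte 1, bit 0: one becomes zero, so 1 becomes 0

print(list(memory))
```

A single flip is enough to turn one valid value into a different one, which is why the consequences range from corrupted data to changed permission flags.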
System-Level Error-Correcting Codes (ECC) can preserve the integrity of the data by adding redundant bits and correcting single-bit errors to maintain data reliability and accuracy.
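How redundant bits allow a single-bit error to be corrected can be sketched with a textbook Hamming(7,4) code in Python. This is an assumption chosen for illustration, not NVIDIA's implementation: hardware ECC uses wider codes over larger memory words, and the `encode` and `decode` functions below are hypothetical helpers.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # 1-based index of the bad bit, 0 if none
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]


original = [1, 0, 1, 1]
stored = encode(original)
stored[4] ^= 1  # simulate a single bit flip in the stored codeword
recovered = decode(stored)
print(recovered == original)
```

Three parity bits protect four data bits here, and the recomputed parity checks point directly at the flipped position. A code this small corrects only one flipped bit per codeword, so two flips in the same codeword would be miscorrected.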
In workstation and data center GPUs, where VRAM handles large datasets and precise calculations related to AI workloads, ECC must be enabled to prevent critical errors in their operation.
NVIDIA's security notice notes that researchers at the University of Toronto showed "a potential Rowhammer attack against an NVIDIA A6000 GPU with GDDR6 Memory" where System-Level ECC was not enabled.
The academic researchers developed GPUHammer, an attack method to flip bits in GPU memory.
Although hammering is harder on GDDR6 because of higher latency and faster refresh compared with CPU-based DDR4, the researchers were able to demonstrate that Rowhammer attacks on GPU memory banks are possible.