Researchers have developed a novel attack that steals user data by injecting malicious prompts into images that AI systems process before delivering them to a large language model.
The method relies on full-resolution images carrying instructions that are invisible to the human eye but become apparent when resampling algorithms lower the image quality.
Developed by Trail of Bits researchers Kikimora Morozova and Suha Sabi Hussain, the attack builds upon a theory presented in a 2020 USENIX paper by researchers at TU Braunschweig, a German university, exploring the possibility of image-scaling attacks in machine learning.
How the attack works
When users upload images to AI systems, the images are automatically downscaled to a lower quality for performance and cost efficiency.
Depending on the system, the resampling algorithm may shrink the image using nearest neighbor, bilinear, or bicubic interpolation.
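As a rough sketch (not Trail of Bits' code), a service built on the Python Pillow library might perform this downscaling step as follows; the file name and the 256×256 target size are illustrative placeholders:

```python
from PIL import Image  # Pillow 9.1+ for the Resampling enum

# Hypothetical upload; real platforms receive this over an API.
img = Image.open("upload.png")

# Each filter resamples differently, so each leaves different
# aliasing artifacts that an attacker could target.
for name, resample in [
    ("nearest", Image.Resampling.NEAREST),
    ("bilinear", Image.Resampling.BILINEAR),
    ("bicubic", Image.Resampling.BICUBIC),
]:
    img.resize((256, 256), resample=resample).save(f"downscaled_{name}.png")
```

Because each filter produces different artifacts, a crafted image generally has to be tuned to the specific downscaler the target platform uses.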
All of these methods introduce aliasing artifacts that allow hidden patterns to emerge in the downscaled image if the source is specifically crafted for that purpose.
In the Trail of Bits example, specific dark areas of a malicious image turn red, allowing hidden text to emerge in black when bicubic downscaling is used to process the image.
Example of a hidden message appearing on the downscaled image. Source: Trail of Bits
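The crafting idea can be shown in miniature. The sketch below uses a deliberately naive nearest-neighbor resampler so the trick is easy to see; the real attack targets bicubic interpolation and needs a more involved pixel optimization, as described in the TU Braunschweig paper. All sizes and names here are illustrative:

```python
import numpy as np

SCALE = 4  # toy downscale factor: the resampler keeps every 4th pixel

def naive_nearest_downscale(img: np.ndarray) -> np.ndarray:
    """Toy nearest-neighbor downscale: keep every SCALE-th pixel."""
    return img[::SCALE, ::SCALE]

# Decoy: a flat mid-gray image that looks unremarkable at full size.
decoy = np.full((64 * SCALE, 64 * SCALE), 128, dtype=np.uint8)

# Hidden payload: a high-contrast 64x64 block standing in for text.
payload = np.zeros((64, 64), dtype=np.uint8)
payload[16:48, 16:48] = 255

# Plant payload pixels exactly where the resampler will sample.
crafted = decoy.copy()
crafted[::SCALE, ::SCALE] = payload

# Only 1 in 16 pixels differs at full resolution, so the crafted image
# still reads as gray; after downscaling, only the payload survives.
assert np.array_equal(naive_nearest_downscale(crafted), payload)
```

Against bilinear or bicubic filters the same idea applies, but the hidden pixels must be solved for so that the filter's weighted averages reproduce the message, which is why the text only emerges at the exact scale the target system uses.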