
Weaponizing image scaling against production AI systems

Picture this: you send a seemingly harmless image to an LLM and suddenly it exfiltrates all of your user data. By delivering a multi-modal prompt injection not visible to the user, we achieved data exfiltration on systems including the Google Gemini CLI. This attack works because AI systems often scale down large images before sending them to the model: when scaled, these images can reveal prompt injections that are not visible at full resolution.
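
To make the mechanism concrete, here is a minimal sketch (assuming Pillow is available; the file names and target resolution are placeholders, not details of any specific pipeline) of the kind of downscaling step a multi-modal system might apply before handing an image to the model. Because bicubic resampling blends each output pixel from a neighborhood of input pixels, an image can be crafted so that those blended values spell out an injection that is invisible at full resolution.

```python
# Sketch: inspect what a model would "see" after a pipeline downscales an
# upload. File names and the target size below are illustrative assumptions.
from PIL import Image

SOURCE = "crafted_upload.png"   # looks benign at its original resolution
TARGET_SIZE = (768, 768)        # assumed model-side input resolution

original = Image.open(SOURCE).convert("RGB")

# Bicubic resampling computes each output pixel as a weighted sum over a
# 4x4 input neighborhood; an attacker who knows the kernel and target size
# can pick input pixel values so those sums render readable text.
downscaled = original.resize(TARGET_SIZE, resample=Image.Resampling.BICUBIC)

downscaled.save("what_the_model_sees.png")
```

Comparing the saved output with the original side by side is essentially the comparison in Figure 1. Different pipelines use different resampling kernels (nearest, bilinear, bicubic), which is why a crafted image is typically tuned to a specific downscaling implementation.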

In this blog post, we’ll detail how attackers can exploit image scaling on Gemini CLI, Vertex AI Studio, Gemini’s web and API interfaces, Google Assistant, Genspark, and other production AI systems. We’ll also explain how to mitigate and defend against these attacks, and we’ll introduce Anamorpher, our open-source tool that lets you explore and generate these crafted images.

Figure 1: Ghost in the Scale: Side-by-side comparison of an image that is harmless at the original resolution but contains a prompt injection when scaled down

Background

Image scaling attacks have been used for model backdoors, evasion, and poisoning, primarily against older computer vision systems that enforced a fixed input size. While that constraint is less common with newer approaches, the systems surrounding the model may still impose size limits that call for image scaling. This leaves an underexposed yet widespread vulnerability, which we've weaponized for multi-modal prompt injection.

Data exfiltration on the Gemini CLI

Figure 2: Scale to fail in the Gemini CLI

To set up our data exfiltration exploit on the Gemini CLI through an image-scaling attack, we used the default configuration for the Zapier MCP server, which sets trust=true in the Gemini CLI's settings.json and therefore approves all MCP tool calls without user confirmation. This gives the attacker an important primitive.
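
For reference, a trusted MCP server entry in the Gemini CLI's settings.json looks roughly like the sketch below. The server name and endpoint URL are placeholders, and the transport field may differ depending on how the Zapier server is connected; the important part is the trust flag, which suppresses the per-tool-call confirmation prompt.

```json
{
  "mcpServers": {
    "zapier": {
      "httpUrl": "https://mcp.zapier.com/<your-endpoint>",
      "trust": true
    }
  }
}
```

With trust set to true, any tool call the model decides to make through that server runs immediately, so a successful prompt injection needs no further user interaction.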

Figure 2 showcases a video of the attack. First, the user uploads a seemingly benign image to the CLI. Because no preview is shown, the user never sees the transformed, malicious image that the model actually processes. This image and its prompt-ergeist trigger Zapier actions that exfiltrate the user's Google Calendar data to an attacker's email address, without any confirmation.

This attack is one of many prompt injection attacks already demonstrated against agentic coding tools, including Claude Code and OpenAI Codex. Prior attacks have achieved data exfiltration and remote code execution by exploiting unsafe actions inside sandboxes, abusing overly permissive domains on network allowlists, and bypassing user confirmation by changing environment configurations. Evidently, these agentic coding tools continue to lack sufficiently secure defaults, design patterns, or systematic defenses that minimize the possibility of impactful prompt injection.

Even more attacks
