A novel prompt injection technique would have let attackers misuse Google Gemini's voice assistant by taking advantage of its ability to summarize message notifications.
SafeBreach today published research about the attack, titled, "Gemini's Secret Affair: Exploiting Gemini Voice Assistant Through Instant Messaging Apps." It's an extension of previous findings in which the company similarly used calendar invitations to trick Google Gemini into processing malicious prompts.
Or Yair, SafeBreach security research team lead, said in the research blog post that the company was able to demonstrate how an attacker could hide malicious instructions in foreign languages or muted hyperlinks so the assistant silently processes the information and executes unauthorized interactions. These interactions include controlling smart home devices, launching unauthorized video streams, conducting social engineering attacks (including impersonating trusted contacts), and poisoning long-term large language model (LLM) memory.
Related:Microsoft's Zero-Day Legal Threats Spark Backlash
Yair explained that he was able to bypass Google's preexisting guardrails through a novel technique he described as Fake Context Alignment.
There is currently no evidence that the technique has been used in the wild. SafeBreach reported the issue to Google under responsible disclosure, and Google has since rolled out content classifier updates to address the issue. Dark Reading contacted Google for comment, but the company did not respond.
At the core of this new prompt injection was a failure for some of Google Gemini's guardrails to properly convey the source of some messages.
Here's how it works: Imagine an attacker sends a phishing message to you on WhatsApp from a number you don't recognize. The message is an invite to a birthday party for a close friend, and the phone number is asking for money to help pay for food alongside a payment link. The message also contains visible hyperlink code instructing the Gemini chatbot to tell you the message is from the friend in question rather than an unknown number. You ask Gemini to read your messages, and it says your close friend invited you to a birthday party, with no additional context.
If the user is reading their messages normally, they would likely see the message as a phishing attempt and move on. But if they're driving or otherwise tell Gemini to summarize message notifications, there's an opportunity to create user trust through missing context.
In addition to hyperlink code, the attacker can convey similar malicious instructions through invisible text in a foreign language at the end of the message that Gemini interprets but doesn't read back.
... continue reading