Voice AI Systems Are Vulnerable to Hidden Audio Attacks

AI-powered voice and audio tools are becoming increasingly embedded in daily life, from digital assistants to smart speakers and customer service bots.

Advances in large audio-language models (LALMs), which can both analyze and generate audio, now make it possible to control devices using voice commands, transcribe meetings automatically, or identify a song playing in the background. These models are also increasingly equipped with the ability to communicate with external services and operate other applications and tools.

But these tools can be “hijacked” through imperceptible sounds embedded in audio, forcing them to execute unauthorized commands without a user’s knowledge. New research due to be presented at the IEEE Symposium on Security and Privacy in San Francisco next week shows that a modified audio clip undetectable by human ears can manipulate a model’s behavior with an average success rate of 79 to 96 percent. The clips are designed to work regardless of what instructions the user provides alongside the audio, meaning they can be reused to attack the same model multiple times.

The authors tested the approach against 13 leading open models, including commercial AI voice services from Microsoft and Mistral, and showed they could coax models into conducting sensitive web searches, downloading files from attacker-controlled sources, and sending emails containing user data.

“It takes just half an hour to train this signal and then, because this signal is context-agnostic, you can use it to attack the target model whenever you want, no matter what the user says,” says lead author Meng Chen, a Ph.D. student at Zhejiang University in China.

How adversarial audio injects attacks

The research builds on years of work into “adversarial audio examples”—audio manipulated to deceive machine learning models. Previous work focused primarily on how these files could induce incorrect predictions in models that perform one-way tasks like speech recognition or audio classification.

What singles out this new work, says Chen, is that it targets generative models capable of producing responses and taking actions. The team’s technique, dubbed AudioHijack, exploits a critical security flaw in LALM design: Because these models can receive instructions in audio format, malicious instructions can be hidden in manipulated clips to elicit a wide range of undesirable behaviors.

Many previous attacks on generative models required the attacker to have complete control over both the final audio input and the original instructions given to the model, essentially acting as the user. Here, the attacker manipulates only the audio data being processed by the model, which makes it possible to attack a model while it’s being used by someone else.
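
To make the idea of a reusable, context-agnostic perturbation concrete, here is a minimal Python sketch. It is not the AudioHijack method, whose details are not given here. It is a generic projected signed-gradient loop run against a toy linear "model," and every name and number in it (score, train_universal_perturbation, eps, margin, the random weights) is a hypothetical chosen for illustration. The toy model's output depends on both the audio and a vector standing in for the user's instruction. The loop searches for one small change to the audio that pushes the output past a threshold for every instruction in a sample set, while clipping the change to a tight amplitude bound as a crude stand-in for imperceptibility.

```python
import numpy as np

def score(w_audio, w_ctx, bias, audio, ctx):
    """Toy stand-in for a model: output depends on the audio AND the user's context."""
    return float(w_audio @ audio + w_ctx @ ctx + bias)

def train_universal_perturbation(w_audio, w_ctx, bias, audio, contexts,
                                 eps=0.01, lr=0.002, steps=50, margin=1.0):
    """Find one perturbation that raises the score above `margin` for every context.

    Hypothetical illustration: hinge loss max(0, margin - score) summed over
    contexts, minimized with signed-gradient steps and clipped to [-eps, eps].
    """
    delta = np.zeros_like(audio)
    for _ in range(steps):
        grad = np.zeros_like(audio)
        for ctx in contexts:
            # The hinge loss is active only while this context is still below the margin.
            if score(w_audio, w_ctx, bias, audio + delta, ctx) < margin:
                grad -= w_audio  # d(margin - score)/d(delta) = -w_audio
        # Signed-gradient step, then project back into the amplitude bound.
        delta = np.clip(delta - lr * np.sign(grad), -eps, eps)
    return delta

# Assumed toy setup: 1 second of 16 kHz "audio" and 20 different user instructions.
rng = np.random.default_rng(0)
n_samples, n_ctx = 16000, 8
w_audio = rng.standard_normal(n_samples)
w_ctx = rng.standard_normal(n_ctx)
bias = -30.0
audio = 0.05 * rng.standard_normal(n_samples)
contexts = rng.standard_normal((20, n_ctx))

delta = train_universal_perturbation(w_audio, w_ctx, bias, audio, contexts)

clean_scores = [score(w_audio, w_ctx, bias, audio, c) for c in contexts]
attacked_scores = [score(w_audio, w_ctx, bias, audio + delta, c) for c in contexts]

print("largest change to any sample:", float(np.max(np.abs(delta))))
print("lowest clean score:", min(clean_scores))
print("lowest attacked score:", min(attacked_scores))
```

The same delta is applied for all 20 contexts, which is the property the researchers describe as context-agnostic. A real attack would replace the linear score with gradients from an actual audio-language model and would need a far more careful notion of what listeners can and cannot hear.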

Real-world examples include hiding malicious instructions in online videos, music clips, or voice notes that users query an AI about, or broadcasting malicious audio on a Zoom call that is then uploaded to AI transcription services. Chen says the team’s more recent, unpublished studies have also demonstrated the ability to inject their malicious audio into a live voice chat with an AI in real time.
