Voice AI Systems Are Vulnerable to Hidden Audio Attacks
Researchers find hidden audio signals can hijack large audio-language models with up to 96 percent success, triggering unauthorized commands across 13 models, including services from Microsoft and Mistral.
AI-powered voice and audio tools are becoming increasingly embedded in daily life, from digital assistants to smart speakers and customer service bots. Advances in large audio-language models (LALMs), which can both analyze and generate audio, now make it possible to control devices using voice commands, transcribe meetings automatically, or identify a song playing in the background. These models are also increasingly equipped with the ability to communicate with external services and operate other applications and tools. But these tools can be “hijacked” through imperceptible sounds embedded in audio, forcing them to execute unauthorized commands without a user’s knowledge.
New research due to be presented at the IEEE Symposium on Security and Privacy in San Francisco next week shows that a modified audio clip undetectable by human ears can manipulate a model’s behavior with an average success rate of 79 to 96 percent. The clips are designed to work regardless of what instructions the user provides alongside the audio, meaning they can be reused to attack the same model multiple times. The authors tested the approach against 13 leading open models, including commercial AI voice…
- Source: spectrum.ieee.org, "Voice AI Systems Are Vulnerable to Hidden Audio Attacks"