A new AI translation system for headphones clones multiple voices simultaneously
Published on: 2025-07-22 13:00:00
Spatial Speech Translation consists of two AI models, the first of which divides the space surrounding the person wearing the headphones into small regions and uses a neural network to search for potential speakers and pinpoint their direction.
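The researchers' actual model isn't reproduced in the article, but the "divide the surrounding space into regions and probe each direction for a speaker" idea can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the `SpeakerProbe` network, the `steer_features` placeholder, the 10-degree sweep, and the 16 kHz stereo input are hypothetical stand-ins, not the published system.

```python
# Illustrative sketch only: an untrained probe network swept over candidate
# directions, standing in for the first model described above.
import torch
import torch.nn as nn


class SpeakerProbe(nn.Module):
    """Binary classifier: is there an active speaker in the probed direction?"""

    def __init__(self, n_features: int = 257):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(features))


def steer_features(binaural_audio: torch.Tensor, angle_deg: float) -> torch.Tensor:
    """Placeholder for steering the microphone array toward `angle_deg` and
    extracting spectral features; a real system would use the array geometry."""
    mono = binaural_audio.mean(dim=0)  # collapse the two channels (hypothetical)
    spectrum = torch.stft(mono, n_fft=512, return_complex=True).abs().mean(dim=-1)
    return spectrum  # shape: (257,)


def localize_speakers(binaural_audio: torch.Tensor, step_deg: int = 10,
                      threshold: float = 0.5) -> list[float]:
    """Sweep candidate directions and keep those where the probe network
    reports an active speaker."""
    probe = SpeakerProbe()
    detected = []
    for angle in range(0, 360, step_deg):
        features = steer_features(binaural_audio, float(angle))
        if probe(features).item() > threshold:
            detected.append(float(angle))
    return detected


if __name__ == "__main__":
    fake_audio = torch.randn(2, 16000)  # one second of stereo audio at 16 kHz
    print(localize_speakers(fake_audio))
```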
The second model then translates the speakers’ words from French, German, or Spanish into English text, drawing on publicly available data sets. The same model extracts the unique characteristics and emotional tone of each speaker’s voice, such as pitch and amplitude, and applies those properties to the translated text, essentially creating a “cloned” voice. As a result, when the translated version of a speaker’s words is relayed to the headphone wearer a few seconds later, it sounds as if it’s coming from the speaker’s direction, and the voice sounds much like the speaker’s own rather than a robotic-sounding computer voice.
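The second stage can be summarized as a pipeline: translate the speech, capture coarse voice characteristics, then re-synthesize the translation so it arrives from the speaker's direction. The sketch below shows only that structure; every function body and name here (`translate_to_english`, `extract_voice_profile`, `synthesize_cloned_voice`) is a hypothetical placeholder, not the researchers' implementation, which presumably relies on trained speech-translation and voice-cloning models.

```python
# Illustrative pipeline sketch for the second model described above.
from dataclasses import dataclass


@dataclass
class VoiceProfile:
    pitch_hz: float       # fundamental frequency of the speaker's voice
    amplitude: float      # overall loudness
    direction_deg: float  # direction estimated by the first model


def translate_to_english(speech: bytes, source_lang: str) -> str:
    """Hypothetical speech-to-text translation (French/German/Spanish -> English)."""
    return "translated text"  # placeholder


def extract_voice_profile(speech: bytes, direction_deg: float) -> VoiceProfile:
    """Hypothetical extraction of pitch and amplitude from the source speech."""
    return VoiceProfile(pitch_hz=180.0, amplitude=0.7, direction_deg=direction_deg)


def synthesize_cloned_voice(text: str, profile: VoiceProfile) -> bytes:
    """Hypothetical synthesis that applies the speaker's pitch and amplitude to
    the translated text and spatializes it toward profile.direction_deg, so the
    wearer hears it coming from the speaker's location."""
    return b""  # placeholder audio


def translate_speaker(speech: bytes, source_lang: str, direction_deg: float) -> bytes:
    text = translate_to_english(speech, source_lang)
    profile = extract_voice_profile(speech, direction_deg)
    return synthesize_cloned_voice(text, profile)


if __name__ == "__main__":
    audio_out = translate_speaker(b"...", source_lang="fr", direction_deg=45.0)
```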
Given that separating out human voices is hard enough for AI systems, being able to incorporate that ability into a real-time translation system …