Functional magnetic resonance imaging is a non-invasive way to explore brain activity. Credit: National Institute of Mental Health/National Institutes of Health/SPL
Reading a person’s mind from a recording of their brain activity sounds futuristic, but it’s now one step closer to reality. A new technique called ‘mind captioning’ generates descriptive sentences of what a person is seeing or picturing in their mind from a read-out of their brain activity, with impressive accuracy.
The technique, described in a paper published today in Science Advances1, also offers clues about how the brain represents the world before thoughts are put into words. And it might help people with language difficulties, such as those caused by strokes, to communicate more easily.
The model predicts what a person is looking at “with a lot of detail”, says Alex Huth, a computational neuroscientist at the University of California, Berkeley. “This is hard to do. It’s surprising you can get that much detail.”
Scan and predict
Researchers have been able to accurately predict what a person is seeing or hearing from their brain activity for more than a decade. But decoding the brain’s interpretation of complex content, such as short videos or abstract shapes, has proved more difficult.
Previous attempts have identified only key words that describe what a person saw rather than the complete context, which might include the subject of a video and actions that occur in it, says Tomoyasu Horikawa, a computational neuroscientist at NTT Communication Science Laboratories in Kanagawa, Japan. Other attempts have used artificial intelligence (AI) models that can create sentence structure themselves, making it difficult to know whether the description was actually represented in the brain, he adds.
Horikawa’s method first used a deep-language AI model to analyse the text captions of more than 2,000 videos, turning each one into a unique numerical ‘meaning signature’. A separate AI tool was then trained on six participants’ brain scans and learnt to find the brain-activity patterns that matched each meaning signature while the participants watched the videos.
Once trained, the brain decoder could take a new brain scan from a person watching a video and predict its meaning signature. A separate AI text generator then searched for the sentence that most closely matched the signature decoded from the individual’s brain.
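The two-stage pipeline described above — map brain activity to a numerical ‘meaning signature’, then search candidate sentences for the closest match — can be sketched roughly as follows. Everything here is illustrative, not the study’s actual method: the toy `embed` function stands in for the deep language model, the simulated ‘voxel’ responses stand in for real fMRI data, and plain least squares stands in for the decoder trained in the paper.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
DIM = 16       # size of the toy meaning signature
VOXELS = 40    # size of the simulated brain-activity pattern

def embed(caption: str) -> np.ndarray:
    """Toy 'meaning signature': a deterministic pseudo-random vector
    per caption (a stand-in for a deep language model's embedding)."""
    seed = zlib.crc32(caption.encode())
    return np.random.default_rng(seed).standard_normal(DIM)

captions = [
    "a dog runs across a field",
    "a person rides a bicycle down a street",
    "waves crash against a rocky shore",
    "two people shake hands in an office",
]
Y = np.stack([embed(c) for c in captions])  # one signature per video

# Simulate brain responses as a noisy linear image of each signature.
true_map = rng.standard_normal((DIM, VOXELS))
X = Y @ true_map + 0.01 * rng.standard_normal((len(captions), VOXELS))

# Stage 1: train a linear decoder, brain activity -> meaning signature.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# A new scan from someone watching the first video.
new_scan = embed(captions[0]) @ true_map + 0.01 * rng.standard_normal(VOXELS)
decoded = new_scan @ W

# Stage 2: search candidate sentences for the closest signature.
def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

best = max(captions, key=lambda c: cosine(embed(c), decoded))
print(best)  # should recover the caption of the watched video
```

In the real system, the candidate-search stage is what lets the description emerge from the brain signal itself rather than from a text generator’s own habits: the generator only proposes sentences, and each proposal is scored against the decoded signature.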