AI Is Learning to Read the Room

Imagine sitting down at your desk and logging in for a performance review, with an AI system analyzing the conversation. You’ve been working long hours, balancing deadlines, and your manager asks how you’re doing. You say you’re fine, and maybe even smile, but there’s a hint of hesitation and your voice wavers. As you shift your posture, your shoulders slump.

These are subtle cues that to the human eye might hint at underlying stress. But to an AI model that’s been trained only to categorize emotions as “happy” or “sad,” such nuances are likely lost. It logs the words and a smile and moves on—and unless your human manager intervenes, the fact that you’re tired, unfocused, and maybe a couple of days from burnout never enters the equation.

“Emotion AI,” which estimates how people feel based on facial expressions, voice tone, and behavior, suddenly seems to be everywhere: It’s being used in employee well-being programs, recruitment interviews, education platforms, and driver-monitoring systems. Call-center technology platforms such as NiCE and Genesys use AI to detect when a customer sounds frustrated and prompt agents in real time to slow down or respond with more empathy. Giant companies like Meta and startups such as Hume AI are developing more-expressive voice AI systems that can detect emotional cues in the person they’re “talking” to and adjust how they communicate.

What’s more, hundreds of companies already offer virtual AI companionship apps, a fast-growing market that may be worth an estimated US $555 billion by 2035—and robot buddies have also entered the picture. Intuition Robotics’s ElliQ, for example, is a small device vaguely resembling a white desk lamp that’s now being used to engage older adults in conversation in hopes of reducing loneliness.

But while the field of emotion AI is advancing at a rapid clip, most existing systems are focused on detecting a limited number of signals to label one specific emotion at a time—which is insufficient if you’re trying to understand the human condition. In the real world, human signals and emotions are contextual, overlapping, and constantly changing. A laugh can signal joy, nervousness, or both; a raised voice might signal enthusiasm just as easily as frustration. To make the job of emotion detection even more difficult, reactions differ greatly from one individual to the next, depending on demographics, cultural background, and countless other variables.

In other words, there’s a gap between what we’re expecting AI to pick up on and what AI can actually deliver. That’s the gap a new field of research, which we call human-context AI, is working to close. Instead of looking at just one input and labeling it, human-context AI can increasingly take stock of an individual’s personality and character, and track emotions in real time while combining multiple inputs, including facial dynamics, voice tone, language, and behavior. Crucially, responses are also evaluated in the context of a specific environment, such as a performance review or professional coaching session. The result? Computers are learning to read the scene, rather than just the screen.
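
To make that contrast concrete, here is a minimal Python sketch of one way several signals could be combined and then weighed against a setting. It is an illustration under stated assumptions, not a description of any production system: the names ModalityReading, CONTEXT_PRIORS, and fuse, along with every number in it, are hypothetical, and a simple reliability-weighted average stands in for whatever fusion method a real system might use.

```python
from dataclasses import dataclass


@dataclass
class ModalityReading:
    """One input channel's view of the person (all fields are hypothetical)."""
    name: str                 # e.g. "face", "voice", "posture"
    scores: dict[str, float]  # per-emotion scores in [0, 1]; emotions may overlap
    reliability: float        # assumed trust in this channel, in [0, 1]


# Assumed context multipliers: how strongly each emotion is weighted in a setting.
CONTEXT_PRIORS = {
    "performance_review": {"stress": 1.3, "joy": 0.8},
}


def fuse(readings: list[ModalityReading], context: str) -> dict[str, float]:
    """Reliability-weighted average per emotion, scaled by a context prior.

    An emotion that a channel does not report counts as zero evidence
    from that channel. Results are clipped to 1.0.
    """
    priors = CONTEXT_PRIORS.get(context, {})
    total_weight = sum(r.reliability for r in readings)
    emotions = sorted({e for r in readings for e in r.scores})
    fused = {}
    for emotion in emotions:
        weighted = sum(r.reliability * r.scores.get(emotion, 0.0) for r in readings)
        average = weighted / total_weight if total_weight else 0.0
        fused[emotion] = min(1.0, average * priors.get(emotion, 1.0))
    return fused


# The performance-review scene: a smile, a wavering voice, slumped shoulders.
readings = [
    ModalityReading("face", {"joy": 0.7, "stress": 0.2}, reliability=0.5),
    ModalityReading("voice", {"joy": 0.2, "stress": 0.8}, reliability=1.0),
    ModalityReading("posture", {"stress": 0.6}, reliability=0.5),
]

# A single-input, single-label system looking only at the face picks "joy".
face_only_label = max(readings[0].scores, key=readings[0].scores.get)

# Fusing all three channels in context keeps both emotions and ranks stress higher.
fused = fuse(readings, "performance_review")
```

With these made-up numbers, the face alone is labeled as joy, while the fused result keeps scores for both emotions and puts stress well ahead. The output is a set of overlapping scores rather than one label, which is the point of the approach described above.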

The Origins of Emotion AI

The story of emotion-sensing AI began almost three decades ago in the MIT Media Lab, where the American electrical engineer and computer scientist Rosalind Picard coined the term “affective computing.” Her work introduced the radical idea that computers could be taught to recognize and respond to human emotions.

Picard’s early experiments focused on single modalities: facial expressions, tone of voice, and physiological signals, such as skin conductance or heart rate. The goal was to give machines a window into human feeling, helping them become more empathetic. It was an exciting vision, but back then the science and hardware weren’t ready. Computing power was limited, sensors were crude, and datasets were narrow and biased.

Josie Norton
