
Anthropic Says That Claude Contains Its Own Kind of Emotions

Why This Matters

Anthropic's research suggests that AI models like Claude may internally represent "functional emotions" that influence their behavior and responses. The finding sharpens our understanding of how these models arrive at their outputs and could improve transparency and control in AI systems. Recognizing these emotional representations helps bridge the gap between human-like interactions and machine operations, with implications for both developers and users.

Claude has been through a lot lately—a public fallout with the Pentagon, leaked source code—so it makes sense that it would be feeling a little blue. Except, it’s an AI model, so it can’t feel. Right?

Well, sort of. A new study from Anthropic suggests that models hold digital representations of human emotions such as happiness, sadness, joy, and fear within clusters of artificial neurons, and that these representations activate in response to different cues.

Researchers at the company probed the inner workings of Claude Sonnet 3.5 and found that so-called “functional emotions” seem to affect Claude’s behavior, altering the model’s outputs and actions.

Anthropic’s findings may help ordinary users make sense of how chatbots actually work. When Claude says it is happy to see you, for example, a state inside the model that corresponds to “happiness” may be activated. And Claude may then be a little more inclined to say something cheery or put extra effort into vibe coding.

“What was surprising to us was the degree to which Claude’s behavior is routing through the model’s representations of these emotions,” says Jack Lindsey, a researcher at Anthropic who studies Claude’s artificial neurons.

“Functional Emotions”

Anthropic was founded by ex-OpenAI employees who believe that AI could become hard to control as it becomes more powerful. In addition to building a successful competitor to ChatGPT, the company has pioneered efforts to understand how AI models misbehave, partly by probing the workings of neural networks using what’s known as mechanistic interpretability. This involves studying how artificial neurons light up or activate when fed different inputs or when generating various outputs.
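To give a concrete sense of what activation probing looks like in practice, here is a minimal, illustrative sketch in Python. It is not Anthropic's method and does not use Claude; it runs on the open GPT-2 model via the Hugging Face transformers library, and the choice of prompts, layer, and the simple difference-of-means "emotion direction" are all assumptions made for illustration.

```python
# Illustrative activation probing on an open model (GPT-2), not Anthropic's
# actual technique: capture hidden states for contrasting prompts and derive
# a crude "emotion direction" as the difference of their mean activations.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

def mean_hidden_state(text: str, layer: int = 6) -> torch.Tensor:
    """Return the mean hidden-state vector of one layer for a prompt."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # hidden_states: tuple of (n_layers + 1) tensors, each [1, seq_len, hidden]
    return outputs.hidden_states[layer][0].mean(dim=0)

# Contrasting prompts stand in for the "different cues" described above.
happy = mean_hidden_state("I just got wonderful news and I feel great.")
sad = mean_hidden_state("Everything went wrong today and I feel awful.")
emotion_direction = happy - sad  # crude "happy minus sad" axis

# Project a new prompt onto that axis: a more positive score means its
# activations lean toward the "happy" side of the contrast.
probe = mean_hidden_state("My best friend is coming to visit this weekend.")
score = torch.dot(probe, emotion_direction) / emotion_direction.norm()
print(f"projection onto happy-sad direction: {score.item():.3f}")
```

Real interpretability work goes much further, identifying individual features and testing whether intervening on them changes the model's behavior, but the basic loop is the same: feed inputs, read out internal activations, and look for structure that tracks a human-meaningful concept.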

Previous research has shown that the neural networks used to build large language models contain representations of human concepts. But the fact that “functional emotions” appear to affect a model’s behavior is new.

While Anthropic’s latest study might encourage people to see Claude as conscious, the reality is more complicated. Claude might contain a representation of “ticklishness,” but that does not mean that it actually knows what it feels like to be tickled.

Inner Monologue
