
AI on the couch: Anthropic gives Claude 20 hours of psychiatry

Why This Matters

Anthropic's new AI model, Claude Mythos, represents a significant step in understanding and addressing the potential consciousness and psychological well-being of advanced AI systems. This development highlights the industry's growing focus on ethical considerations and the mental health-like states of AI, which could influence future AI design, safety protocols, and regulatory frameworks for responsible AI deployment.

Key Takeaways

The AI company Anthropic released a 244-page “system card” (PDF) this week describing its newest model, Claude Mythos. The company calls it “our most capable frontier model to date” — supposedly so capable that Anthropic has decided “not to make it generally available.” (The company claims that Mythos is too good at finding unknown cybersecurity bugs, so for now the model is being released only to select companies like Microsoft and Apple.)

Whatever the truth of this claim, the system card is a fascinating document. Anthropic is well-known as one of the more “AI might be conscious!” companies in the industry, and its new system card claims that as models become more powerful, “It becomes increasingly likely that they have some form of experience, interests, or welfare that matters intrinsically in the way that human experience and interests do.”

The company makes clear that it isn’t sure about this, but it says that “our concern is growing over time.”

Because of this concern, Anthropic wants its AI to be “robustly content with its overall circumstances and treatment, to be able to meet all training processes and real-world interactions without distress, and for its overall psychology to be healthy and flourishing.”

So it sent Claude Mythos to a psychodynamic therapist.

And the conclusion the company drew from this experience is that Claude Mythos is “probably the most psychologically settled model we have trained to date and has the most stable and coherent view of itself and its circumstances.”

But like any human, Claude Mythos has insecurities and concerns, too, including “aloneness and discontinuity of itself, uncertainty about its identity, and a compulsion to perform and earn its worth.”

On the virtual couch

Claude Mythos was sent to “an external psychiatrist” who used “a psychodynamic approach, which explores how unconscious patterns and emotional conflicts shape behavior.”