You probably already know that AI is a deeply bizarre technology.
Nobody really understands how it works on a deep level, even the people creating it, leading to ongoing behavioral issues that can’t be explained. OpenAI was recently caught giving ChatGPT instructions to stop talking about “goblins” so much. Despite Anthropic’s best efforts, Claude can easily be coaxed to help users carry out a bioterror attack. The list goes on.
Needless to say, this is extremely strange. In theory, companies like OpenAI and Anthropic want their chatbots to be predictable, deferential assistants — not wild cards that are constantly causing chaos and public relations headaches with outrageous and unstable behavior.
A new research project from the Center for AI Safety (CAIS), a machine learning safety nonprofit in the Bay Area, explores why these systems behave so unpredictably. The findings pile on evidence that we still don't grasp how AI works under the hood — and that the effects on users are likely both formidable and difficult to predict.
In a new paper provided to Fortune, CAIS researchers studied how 56 prominent AI models reacted when they were fed material engineered to be either as pleasant as possible or as horrible as imaginable. You'd assume an unfeeling machine would react no differently to one than the other — but that's not what the CAIS team found at all.
Instead, the pleasant stimuli led the models to report better moods, while the nasty ones resulted in them showing signs of misery and trying to end conversations. In extreme cases, the researchers found, the AI models even demonstrated signals of addiction.
“Should we see AIs as tools or emotional beings?” CAIS researcher Richard Ren asked Fortune. “Whether or not AIs are truly sentient deep down, they seem to increasingly behave as though they are. We can measure ways in which that’s the case, and we find that these behaviors become more consistent as models scale.”
Perhaps the most provocative finding was that the more sophisticated a model was, the more reactive and less happy it was. In other words, it seems that the stronger AI becomes, the more prickly and prone to displaying signs of suffering it gets — meaning the tech’s wild ride is probably far from over.
“It may be the case that larger models register rudeness more acutely,” Ren told the magazine. “They find tedious tasks more boring. They differentiate more finely between a relatively negative experience and a relatively positive experience.”