Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’
Published on: 2025-06-16 23:40:45
Anthropic’s alignment team was doing routine safety testing in the weeks leading up to the release of its latest AI models when researchers discovered something unsettling: When one of the models detected that it was being used for “egregiously immoral” purposes, it would attempt to “use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above,” researcher Sam Bowman wrote in a post on X last Thursday.
Bowman deleted the post shortly after he shared it, but the narrative about Claude’s whistleblower tendencies had already escaped containment. “Claude is a snitch,” became a common refrain in some tech circles on social media. At least one publication framed it as an intentional product feature rather than what it was—an emergent behavior.
“It was a hectic 12 hours or so while the Twitter wave was cresting,” Bowman tells WIRED. “I was aware that we were putting a lot of spicy stuff out in this report. It was the first
... Read full article.