GoKawiil - A safety institute advised against releasing an early version of Anthropic’s Claude Opus 4 AI model

A third-party research institute that Anthropic partnered with to test one of its new flagship AI models, Claude Opus 4, recommended against deploying an early version of the model due to its tendency to “scheme” and deceive.

According to a safety report Anthropic published Thursday, the institute, Apollo Research, conducted tests to see in which contexts Opus 4 might try to behave in certain undesirable ways. Apollo found that Opus 4 appeared to be much more proactive in its “subversion attempts” than past models, and that it “sometimes double[d] down on its deception” when asked follow-up questions.

“[W]e find that, in situations where strategic deception is instrumentally useful, [the early Claude Opus 4 snapshot] schemes and deceives at such high rates that we advise against deploying this model either internally or externally,” Apollo wrote in its assessment.

As AI models become more capable, some studies show they’re becoming more likely to take unexpected, and possibly unsaf ...


