Don’t believe reasoning models’ Chains of Thought, says Anthropic
Published on: 2025-05-14 08:53:03
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
We now live in the era of reasoning AI models where the large language model (LLM) gives users a rundown of its thought processes while answering queries. This gives an illusion of transparency because you, as the user, can follow how the model makes its decisions.
However, Anthropic, creator of a reasoning model in Claude 3.7 Sonnet, dared to ask, what if we can’t trust Chain-of-Thought (CoT) models?
“We can’t be certain of either the ‘legibility’ of the Chain-of-Thought (why, after all, should we expect that words in the English language are able to convey every single nuance of why a specific decision was made in a neural network?) or its ‘faithfulness’—the accuracy of its description,” the company said in a blog post. “There’s no specific reason why the reported Chain-of-Thought must accurately reflect the true reasoning process; there might even be circ
... Read full article.