Anthropic's Claude 3.7 Sonnet is here and results are insane
Published on: 2025-07-15 20:07:57
Anthropic has started rolling out Claude 3.7 Sonnet, the company's most advanced model and the first hybrid reasoning model it has shipped.
Early tests show that Claude 3.7 Sonnet is outperforming rivals, including OpenAI's ChatGPT models and China's DeepSeek.
In a blog post, Anthropic noted that its newest model combines fast, straightforward answers with the ability to “think” step-by-step through complex tasks. The company says this makes Claude 3.7 Sonnet particularly strong at programming, and benchmark results support the claim.
SWE-bench Verified shows Claude 3.7 Sonnet is the best model for coding
On the “Software engineering (SWE-bench Verified)” benchmark, Claude 3.7 Sonnet tops the chart with roughly 62% accuracy, rising to about 70% with extra test-time “scaffolding.”
Other models, including its predecessor Claude 3.5 Sonnet and OpenAI’s variants, score closer to 50%.
"Software engineering (SWE-bench verified)" is a benchmarking standard to see how well an AI model does w
...