Anthropic today released Opus 4.5, its flagship frontier model, and it brings improvements in coding performance, as well as some user experience improvements that make it more generally competitive with OpenAI’s latest frontier models.
Perhaps the most prominent change for most users is that in the consumer app experiences (web, mobile, and desktop), Claude will be less prone to abruptly hard-stopping conversations because they have run too long. The improvement to memory within a single conversation applies not just to Opus 4.5, but to any current Claude models in the apps.
Users who experienced abrupt endings (despite having room left in their session and weekly usage budgets) were hitting a hard context window (200,000 tokens). Whereas some large language model implementations simply start trimming earlier messages from the context when a conversation runs past the maximum in the window, Claude simply ended the conversation rather than allow the user to experience an increasingly incoherent conversation where the model would start forgetting things based on how old they are.
Now, Claude will instead go through a behind-the-scenes process of summarizing the key points from the earlier parts of the conversation, attempting to discard what it deems extraneous while keeping what’s important.
Developers who call Anthropic’s API can leverage the same principles through context management and context compaction.
Opus 4.5 performance
Opus 4.5 is the first model to surpass an accuracy score of 80 percent—specifically, 80.9 percent in the SWE-Bench Verified benchmark, narrowly beating OpenAI’s recently released GPT-5.1-Codex-Max (77.9 percent) and Google’s Gemini 3 Pro (76.2 percent). The model performs particularly well in agentic coding and agentic tool use benchmarks, but still lags behind GPT-5.1 in visual reasoning (MMMU).