Claude Opus 4.6 marks a step forward in AI for finance. It helps professionals make decisions grounded in accurate information and clear analysis, and it produces polished deliverables. The model is substantially better than others in the market at financial reasoning, multitasking, and maintaining focus over longer multi-step tasks.
Alongside Claude Opus 4.6, we’re updating some of our existing products and introducing a new one to put these capabilities where analysts spend most of their time. Cowork now delivers more polished outputs, such as financial models and presentations, on the first pass. Claude in Excel is now better at handling long-running tasks, with Claude Opus 4.6 staying focused and accurate as financial models grow more complex. And we’re releasing Claude in PowerPoint as a beta research preview for natively building and iterating on presentations.
Our internal Real-World Finance evaluation measures Claude’s performance on ~50 investment and financial analysis use cases spanning spreadsheet, slide deck, and document generation and review. These are tasks commonly performed by analysts across investment banking, private equity, public investing, and corporate finance. Claude Opus 4.6 improves on Claude Sonnet 4.5, our state-of-the-art model from just a few months ago, by over 23 percentage points.
The evaluation runs Claude in agentic harnesses that combine code execution and tool use, and is scored against rubrics and preference judgments that gauge finance domain knowledge, task completeness and accuracy, and presentation quality.
Together, these updates make Claude a much stronger partner for professionals across financial services and corporate finance.
Research, analyze, create
Financial professionals use AI to research effectively across multiple data sources, support financial analyses, and create deliverables that their teams and customers can act on. Claude Opus 4.6 is best in class across all three dimensions.
On research, Claude Opus 4.6 improves on both BrowseComp and DeepSearchQA, two benchmarks that test a model’s ability to extract specific information from large, unstructured data sources. In practice, this means that users can hand Claude a dense set of documents and receive a specific, focused answer, rather than a simple summary.
On analysis, Claude Opus 4.6 is state-of-the-art on Finance Agent, an external benchmark from Vals AI that evaluates models on research over SEC filings of public companies, scoring 60.7% (a 5.47% improvement over Opus 4.5). Opus 4.6 is also state-of-the-art on Vals AI’s TaxEval at 76.0%.