Hardening Firefox with Anthropic's Red Team

For more than two decades, Firefox has been one of the most scrutinized and security-hardened codebases on the web. Open source means our code is visible, reviewable, and continuously stress-tested by a global community.

A few weeks ago, Anthropic’s Frontier Red Team approached us with results from a new AI-assisted vulnerability-detection method that surfaced more than a dozen verifiable security bugs, with reproducible tests. Our engineers validated the findings and landed fixes ahead of the recently shipped Firefox 148.

For users, that means better security and stability in Firefox. Adding new techniques to our security toolkit helps us identify and fix vulnerabilities before they can be exploited in the wild.

An emerging technique, pressure-tested by Firefox engineers

AI-assisted bug reports have a mixed track record, and skepticism is earned. Too many submissions have meant false positives and an extra burden for open source projects. What we received from the Frontier Red Team at Anthropic was different.

Anthropic’s team got in touch with Firefox engineers after using Claude to identify security bugs in our JavaScript engine. Critically, their bug reports included minimal test cases that allowed our security team to quickly verify and reproduce each issue.

Within hours, our platform engineers began landing fixes, and we kicked off a tight collaboration with Anthropic to apply the same technique across the rest of the browser codebase. In total, we discovered 14 high-severity bugs and issued 22 CVEs as a result of this work. All of these bugs are now fixed in the latest version of the browser.

In addition to the 22 security-sensitive bugs, Anthropic discovered 90 other bugs, most of which are now fixed. A number of the lower-severity findings were assertion failures, which overlapped with issues traditionally found through fuzzing, an automated testing technique that feeds software huge numbers of unexpected inputs to trigger crashes and bugs. However, the model also identified distinct classes of logic errors that fuzzers had not previously uncovered.

Anthropic has also published a technical write-up of their research process and findings, which we invite you to read here.

The scale of findings reflects the power of combining rigorous engineering with new analysis tools for continuous improvement. We view this as clear evidence that large-scale, AI-assisted analysis is a powerful new addition in security engineers’ toolbox. Firefox has undergone some of the most extensive fuzzing, static analysis, and regular security review over decades. Despite this, the model was able to reveal many previously unknown bugs. This is analogous to the early days of fuzzing; there is likely a substantial backlog of now-discoverable bugs across widely deployed software.

... continue reading