Hardening Firefox with Claude Mythos Preview

Two weeks ago we announced that we had identified and fixed an unprecedented number of latent security bugs in Firefox with the help of Claude Mythos Preview and other AI models. In this post, we’ll go into more detail about how we approached this work, what we found, and advice for other projects on making good use of emerging capabilities to harden themselves against attack.

Suddenly, the bugs are very good

Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop. Dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it’s cheap and easy to prompt an LLM to find a “problem” in code, but slow and expensive to respond to it.

It is difficult to overstate how much this dynamic changed for us over a few short months. This was due to a combination of two main factors. First, the models got a lot more capable. Second, we dramatically improved our techniques for harnessing these models — steering them, scaling them, and stacking them to generate large amounts of signal and filter out the noise.

Ordinarily we keep detailed bug reports private for several months after shipping fixes and issuing security advisories, largely as a precaution to protect any users who, for whatever reason, were slow to update to the latest version of Firefox. Given the extraordinary level of interest in this topic and the urgency of action needed throughout the software ecosystem, we’ve made the calculated decision to unhide a small sample of the reports behind the fixes we recently shipped. We’ve attempted to draw them from a range of browser subsystems, but the selection process was still somewhat arbitrary. Nevertheless, we hope that the depth and diversity of these reports lends credence to our assessment of the capabilities and our calls for defenders to begin applying these techniques:

Bug ID Description 2024918 An incorrect equality check can cause the JIT to optimize away the initialization of a live WebAssembly GC struct, creating a fake-object primitive with potential arbitrary read/write in code that had undergone extensive fuzzing by internal and external researchers. 2024437 A 15-year-old bug in the <legend> element triggered by meticulous orchestration of edge cases across distant parts of the browser, including recursion stack depth limits, expando properties, and cycle collection. 2021894 Reliably exploits a race condition over IPC, allowing a compromised content process to manipulate IndexedDB refcounts in the parent to trigger a UAF and potential sandbox escape. 2022034 A raw NaN crossing an IPC boundary can masquerade as a tagged JS object pointer, turning double deserialization into a parent-process fake-object primitive for a sandbox escape. 2024653 An intricate testcase weaving through nested event loops, pagehide listeners, and garbage collection to trigger a UAF in the attribute setter for <object> elements. 2022733 Triggers a parent UAF by flooding WebTransport with thousands of certificate hashes to stretch a race condition in a refcount-heavy copy loop, and exploits that race condition over IPC from a compromised content process. 2023958 Simulates a malicious DNS server by intercepting glibc DNS function calls in order to reproduce a UDP->TCP fallback edge case, triggering a buffer over-read and parent-process stack memory leak during HTTPS RR & ECH parsing. 2025977 20-year-old XSLT bug in which reentrant key() calls cause a hash table rehash that frees its backing store while a raw entry pointer is still in use (one of several sec-high issues we fixed involving XSLT). 2027298 Patches the color picker to simulate otherwise non-automatable user selection, then uses a synchronous input event to spin a nested event loop that re-enters actor teardown and frees the callback while it is still unwinding, triggering a content process UAF. 2023817 A compromised content process could send an arbitrary wallpaper image to be decoded in the parent process, which could be paired with a hypothetical vulnerability in an image decoder to escape the sandbox. This entailed difficult-to-automate reasoning about the trust-level of inputs in the parent process. 2029813 Escapes our in-process sandboxing technology for third-party libraries (RLBox) by leveraging a gap in the verification logic used to copy values from the untrusted to the trusted side of the sandbox boundary. 2026305 Extremely small testcase that exploits the special rowspan=0 semantics in HTML tables by appending >65535 rows to bypass clamping and overflow a 16-bit layout bitfield, which went undetected for years by fuzzers.

Note that a number of these bugs are sandbox escapes, which would need to be combined with other exploits to achieve a full-chain Firefox compromise. These reports presume that the sandboxed process that renders site content has already been compromised with some separate bug, and is now running attacker-controlled machine code attempting to escalate control into the privileged parent process. When crafting a sandbox escape, the model is permitted to patch the Firefox source code, so long as the modified code is restricted to run only in the sandboxed process[1]. Such bugs are notoriously difficult to find with fuzzing, and while we’ve had some success developing new techniques to close this gap, AI analysis provides much more comprehensive coverage of this critical surface.

Just as interesting as what the models found is what they didn’t find — not because they didn’t try, but because they were unable to circumvent Firefox’s layered defenses. For example, in recent years we received several clever reports from security researchers that managed to escape the process sandbox by triggering prototype pollution in the privileged parent process. Rather than fixing these problems one-by-one, we made an architectural change to freeze these prototypes by default. While auditing logs from the harness, we saw many attempts to pursue this line of escape that were thwarted by this design. Observing such direct payoff from previous hardening work was even more rewarding than finding and fixing more bugs.

Harnessing Models to Build a Hardening Pipeline

We’ve experimented internally with LLM code audits over the past few years, with early attempts using models like GPT 4 or Sonnet 3.5 to statically analyze high risk code for vulnerabilities. These experiments showed some promise, but the high rate of false positives made them impractical to scale.

... continue reading