Amazon Is Using Specialized AI Agents for Deep Bug Hunting

As generative AI accelerates software development, it is also enhancing the ability of digital attackers to carry out financially motivated or state-backed hacks. This means that security teams at tech companies have more code than ever to review while facing even more pressure from bad actors. On Monday, Amazon will publish, for the first time, details of an internal system known as Autonomous Threat Analysis (ATA), which the company has been using to help its security teams proactively identify weaknesses in its platforms, perform variant analysis to quickly search for other, similar flaws, and then develop remediations and detection capabilities to plug holes before attackers find them.

ATA was born out of an internal Amazon hackathon in August 2024, and security team members say that it has grown into a crucial tool since then. The key concept underlying ATA is that it isn't a single AI agent developed to comprehensively conduct security testing and threat analysis. Instead, Amazon developed multiple specialized AI agents that compete against each other in two teams to rapidly investigate real attack techniques and different ways they could be used against Amazon's systems—and then propose security controls for human review.

“The initial concept was aimed to address a critical limitation in security testing: limited coverage and the challenge of keeping detection capabilities current in a rapidly evolving threat landscape,” Steve Schmidt, Amazon's chief security officer, tells WIRED. “Limited coverage means you can’t get through all of the software or you can’t get to all of the applications because you just don’t have enough humans. And then it’s great to do an analysis of a set of software, but if you don’t keep the detection systems themselves up to date with the changes in the threat landscape, you’re missing half of the picture.”

As part of scaling its use of ATA, Amazon developed special “high-fidelity” testing environments that are deeply realistic reflections of Amazon's production systems, so ATA can both ingest and produce real telemetry for analysis.

The company's security teams also made a point of designing ATA so that every technique it employs, and every detection capability it produces, is validated with real, automatic testing and system data. Red team agents, which work on finding attacks that could be used against Amazon's systems, execute actual commands in ATA's special test environments that produce verifiable logs. Blue team, or defense-focused, agents use real telemetry to confirm whether the protections they are proposing are effective. And anytime an agent develops a novel technique, it also pulls time-stamped logs to prove that its claims are accurate.

This verifiability reduces false positives, Schmidt says, and acts as “hallucination management.” Because the system is built to demand certain standards of observable evidence, Schmidt claims that “hallucinations are architecturally impossible.”
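
To make the evidence requirement concrete, here is a minimal Python sketch of a gate that accepts an agent's claim only when time-stamped logs back it up. Amazon has not published ATA's implementation, so everything here is an assumption made for illustration: the `LogEntry` and `Claim` types, the `supporting_logs` and `accept` functions, the substring-matching rule, and the sample log line are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class LogEntry:
    """One time-stamped line from a (hypothetical) test environment."""
    timestamp: datetime
    source: str
    message: str


@dataclass(frozen=True)
class Claim:
    """An agent's assertion that a technique ran or a detection fired."""
    agent: str
    technique: str
    evidence_marker: str  # text the agent says will appear in the logs
    not_before: datetime  # start of the agent's test run


def supporting_logs(claim, logs):
    """Return log entries recorded during the run that contain the marker."""
    return [
        entry
        for entry in logs
        if entry.timestamp >= claim.not_before
        and claim.evidence_marker in entry.message
    ]


def accept(claim, logs):
    """Accept a claim only if at least one log entry supports it."""
    return len(supporting_logs(claim, logs)) > 0


# Illustrative data only; not real ATA telemetry.
run_start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
logs = [
    LogEntry(run_start + timedelta(seconds=5), "test-env",
             "exec: technique-A variant 3 completed"),
]
backed = Claim("red-1", "technique-A", "technique-A variant 3", run_start)
unbacked = Claim("red-2", "technique-B", "technique-B variant 1", run_start)

print(accept(backed, logs))    # True: a matching log entry exists
print(accept(unbacked, logs))  # False: no log evidence, so the claim is rejected
```

In this sketch, a claim with no matching log entry is simply rejected rather than passed along for human review. A production system would presumably need far stricter matching and tamper-resistant logs.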