
Parsing Agentic Offensive Security's Existential Threat

Why This Matters

The rise of large language models (LLMs) like GPT-5.5 raises the specter of large-scale, automated exploitation. Human expertise, however, remains the bottleneck for turning machine-found flaws into working attacks, just as vulnerability researchers remained essential during the fuzzing era. That gap underscores the continuing importance of human oversight in an increasingly automated security landscape.

Key Takeaways

BLACK HAT ASIA – Singapore – The emergence of large language models (LLMs) like Anthropic's Mythos and, this week, OpenAI's GPT-5.5, has set the security world atwitter with dark speculation that we are entering an era of industrialized, autonomous, mass exploitation across any platform or infrastructure: a nuclear threat that no organization, anywhere, can hide from.

But not so fast, argues RunSybil CEO Ari Herbert-Voss: while defenders need to change their risk calculus to prepare for ever-accelerating threats from AI, the limits of human effort still determine how successful those threats become. It's also a teachable moment for the security industry.

"What we're seeing with LLMs is what we saw with fuzzers in the 2000s; fuzzing was supposed to change everything," says Herbert-Voss, who was the first security hire at OpenAI, where he led the red team engagements for the GPT3 and Codex model releases. "A non-human could find crashes at scale, quickly, automatically. People thought it would make vuln researchers irrelevant, and trigger a flood of zero-days like the industry had never seen. Some of that happened in small ways, but fuzzing created a new problem, which is a deluge of possible bugs."


In other words, someone still had to sort through the flaws, identify the exploitable crashes, and figure out what caused the bug to be introduced in the first place.

"In a way, fuzzing made vuln researchers more valuable," he tells Dark Reading.

In the same way, LLMs can automatically generate massive datasets, confirm something is wrong, and produce ways to exploit that wrongness, he explained during his keynote on Friday at Black Hat Asia in Singapore. But knowing something is wrong and knowing what to do about it are different problems. And this, he says in an interview, highlights areas where human expertise remains not just necessary but crucial, for both attackers and defenders.

"I've said it once and I will say it again: The capability ceiling is rising fast," he explains. "The capability floor is not keeping pace. Teams can generate more possible bugs than ever before. Validating which ones have real security impact still requires a human. That gap is the problem."

Long Way to Go Before Cyberattack ICBMs Launch

Autonomous performance across offensive tasks is improving by leaps and bounds, that much is true, Herbert-Voss acknowledged during his talk.

... continue reading