Vulnerability management is always a race. Attackers move quickly, scans take time, and if your scanner can’t keep up, you’re left exposed.
That’s why Intruder’s security team kicked off a research project: could AI help us build new vulnerability checks faster, without dropping our high standards for quality?
After all, speed is only useful if the detections are solid - a check that fires false positives (or worse, misses real issues) doesn’t help anyone.
In this post, we’ll share how we’ve been experimenting with AI, what’s working well, and where it falls short.
One-shot vs. Agentic Approach
We started simple: drop prompts into an LLM chatbot and see if it could write Nuclei templates. The results were messy: outputs referenced features that didn't exist, contained invalid syntax, and relied on weak matchers and extractors. This was consistent across ChatGPT, Claude, and Gemini.
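For anyone who hasn't written one, a Nuclei template is a YAML file describing a request to send and what to look for in the response. The chatbot outputs tended to match on generic strings or a lone 200 status; a solid check pins the match to something product-specific and extracts supporting evidence. Here's a rough sketch of the shape we're aiming for (the product, path, and strings below are purely illustrative, not a real check):

```yaml
id: exampleapp-version-disclosure

info:
  name: ExampleApp Version Disclosure
  author: intruder
  severity: info
  description: Detects an exposed status endpoint that leaks the ExampleApp version.
  tags: exampleapp,exposure

http:
  - method: GET
    path:
      - "{{BaseURL}}/status"

    matchers-condition: and
    matchers:
      - type: status
        status:
          - 200

      # Match on strings specific to the product, not generic words
      # that could appear on any page and trigger false positives.
      - type: word
        part: body
        words:
          - "ExampleApp Status"
          - "build-info"
        condition: and

    extractors:
      # Pull out the version string as evidence for the finding.
      - type: regex
        part: body
        group: 1
        regex:
          - 'ExampleApp v([0-9.]+)'
```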
So we tried an agentic approach. Unlike a chatbot, an agent can use tools, search reference material, and follow rules. We went in with healthy skepticism (recent “vibe coding” disasters didn’t inspire confidence), but the improvement was immediate.
We used Cursor's agent, and even with minimal prompting, the quality of the output from initial runs was far more promising.
From there, we layered on rules and indexed a curated repo of Nuclei templates. This gave the agent solid examples to learn from, cut down inconsistencies, and nudged it towards using the right functionality. The quality of the templates jumped noticeably, and the results were far closer to what we'd expect from our engineers.
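To give a sense of what "rules" means here: Cursor can read a plain-text rules file and apply it on every run. The sketch below is indicative of the kind of guidance we mean, not our actual ruleset:

```text
# .cursorrules (illustrative)
- Only use matcher types, extractor types and fields that appear in the
  indexed Nuclei template repo; never invent template syntax.
- Prefer matchers-condition: and, combining a status matcher with
  product-specific body or header words, to reduce false positives.
- Add an extractor for version or other evidence whenever the response
  exposes it.
- Keep id, info and tags consistent with the existing templates.
```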
But it wasn’t set-and-forget. Left alone, the agent still needed course corrections. With clear prompting, though, it could generate checks that looked like they’d been written manually.
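To give a flavour of what "clear prompting" looks like in practice, a prompt would spell out the target, the evidence to match on, and the conventions to follow. Something along these lines (again, purely illustrative):

```text
Write a Nuclei template that detects ExampleApp's exposed /status endpoint.
Match on the "ExampleApp Status" page title together with a 200 response,
extract the version from the body, and follow the matcher conventions used
in the indexed template repo. Severity: info.
```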