Are We Training AI Too Late?

QUESTION: Are we training AI too late?

Nishawn Smagh, Director of Intelligence at GreyNoise: Artificial intelligence anchors modern security operations. Detection models are typically trained on labeled breach logs, malware samples, threat feeds, and post-incident investigations, sources that provide validated ground truth and enable reliable classification.

But these sources share a critical structural limitation: They reflect attacker behavior only after malicious activity has already been confirmed.

The central question becomes whether we are training AI to recognize impact or intent. For the answer, let's look at IP patterns associated with malicious scanning activity.

The Fresh Infrastructure Problem

Internet-scale telemetry shows that high-impact exploitation frequently originates from infrastructure with little or no prior malicious history. According to GreyNoise's 2026 State of the Edge report:

52% of remote code execution (RCE) exploitation traffic originated from IPs that had not appeared in common threat feeds.

38% of authentication bypass attempts involved previously unseen IPs.

For basic reconnaissance (e.g., information disclosure), the share of IPs with no scanning history drops to 29%.
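
To make the idea of "previously unseen" infrastructure concrete, here is a minimal Python sketch of how such a share could be measured. It is an illustration only, not GreyNoise's methodology: the function name unseen_share, the choice to count distinct IPs rather than traffic volume, and the sample addresses (taken from reserved documentation ranges) are all assumptions made for this example.

```python
from ipaddress import ip_address


def unseen_share(source_ips, known_bad_ips):
    """Return the fraction of distinct source IPs that are absent from known_bad_ips.

    Hypothetical helper: counts distinct addresses, not traffic volume.
    """
    # Parse into address objects so that equivalent spellings compare equal.
    sources = {ip_address(ip) for ip in source_ips}
    known = {ip_address(ip) for ip in known_bad_ips}
    if not sources:
        return 0.0
    unseen = sources - known
    return len(unseen) / len(sources)


# Made-up sample data using documentation-only address ranges.
observed = ["192.0.2.10", "192.0.2.11", "198.51.100.7", "203.0.113.5"]
feed = ["192.0.2.10", "203.0.113.99"]

share = unseen_share(observed, feed)
print(f"{share:.0%} of observed source IPs were not in the feed")
```

In this toy data set, three of the four observed addresses are missing from the feed. A model trained only on feed-listed addresses would have no prior signal for them, which is the gap the figures above describe.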
