Forrester: Gen AI is a chaos agent, models are wrong 60% of the time

The shark from Jaws attacked without warning, showing how an apex predator exploits chaos to create lethal, devastating harm on its prey. Now, Forrester says, gen AI has become that predator in the hands of attackers: The one that never tires r sleeps, and executes at scale."In Jaws, the shark acts as the chaos agent," Forrester principal analyst Allie Mellen told attendees at the IT consultancy firm's 2025 Security and Risk Summit. "We have a chaos agent of our own today... And that chaos agent is generative AI."Mellen provided a quantitative, substantial base of research data to back up her claim, illustrating the fundamental weaknesses and unreliability of AI systems. As she emphatically put it: "AI is wrong. It is wrong not just a little bit; it's wrong a lot of the time."Models fail 60% of the time Of the many studies Mellen cited in her keynote, one of the most damning is based on research conducted by the Tow Center for Digital Journalism at Columbia University, which analyzed eight different AI models, including ChatGPT and Gemini. The researchers found that overall, models were wrong 60% of the time; their combined performance led to more failed queries than accurate ones. AI agents frequently fail at real-world tasks Jeff Pollard, VP, principal analyst at Forrester, also drove this home. "Your red teamer is now your AI red team orchestrator," he said. "Traditional pentesting hunts for infrastructure flaws. AI red teaming operates differently. It simulates adversarial attacks on the AI model itself."Pollard also cited several studies, one of the most noteworthy being from Carnegie Mellon researchers who discovered that AI agents fail 70 to 90% of the time on real-world corporate tasks. Nearly half (45%) of AI-generated code contains known OWASP Top 10 vulnerabilities. Exacerbating the risks of gen AI as a chaos agent is how pervasive shadow AI is, with 88% of security leaders admitting to incorporating unauthorized AI into their daily workflows.Forrester's prediction that there will be a $27 billion identity management market surge by 2029 signals how pervasive gen AI is becoming in every identity an organization has to protect, from human-based to machine-created. Gen AI's inherent risks are the chaos agent no one sees coming in cybersecurity. Mellen illustrated the stakes with a concrete example: "AI doesn't necessarily know that sharks don't live on land," she explained, referencing an AI-generated map that placed shark attacks across Wyoming, a landlocked state 1,000 miles from the ocean. "It's all fine and dandy for AI to be wrong when we're just creating a map about shark attacks, but it's an entirely different thing for it to be wrong during a security incident. AI is serving us up a new type of false positive, this time for investigation and response."AI confidently placed shark attacks in Wyoming, 1,000 miles from the ocean. LLMs don't fail quietly. They hallucinate with absolute certainty, then ship to production. Source: 2025 Security & Risk Summit.When 70-90% incompleteness meets production velocityPollard quoted Carnegie Mellon's AgentCompany benchmark, which tested leading AI models against 175 real corporate tasks. Claude 3.5 Sonnet, GPT-4 and specialized enterprise agents all showed systemic patterns of failure. Top performers completed only 24% of tasks autonomously. When researchers added more complexity, failure rates soared to between 70 and 90%. Pollard also led Salesforce's AI Research team, which published equally damning results. CRM-oriented agents failed 62% of baseline enterprise tasks. When researchers applied confidentiality and safety guardrails, accuracy dropped by half, pushing failure rates above 90%. Salesforce detailed these findings at Dreamforce 2024's agentic AI session. Veracode's 2025 GenAI Code Security Report tested 80 coding tasks across four languages (Java, Python, C, JavaScript) and more than 100 LLMs. Results are stark: 45% of AI-generated code introduced OWASP Top 10 vulnerabilities. Language-specific performance varies significantly. Java showed the worst results at 28.5% security pass rate, while Python (55.3%), C (57.3%) and JavaScript (61.7%) performed better. Cross-site scripting and log injection proved catastrophic; only 12 to 13% security pass rates. SQL injection and cryptographic algorithms scored higher,, at 80 to 86%.A key insight from the study is how security performance remained flat despite dramatic syntactic improvements. Newer, larger models generate more compilable code yet introduce vulnerabilities. These findings reflect the impact of training data on coding quality and reliability.Language-by-language security pass rates. Source: Veracode's 2025 GenAI Code Security Report Every new identity creates a new attack surfaceIdentities are the first and favorite target of attackers, and AI's multiplying effect is escalating the risk exponentially. Merritt Maxim, VP and research director at Forrester, delivered a blunt reality check: "Identity security is undergoing the most significant shift since SSO went mainstream. It's not about innovation anymore; it's about containment failure." Maxim further explained: "Entitlements aren't static anymore. We've moved toward zero standing privilege; entitlements are now dynamic, granted just in time." The August 2025 OAuth token breach affecting more than 700 Salesforce customers provided undeniable proof. Geoff Cairns, Forrrester principal analyst, underscored the gravity: "OAuth tokens, API keys, certificates ... these are not configuration artifacts. They're high-value identities. And when you don't govern them, you lose the enterprise."With gen AI expanding identity sprawl, traditional governance collapses at machine speeds. Forrester sees demand for the identity access management (IAM) market growing to $27.5 billion by 2029. The top ten identity security insights reflect machine identities, creating greater complexity and potential chaos that every security professional needs to plan for now.Source: 2025 Security & Risk Summit.Weaponized gen AI is the apex predator stalking enterprise networks Forrester's 2025 Security and Risk Summit didn't merely highlight threats; it delivered a survival blueprint. Weaponized gen AI has become the apex predator within enterprise networks, moving silently, relentlessly and at unprecedented scale. VentureBeat believes the following are essential steps that security and risk management professionals need to take as gen AI becomes a more pervasive threat: Treat AI agents as mission-critical identities and realize that having a clear line on governance of this new class of identity is critical across all areas of the company. Forrester VP and principal analyst Andras Cser explicitly highlighted that "AI agents sit somewhere between machines and human identities; high volume, high autonomy, high impact. Legacy IAM tools cannot govern them effectively." Specialized governance platforms are essential, as they can deliver real-time visibility, adaptive monitoring and dynamic authorization specifically for AI agent identities.Place a high priority on developing and growing AI red team capability. Pollard warned: "Infrastructure flaws matter, but AI model flaws are what will break you. Traditional pentesting has become obsolete. AI red teams must proactively detect and mitigate AI-specific vulnerabilities, including prompt injection, bias exploitation, model inversion and cascading failures from autonomous agents.Operate under the explicit assumption of AI failure. Forrester aimed to deliver an emphatic message of how unreliable gen AI is. They succeeded. Mellen's keynote drove that point home. AI is "serving us new false positives, especially during investigations and responses," Mellen noted. With proven failure rates around 60%, organizations must operate under the explicit assumption that AI systems will regularly fail.Design and implement security controls so they can quickly scale to machine speed. Maxim stated: "Entitlements aren't static anymore. We've moved toward zero standing privilege; entitlements are now dynamic, granted just in time." Traditional, human-paced controls are inadequate against gen AI's velocity.Ruthlessly eliminate blind trust in automation and any legacy infrastructure that is based on assumed trust. Carnegie Mellon's AgentCompany benchmark explicitly revealed catastrophic AI agent failure rates (70–90%) among top-tier models. In one of the strongest statements of the event, Pollard expressly warned: "Guardrails don't make agents safe; they make them fail silently." Organizations must continuously verify, audit and challenge automated systems without compromise. Blind trust in automation is a disaster waiting to happen, and assuming trust relationships with legacy systems is equally bad. Both are potential breaches just waiting on an attacker to find the weakness and exploit it.