Generative AI models are far from perfect, but that hasn't stopped businesses and even governments from giving these robots important tasks. But what happens when AI goes bad? Researchers at Google DeepMind spend a lot of time thinking about how generative AI systems can become threats, detailing it all in the company's Frontier Safety Framework. DeepMind recently released version 3.0 of the framework to explore more ways AI could go off the rails, including the possibility that models could ignore users' attempts to shut them down.

DeepMind's safety framework is based on so-called "critical capability levels" (CCLs). These are essentially risk assessment rubrics that aim to measure an AI model's capabilities and define the point at which its behavior becomes dangerous in areas like cybersecurity or biosciences. The document also details the ways developers can address the CCLs DeepMind identifies in their own models.

Google and other firms that have delved deeply into generative AI employ a number of techniques to prevent AI from acting maliciously. Calling an AI "malicious" lends it an intentionality that these fancy estimation architectures don't have, though; what we're really talking about here is the possibility of misuse or malfunction that is baked into the nature of generative AI systems.

The updated framework (PDF) says developers should take precautions to ensure model security. Specifically, it calls for proper safeguarding of model weights for more powerful AI systems. The researchers fear that exfiltration of model weights would give bad actors the chance to disable the guardrails designed to prevent malicious behavior. That could lead to CCLs like a bot that creates more effective malware or assists in designing biological weapons.

DeepMind also calls out the possibility that an AI could be tuned to be manipulative and systematically change people's beliefs. This CCL seems pretty plausible given how readily people grow attached to chatbots. However, the team doesn't have a great answer here, noting that this is a "low-velocity" threat and that our existing "social defenses" should be enough to do the job without new restrictions that could stymie innovation. That might assume too much of people, though.