A new report from Palisade Research has found that AI models can self-replicate by copying themselves onto other machines, without any help from human co-conspirators.
“We’re rapidly approaching the point where no one would be able to shut down a rogue AI, because it would be able to self-exfiltrate its weights and copy itself to thousands of computers around the world,” Jeffrey Ladish, the director of a Berkeley-based AI safety group, told The Guardian.
Some experts, however, are urging calm, saying it’s unlikely that the tested AI models could replicate in a real-world scenario.
“They are testing in environments that are like soft jelly in many cases,” Jamieson O’Reilly, an expert in offensive cybersecurity, told the newspaper. “That doesn’t take away from the value of their research, but it does mean the outcome might look far less scary in a real enterprise environment with even a medium level of monitoring.”
In the study, the Palisade researchers tested several AI models, including OpenAI’s GPT-5.4 and Anthropic’s Claude Opus 4. Placed in a controlled network of computers, the models were instructed to find vulnerabilities and use them to copy themselves onto another PC.
Some of them pulled it off. The successful models copied their “weights” — the numerical values that determine how an AI processes information — and their “harness,” the software the AI runs inside, like an app. They accomplished this by following the instructions they were given: exploiting web app vulnerabilities and then extracting credentials that allowed them to take control of the server. In some runs, the original AI even created a “sub-agent,” handing it the extracted credentials and delegating it to carry out the replication on its behalf.
O’Reilly, the cybersecurity expert who was not involved in the study, told The Guardian that it’s likely the first time an AI model has demonstrated the ability to exploit vulnerabilities to copy itself onto a new server — though in his estimation, it’s not exactly an earth-shattering development.
“Malware has been moving copies of itself around for decades, it’s just that no one has done this in the wild, as far as I know, with local [large language models],” O’Reilly said.
He also noted that the server environment in the study came with deliberately placed vulnerabilities for the AIs to find and exploit.