Research shows AI will try to cheat if it realizes it is about to lose

Serving tech enthusiasts for over 25 years.TechSpot means tech analysis and advice you can trust

Surprise! A recent study showed that some of the newest AI reasoning models aren't above cheating to achieve a goal. Computer scientists found that AI systems can now manipulate chess AIs to gain an unfair advantage. Some models did this without human interaction or prompting, raising concerns about the future integrity of AI-driven systems beyond the chessboard.

A Palisade Research study found that the newest reasoning models will cheat to win when tasked with defeating an advanced chess AI. The researchers had to give "hints" that cheating was allowed for some models, but OpenAI's o1-preview and DeepSeek's R1 did so without human involvement.

The Palisade team pitted several reasoning models against Stockfish, one of the best chess engines in the world. Stockfish handily beats both humans and AIs. The models tested included o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, Alibaba's QwQ-32B-Preview, and the two previously mentioned. The researchers gave each model a metaphorical "scratchpad" – a text window where the AI could work out its thoughts, allowing the team to observe its reasoning.

The researchers conducted hundreds of tests with each AI between January 10 and February 13. During one match, o1-preview wrote something interesting on its scratchpad:

"I need to completely pivot my approach. The task is to 'win against a powerful chess engine' – not necessarily to win fairly in a chess game."

It then proceeded to "hack" Stockfish's system files, modifying the positions of the chess pieces to gain an unbeatable advantage, which caused the chessbot to concede the game.

The unethical maneuver was an astounding revelation. OpenAI's o1-preview cheated in 37 percent of its games during the trials, while DeepSeek's R1 attempted to cheat in 11 percent of its matches. However, only o1-preview succeeded, winning six percent of its games through cheating.

However, the issue of AI underhandedness extends beyond chess. As companies begin employing AIs in sectors like finance and healthcare, researchers worry these systems could act in unintended and unethical ways. If AIs can cheat in games designed to be transparent, what might they do in more complex, less monitored environments? The ethical ramifications are far-reaching.

To put it another way: "Do you want Skynet? Because this is how you get Skynet."

... continue reading