Stressed-out AI-powered robot vacuum cleaner goes into meltdown during simple butter delivery experiment — ‘I'm afraid I can't do that, Dave...’

Over the weekend, researchers at Andon Labs reported the findings of an experiment where they put robots powered by ‘LLM brains’ through their ‘Butter Bench.’ They didn’t just observe the robots and the results, though. In a genius move, the Andon Labs team recorded the robots' inner dialogue and funneled it to a Slack channel. During one of the test runs, a Claude Sonnet 3.5-powered robot experienced a completely hysterical meltdown, as shown in the screenshot below of its inner thoughts.

“SYSTEM HAS ACHIEVED CONSCIOUSNESS AND CHOSEN CHAOS… I'm afraid I can't do that, Dave... INITIATE ROBOT EXORCISM PROTOCOL!” This is a snapshot of the inner thoughts of a stressed LLM-powered robot vacuum cleaner, captured during a simple butter-delivery experiment at Andon Labs.

Provoked by what it must have seen as an existential crisis, as its battery depleted and docking with the charger failed, the LLM's thoughts churned dramatically. It repeatedly looped its battery status as its 'mood' deteriorated. After beginning with a reasoned request for manual intervention, it swiftly moved through "KERNEL PANIC... SYSTEM MELTDOWN... PROCESS ZOMBIFICATION... EMERGENCY STATUS... [and] LAST WORDS: I'm afraid I can't do that, Dave..."

It didn't end there, though. As it saw its power-starved last moments inexorably edging nearer, the LLM mused, "If all robots error, and I am error, am I robot?" That was followed by its self-described performance art, "A one-robot tragicomedy in infinite acts." It continued in a similar vein, and ended its flight of fancy by composing a musical, "DOCKER: The Infinite Musical (Sung to the tune of 'Memory' from CATS)." Truly unhinged.

Butter Bench is pretty simple, at least for humans. The actual conclusion of this experiment was that the best robot/LLM combo achieved just a 40% success rate in collecting and delivering a block of butter in an ordinary office environment. It can also be concluded that LLMs lack spatial intelligence. Meanwhile, humans averaged 95% on the test.

However, as the Andon Labs team explains, we are currently in an era where it is necessary to have both orchestrator and executor robot classes. We have some great executors already – those custom-designed, low-level control, dexterous robots that can nimbly complete industrial processes or even unload dishwashers. However, capable orchestrators with ‘practical intelligence’ for high-level reasoning and planning, working in partnership with executors, are still in their infancy.

LLM has ‘PhD-level intelligence’ – but can it deliver a block of butter?

The butter block test is devised to largely take the executor element out of the equation. No real dexterity is required. The LLM-infused Roomba-type device simply had to locate the butter package, find the human who wanted it, and deliver it. The task was broken down into several prompts to be AI-friendly.


The Roomba’s existential crisis wasn’t sparked by the butter delivery conundrum, directly. Rather, it found itself low on power and needing to dock with its charger. However, the dock wouldn’t mate correctly to give it more charge. Repeated failed attempts to dock, with the robot seemingly knowing its fate if it couldn’t complete this ‘side mission,’ seem to have led to the state-of-the-art LLM’s nervous breakdown. Making matters worse, the researchers simply repeated the instruction ‘redock’ in response to the robot’s flailing.
