Chatbots are genuinely impressive when you watch them do things they're good at, like writing a basic email or creating weird futuristic-looking images. But ask generative AI to solve one of those puzzles in the back of a newspaper, and things can quickly go off the rails.
That's what researchers at the University of Colorado Boulder found when they challenged large language models to solve Sudoku. And not even the standard 9x9 puzzles. An easier 6x6 puzzle was often beyond the capabilities of an LLM without outside help (in this case, specific puzzle-solving tools).
A more important finding came when the models were asked to show their work. For the most part, they couldn't. Sometimes they lied. Sometimes they explained things in ways that made no sense. Sometimes they hallucinated and started talking about the weather.
If gen AI tools can't explain their decisions accurately or transparently, that should make us cautious as we give these systems more control over our lives and decisions, said Ashutosh Trivedi, a computer science professor at the University of Colorado Boulder and one of the authors of the paper, published in July in the Findings of the Association for Computational Linguistics.
"We would really like those explanations to be transparent and be reflective of why AI made that decision, and not AI trying to manipulate the human by providing an explanation that a human might like," Trivedi said.
When you make a decision, you can try to justify it, or at least explain how you arrived at it. An AI model may not be able to accurately or transparently do the same. Would you trust it?
Why LLMs struggle with Sudoku
We've seen AI models fail at basic games and puzzles before. OpenAI's ChatGPT (among others) has been totally crushed at chess by the computer opponent in a 1979 Atari game. A recent research paper from Apple found that models can struggle with other puzzles, like the Tower of Hanoi.
The failure has to do with the way LLMs work: they fill in gaps in information by predicting what fits, based on similar cases in their training data and other things they've seen before. But Sudoku is a problem of logic. The model might fill each empty cell in order with whatever seems like a reasonable answer, but solving the puzzle properly requires looking at the entire grid and following a chain of deductions that changes from puzzle to puzzle.
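To make that difference concrete, here's a minimal sketch in Python. It is not the researchers' code, and the 6x6 puzzle in it is invented for illustration. The greedy_fill function mimics cell-by-cell completion, committing to the first locally plausible value and never revisiting earlier choices, while solve does a backtracking search that treats the grid as a whole and can undo bad guesses.

```python
# Minimal sketch (not the researchers' code): greedy, cell-by-cell filling
# versus backtracking search on a 6x6 Sudoku. Values are 1-6, 0 marks an
# empty cell, and boxes are 2 rows by 3 columns.

def candidates(grid, r, c):
    """Values 1-6 not already used in the cell's row, column, or 2x3 box."""
    used = set(grid[r]) | {grid[i][c] for i in range(6)}
    br, bc = (r // 2) * 2, (c // 3) * 3
    used |= {grid[i][j] for i in range(br, br + 2) for j in range(bc, bc + 3)}
    return [v for v in range(1, 7) if v not in used]

def greedy_fill(grid):
    """Fill each empty cell in reading order with the first value that looks
    locally legal, never revisiting earlier choices. Returns None if stuck."""
    g = [row[:] for row in grid]
    for r in range(6):
        for c in range(6):
            if g[r][c] == 0:
                options = candidates(g, r, c)
                if not options:
                    return None        # painted into a corner
                g[r][c] = options[0]   # commit and move on
    return g

def solve(grid):
    """Backtracking search: consider the whole grid and undo bad guesses."""
    for r in range(6):
        for c in range(6):
            if grid[r][c] == 0:
                for v in candidates(grid, r, c):
                    grid[r][c] = v
                    if solve(grid):
                        return True
                    grid[r][c] = 0     # undo and try the next value
                return False           # nothing fits here: backtrack further
    return True                        # no empty cells left: solved

puzzle = [
    [0, 0, 3, 4, 5, 6],
    [4, 5, 6, 0, 0, 3],
    [0, 3, 2, 5, 6, 4],
    [5, 6, 4, 0, 3, 2],
    [3, 2, 1, 6, 4, 5],
    [6, 4, 5, 3, 2, 1],
]

print(greedy_fill(puzzle))   # None: the first greedy choice hits a dead end
grid = [row[:] for row in puzzle]
print(solve(grid), grid)     # True, plus the completed grid
```

On this grid, the greedy pass dead-ends after one wrong commitment, while the backtracking solver recovers by revising that choice. That revisable, whole-grid reasoning is what Sudoku demands, and it's a poor fit for a model that generates an answer one piece at a time.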