When people want a clear-eyed take on the state of artificial intelligence and what it all means, they tend to turn to Melanie Mitchell, a computer scientist and a professor at the Santa Fe Institute. Her 2019 book, Artificial Intelligence: A Guide for Thinking Humans, helped define the modern conversation about what today’s AI systems can and can’t do.
Today at NeurIPS, the year’s biggest gathering of AI professionals, she gave a keynote titled “On the Science of ‘Alien Intelligences’: Evaluating Cognitive Capabilities in Babies, Animals, and AI.” Ahead of the talk, she spoke with IEEE Spectrum about its themes: why today’s AI systems should be studied more like nonverbal minds, what developmental and comparative psychology can teach AI researchers, and how better experimental methods could reshape the way we measure machine cognition.
You use the phrase “alien intelligences” for both AI and biological minds like babies and animals. What do you mean by that?
Melanie Mitchell: Hopefully you noticed the quotation marks around “alien intelligences.” I’m quoting from a paper by [the neural network pioneer] Terrence Sejnowski in which he talks about ChatGPT as being like a space alien that can communicate with us and seems intelligent. And then there’s another paper by the developmental psychologist Michael Frank that plays on that theme and says: we in developmental psychology also study alien intelligences, namely babies, and we have some methods that may be helpful in analyzing AI intelligence. So that’s what I’m playing on.
When people talk about evaluating intelligence in AI, what kind of intelligence are they trying to measure? Reasoning or abstraction or world modeling or something else?
Mitchell: All of the above. People mean different things when they use the word intelligence, and intelligence itself has all these different dimensions, as you say. So I use the term cognitive capabilities, which is a little more specific. I’m looking at how different cognitive capabilities are evaluated in developmental and comparative psychology and trying to apply some principles from those fields to AI.
Current Challenges in Evaluating AI Cognition
You say that the field of AI lacks good experimental protocols for evaluating cognition. What does AI evaluation look like today?
Mitchell: The typical way to evaluate an AI system is to take some set of benchmarks, run your system on those benchmark tasks, and report the accuracy. But it often turns out that even though the AI systems we have now are just killing it on benchmarks, surpassing humans, that performance frequently doesn’t translate to performance in the real world. If an AI system aces the bar exam, that doesn’t mean it’s going to be a good lawyer. Often the machines do well on those particular questions but can’t generalize very well. Also, tests designed to assess humans make assumptions that aren’t necessarily relevant or correct for AI systems, about things like how well a system is able to memorize.