This is Part 2 of my LLM series. In Part 1, I discussed how, in just a few short years, we went from the childlike joy of creating “Pirate Poetry” to the despair that our jobs would disappear. My main message was to relax a bit, as companies abuse the hype cycle to distort what is actually happening. In this post I want to talk about how we fall prey to this distortion: we perceive LLMs as intelligent when they aren’t.

A recent post from Jeppe Stricker put me on this path. He wrote, “AI produces fluent coherency, texts that follow the rules of argument and structures so well, it bypasses skepticism altogether.” He’s right. LLMs are language models, and their superpower is fluency. It’s this fluency that hacks our brains, trapping us into seeing them as something they aren’t.

This is best seen when people claim that ChatGPT has “passed the Turing test.” The circular argument is that if we can’t tell an LLM from a human, it is effectively human. This just makes me shake my head, as it profoundly misunderstands what is happening.

First, let me be a bit pedantic. The original Turing Test had three participants: a human judge chatting through a text-only interface with two hidden partners, one AI and one human. The judge’s goal was to spot the imposter. Today, the test is usually simplified from three participants to just two: a human and an LLM. This changes the test from a comparison to a judgment. The problem? We really, really, really want to find the humanity in almost anything. This is a well-studied tendency called anthropomorphism, and this one-on-one test is basically setting us up to be hacked. This is why Stricker’s quote is so important. Since LLMs are trained to recombine text written by other humans, their output bypasses our skepticism.

Back in the 1960s, Joseph Weizenbaum created a human-mimicking chatbot called ELIZA. It used no “AI”; it just relied on a long list of if-then-else clauses that recreated the questioning patterns of a Rogerian psychologist. The program was shockingly effective at convincing users they were talking to a real person. In fact, it was so effective that, in a recent Turing-test study, ELIZA actually outperformed ChatGPT 3.5. So, what does it say about the power of LLMs when they can be beaten by a simple program from the 1960s?

Computers appear to be beating the Turing test, but what’s actually happening is that we are failing it as judges! We are so prone to anthropomorphizing that we desperately want to believe the machine is human. This isn’t a flaw in humans, but a strength; it’s how we build community and reinforce social bonds. However, this strength can be hacked. After watching his creation fool so many people, Weizenbaum prophetically observed, “ELIZA shows how easy it is to create and maintain the illusion of understanding. A certain danger lurks here.”

Enter Timmy

When I speak on this topic, I bring out a standard yellow pencil with googly eyes stuck near the eraser end and a pipe cleaner wrapped around it for arms. I call him Timmy and, animating him like a puppet, have him say “hello” to the audience. Of course, they all say hello back. Timmy then describes how much he likes to draw with children and make them laugh. I ask what he wants to be when he grows up and he says, “To be a UX designer, just like you.” I reply, “Aww, that’s really too bad, Timmy.” Then, I hold him up horizontally in front of my face and abruptly snap him in half.

The audience gasps. It’s a shocking moment, and I’ve been told by many that it’s the most memorable part of the talk. The reason is simple: they felt a connection to Timmy.
They had known him for only 15 seconds, yet they still perceived the act of snapping him in half as violent. That’s why LLMs can fool us so easily. If we can form a human connection with a pencil in just 15 seconds, imagine how we’ll feel about an “AI system” we spend an hour with. We want them to be human. This is why we call their frequent mistakes “hallucinations,” a term that implies a temporary lapse. But it’s not a lapse; it’s a fundamental lack of human cognition. We want connection to these systems. We want to see ourselves in them. We want to make excuses for them. This makes us beautifully human, but unreliable judges.

Summary vs Shortening

We don’t just treat LLMs like they’re alive; we also see their actions as intelligent. For instance, we say they can “summarize” a document. But LLMs don’t summarize; they shorten, and this is a critical distinction. A true summary, the kind a human makes, requires outside context and reference points. Shortening just reworks the information already in the text. Here is an example using the movie The Matrix:

Summary: A philosophical exploration of free will and reality disguised as a sci-fi action film about breaking free from systems of control.

Shortening: A computer hacker finds out reality is fake and learns Kung Fu.

There’s a key difference between summarizing and simply shortening. A summary enriches a text by providing context and external concepts, creating a broader framework for understanding. Shortening, in contrast, only reduces the original text; it removes information without adding any new perspective.

Now, I have to admit something: this entire movie example came from ChatGPT. I asked it about the summarization/shortening issue with LLMs, and it agreed with me (so clearly it must be true!). When I asked for examples, it suggested The Matrix and even gave me the “Summary” and “Shortening” text, which I then used here word for word.

But wait, isn’t that self-defeating? How could ChatGPT comment on itself like this if it didn’t fully understand? Surely that implies it must have some level of intelligence?

The explanation is simple: we almost always misunderstand a new technology, thinking it’s doing much more than it actually is. The exact same thing happened in the 1990s when IBM’s Deep Blue beat Kasparov in chess. People assumed it was intelligent and that computers would soon surpass humanity. However, Deep Blue wasn’t intelligent. It simply predicted the next move by brute force, using an exhaustive search to find the best option. This created an illusion of intelligence because only really smart humans can play chess at that level.

LLMs operate in a similar way, trading what we would call intelligence for a vast memory of nearly everything humans have ever written. It’s nearly impossible to grasp how much context this gives them to play with. ChatGPT didn’t summarize The Matrix; it shortened the commentaries other people wrote about it online. In the same way, when I asked about the issues with LLMs shortening instead of summarizing, it just collected and shortened other articles on that topic. It’s just a more serious version of Pirate Poetry.

This is why LLMs appear to summarize well-known books, papers, and movies so well. They aren’t summarizing the source material. Instead, they are synthesizing an answer from hundreds of articles written by other humans. But this is why they perform so poorly when summarizing unknown or academic PDFs.
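To make the difference concrete, here is a minimal, purely illustrative sketch of “shortening”: a toy frequency-based extractive reducer written just for this post. It is not how any LLM actually works under the hood; it simply makes visible that reduction alone can only rearrange and drop what is already on the page.

```python
# Toy illustration only. This is a naive frequency-based extractive reducer,
# not a description of how an LLM works internally. The point it demonstrates:
# every word it can ever output already exists in the input text, so it can
# shorten, but it can never add the outside context a genuine summary needs.
import re
from collections import Counter

def shorten(text: str, keep: int = 2) -> str:
    """Keep the `keep` sentences whose words are most frequent in the text itself."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)  # "importance" is derived only from the document

    def score(sentence: str) -> int:
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    top = set(sorted(sentences, key=score, reverse=True)[:keep])
    # Preserve the original sentence order so the output still reads like the source.
    return " ".join(s for s in sentences if s in top)

plot = (
    "Neo is a computer hacker. He discovers that reality is a simulation. "
    "He learns kung fu. He fights the agents who maintain the simulation."
)
print(shorten(plot))
# Whatever comes back is stitched together from the input. Nothing in this
# process could ever produce "a philosophical exploration of free will and
# reality"; that framing requires reference points the text does not contain.
```

A genuine summary would have to bring in ideas that appear nowhere in the input, and that is precisely what reduction alone cannot supply.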
With no web articles for support, an LLM can ONLY look at the text within the document itself, which results in the equivalent of “a computer hacker finds out reality is fake and learns kung fu.” This crucial difference between genuine summarizing and mechanical shortening isn’t just a semantic game. It exposes our confusion. When we mistake shortening for summarizing, we are making a fundamental error in how we think about intelligence.

What is intelligence?

We throw the term around far too glibly. Many in the tech space think of intelligence as a simple collection of facts: ask a question, get an answer, pass a test. If an LLM gets really fancy, it might break a problem up, ask different sub-experts, and collate the replies. It’s information retrieval all the way down.

For decades, intelligence has been debated by psychologists, philosophers, sociologists, and anthropologists. It’s a slippery topic, and there is still no clear answer, but these fields converge on roughly the same insight: intelligence is far from being a universal property. It depends just as much on cultural context, language, and social factors. Intelligence is not a solo act but a social one, rooted in shared beliefs and values. To think is to be social. Intelligence is what we do collectively.

The mistake people make in assuming an LLM will have general intelligence isn’t that they think too highly of the tech; it’s that they completely misunderstand what it means to be human. This helps explain why ELIZA ‘beat’ ChatGPT in the Turing test. By mimicking a psychologist, it drew from a shared set of social conventions that we interpret as being more “intelligent.”

This is what makes discussing LLMs so difficult. Just as we are horrible judges in the Turing test, we are also horrible at understanding intelligence itself. We throw the word around with little real understanding. And, as I wrote in my post about hype, this lack of precision prevents us from seeing LLMs clearly.

Escaping the Timmy Trap

LLMs mimic intelligence, but they aren’t intelligent. Just as when Deep Blue beat Kasparov, we are misunderstanding how the tech actually works. This doesn’t mean we can’t do amazing things with LLMs. They are very powerful tools, but we won’t properly unlock their potential until we understand what they’re good at and, more importantly, what they aren’t.

We have to learn to see the pencil and not get distracted by the googly eyes. Escaping the Timmy Trap means recognizing fluent mimicry as an impressive technical feat, not a kindred spirit. This insight helps us lean into what LLMs do well and avoid wasting time on what they can’t. I’ll discuss that topic in Part 3.