AI systems now match or exceed humans on many tasks, but they get there through measurably different cognitive processes. This gap can be exploited to detect AI agents and online bots.
This is a ~1000 word overview of our recent machine learning conference paper submission. To read the full preprint, click here.
"CAPTCHAs are broken these days." AI can easily identify all the traffic lights in a static grid. So CAPTCHAs don't provide a valuable human signal, right?
Yes and no.
Yes, because vision language models (VLMs) can recognize images like chimneys, fire hydrants, and traffic lights. Deep learning "solved" CAPTCHA-style image classification in the early 2010s.
No, because AI does not complete CAPTCHAs the way humans do. Look across the data from humans and AI agents completing CAPTCHAs, and differences start to appear in features such as error patterns. Our recent paper found statistically significant differences in sequential click patterns, direction changes, and overselection behavior - features that describe how a participant, agent or human, solves the CAPTCHA. In other words, AI can solve CAPTCHAs, but it doesn't solve them like humans do.
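To make "process features" concrete, here is a minimal Python sketch of how such features could be computed from a single trial's click log. The definitions are illustrative assumptions, not the paper's exact formulas: the grid layout (cells numbered row by row), the function names, and the scoring rules are all hypothetical.

```python
from typing import List, Set, Tuple


def _sign(x: int) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def _moves(clicks: List[int], cols: int) -> List[Tuple[int, int]]:
    """Direction (row step sign, column step sign) between consecutive clicks."""
    out = []
    for a, b in zip(clicks, clicks[1:]):
        (r1, c1), (r2, c2) = divmod(a, cols), divmod(b, cols)
        out.append((_sign(r2 - r1), _sign(c2 - c1)))
    return out


def sequential_score(clicks: List[int]) -> float:
    """Assumed definition: share of consecutive clicks that follow reading order."""
    pairs = list(zip(clicks, clicks[1:]))
    if not pairs:
        return 0.0
    return sum(b > a for a, b in pairs) / len(pairs)


def direction_changes(clicks: List[int], cols: int) -> int:
    """Assumed definition: how often the movement direction differs from the previous move."""
    moves = _moves(clicks, cols)
    return sum(m1 != m2 for m1, m2 in zip(moves, moves[1:]))


def overselection(clicks: List[int], targets: Set[int]) -> int:
    """Assumed definition: number of distinct clicked cells that are not targets."""
    return len(set(clicks) - targets)


# Hypothetical 3x3 grid, cells numbered 0..8 row by row; values are made up.
targets = {0, 1, 2, 5}
ordered_trial = [0, 1, 2, 5, 8]    # sweeps across the top row, then down
scattered_trial = [8, 0, 5, 1]     # jumps around the grid

for name, trial in [("ordered", ordered_trial), ("scattered", scattered_trial)]:
    print(name,
          sequential_score(trial),
          direction_changes(trial, cols=3),
          overselection(trial, targets))
```

Two trials can select nearly the same cells and still differ on these features, which is the sense in which the comparison targets process and not just accuracy.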
Figure 1: Humans and Claude/GPT/Gemini reach similar performance levels on the classic CAPTCHA, but there are statistically significant process differences in features like sequential score, direction change, and overselection.
The Turing Test - originally proposed in 1950 by Alan Turing - offers a simple criterion for machine intelligence. If a judge cannot reliably distinguish a machine's responses from a human's, the machine can be considered intelligent.
Turing understood that this behavioral criterion was a concession, not the final word on human vs. machine intelligence. He had to concede that the underlying question, whether machines can think, was too difficult, abstract, and loaded. Behavioral indistinguishability provided a more tractable condition, and one that seemed like a good North Star in the 1950s.
Following in Turing's footsteps, we set out to define an adversarially robust discriminator that can separate humans from bots, and designed CogCAPTCHA30. It goes one level deeper than the Turing Test, moving from output (what humans and agents can do) to process (how they do it). CogCAPTCHA30 combines the original CAPTCHA with 29 classic cognitive psychology tasks for a 30-task battery.