
How to stop the survey-taking AI chatbots that threaten to upend social science


Surveys are a cornerstone of social-science research. Over the past two decades, online recruitment platforms — such as Amazon Mechanical Turk, Prolific, Cloud Research’s Prime Panels and Cint’s Lucid — have become essential tools for helping researchers to reach large numbers of survey participants quickly and cheaply.

There have long been concerns, however, about inauthentic participation1. Some survey takers rush the task simply to make money. Because they are often paid a fixed amount based on the estimated time taken to complete the survey (typically US$6–12 per hour), the faster they complete the task, the more money they can make.

Studies suggest that between 30% and 90% of responses to social-science surveys can be inauthentic or fraudulent2,3. The problem is exacerbated in studies targeting specialized populations or marginalized communities, because the intended participants are harder to reach and are often recruited online, raising the risk of fraud and interference by automated programs called bots4,5. Those percentages far exceed the proportion of fraudulent responses that most studies can tolerate and still produce statistically valid results: even 3–7% of polluted data can distort results, rendering interpretations inaccurate6. And the problem is getting worse.
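To make that concrete, here is a short, illustrative Python simulation (not from the article; all numbers are made up) in which a small fraction of responses is replaced by uncorrelated, bot-like noise. Even modest pollution attenuates the correlation a researcher would estimate between two survey measures.

```python
# Illustrative sketch: how a small share of random, bot-like responses
# attenuates an observed effect. Numbers are purely for demonstration.
import numpy as np

rng = np.random.default_rng(0)
n = 2_000                 # total survey responses
true_effect = 0.5         # true correlation between two survey measures

# Genuine respondents: two measures with a real correlation.
x = rng.normal(size=n)
y = true_effect * x + rng.normal(scale=np.sqrt(1 - true_effect**2), size=n)

for polluted_share in (0.0, 0.03, 0.07, 0.30):
    k = int(polluted_share * n)
    x_obs, y_obs = x.copy(), y.copy()
    # Replace a fraction of responses with uncorrelated noise (bot-like answers).
    x_obs[:k] = rng.normal(size=k)
    y_obs[:k] = rng.normal(size=k)
    r = np.corrcoef(x_obs, y_obs)[0, 1]
    print(f"polluted share {polluted_share:4.0%}: observed correlation ~ {r:.2f}")
```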


A parallel industry has emerged offering scripts, bots and tutorials that (legitimately) make it easy to partially or fully automate form filling (see, for example, go.nature.com/4q8kftd). The use of artificial intelligence for crafting responses is on the rise, too. For instance, answers mediated by large language models (LLMs) accounted for up to 45% of submissions in one 2025 study7. The advent of AI agents that can autonomously interact with websites is set to escalate the problem, because such agents make the production of authentic-looking survey responses trivially easy, even for people without coding experience.

Researchers and survey providers have long worked on tools to prevent, deter or detect inauthentic survey responses. CAPTCHA8, for example, tests whether a user is human by requiring them to identify distorted text, sounds or images. Such methods can confuse unsophisticated bots (and inattentive humans), but not AI agents.

A few detection measures can distinguish agent-generated responses from genuine ones by exploiting the way LLMs rely on training data to produce responses and their lack of ability to reason contextually9. For example, LLMs might label an image of a distorted grid or colour-contrast pattern as an optical illusion even after the illusion-inducing elements have been digitally removed, relying on learned associations rather than perception7,10. Humans, by contrast, respond to what they actually see, creating a detectable difference between human and AI interpretations. However, these distinctions are likely to fade as AI advances, rendering such tests unreliable in the near future.
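As a rough sketch of how such a check might be scored (the item design and keyword list below are illustrative assumptions, not the authors' method), a researcher could flag free-text answers that describe an illusion that has been edited out of the stimulus:

```python
# A minimal sketch, assuming a modified-illusion item: the image shown has had
# its illusion-inducing elements removed, so a careful human should describe
# what is actually visible, while an LLM drawing on learned associations may
# still call it an optical illusion. Keywords and responses are hypothetical.
ILLUSION_KEYWORDS = {"illusion", "lines look curved", "appears to bend", "seems distorted"}

def flag_illusion_item(free_text_answer: str) -> bool:
    """Return True if the answer describes an illusion that is no longer present."""
    answer = free_text_answer.lower()
    return any(keyword in answer for keyword in ILLUSION_KEYWORDS)

responses = {
    "p01": "It's just a plain grey grid with straight lines.",
    "p02": "This is a classic optical illusion where the lines look curved.",
}
suspected = [pid for pid, text in responses.items() if flag_illusion_item(text)]
print(suspected)  # ['p02'] -- a candidate for closer review, not proof of AI use
```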

AI-agent detection has been described as a continual game of cat and mouse, in which “the mouse never sleeps”11. Here, we lay out four steps to minimize the risk of survey pollution by AI agents. Using a combination of these detection strategies will probably be necessary to enable researchers to continue to separate out authentic responses from AI bots (see ‘Outwitting AI agents’).

Outwitting AI agents

As AI capabilities improve, online survey designers need new detection methods that exploit human–AI differences that are likely to endure, to protect research integrity. See https://osf.io/zudt4 for more details.

| Mitigation type | Detector | Core ideas | Key limitations |
| --- | --- | --- | --- |
| Humans as upper bound | Human fingerprinting | Track a survey taker's navigation and input patterns (e.g. keystrokes). | AI agent can mimic human patterns. |
| Humans as upper bound | Factors humans ignore | Embed hidden instructions in surveys that humans are likely to overlook, but AI won't. | Can be ignored by AI agent if suspicious. |
| Humans as upper bound | Tasks that humans do well but AI agents do poorly | Require complex outputs (such as drag-and-drop) or modify standard tests (correct optical illusions). | Agent can warn a human user-in-the-loop; tasks will quickly become outdated as AI capability improves. |
| Quality filters | Tasks that both humans and agents can do poorly | Flag rushed, repeated or low-effort responses. | Easily bypassed as AI abilities improve and human weaknesses persist. |
| Humans as lower bound | Tasks that humans do poorly but agents do well | Exploit human intuition limits using rapid estimation tasks (such as guessing the number of US shopping malls). | AI agent can be trained to make deliberate mistakes and imitate human patterns. |
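For the 'quality filters' row in the table above, a minimal sketch of how rushed, straight-lined or low-effort responses might be flagged is shown below; the column names and thresholds are illustrative assumptions, not taken from the article.

```python
# A minimal sketch, assuming a response table with per-item Likert answers,
# completion times and one free-text field. Thresholds are illustrative only.
import pandas as pd

df = pd.DataFrame({
    "participant": ["p01", "p02", "p03"],
    "seconds": [412, 58, 390],                           # total completion time
    "likert": [[4, 2, 5, 3, 4], [3, 3, 3, 3, 3], [5, 4, 4, 2, 1]],
    "free_text": ["I mostly shop online because...", "ok", "Depends on the product..."],
})

rushed = df["seconds"] < 0.3 * df["seconds"].median()          # far faster than typical
straight_lined = df["likert"].apply(lambda xs: len(set(xs)) == 1)  # one repeated answer
low_effort = df["free_text"].str.split().str.len() < 3            # near-empty free text

df["flagged"] = rushed | straight_lined | low_effort
print(df[["participant", "flagged"]])
```

Flags like these identify responses for manual review rather than proving automation, and, as the table notes, they are easily bypassed as AI abilities improve.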

Look for response patterns

... continue reading