This AI mimics humans so perfectly it can corrupt public opinion polls and surveys

In brief: Public opinion polls and surveys have joined the long list of things that can be manipulated by AI. New research has shown that the technology is able to mimic humans almost perfectly, evading the usual protections put in place to identify bots.

A Dartmouth College study published in the Proceedings of the National Academy of Sciences shows how advanced AI can corrupt polls and surveys at a scale large enough to manipulate results.

An LLM-powered autonomous synthetic respondent, created by Sean Westwood – associate professor of government at Dartmouth and the paper's author – was able to evade online surveys' bot detection systems.

Using a 500-word prompt, the AI can create a persona based on generated demographics such as age, gender, race, education, income, and state of residence.

To evade automated detection, the bot simulates realistic reading times based on the generated character's education level, performs human-like mouse movements, and types open-ended responses keystroke-by-keystroke – it even makes realistic typos and corrections.

Across more than 43,000 tests, the AI convinced 99.8% of systems that it was human. It made no errors on logic puzzles and reCAPTCHA tests, and it avoided "reverse shibboleth" traps: questions or tasks that an AI could complete easily but a human could not.

According to the press release, it would have taken just 10 to 52 fake AI responses to flip the outcome of seven major national polls conducted before the 2024 election. Deploying the automated responses would have cost as little as 5 cents each, whereas human respondents are usually paid around $1.50 for completing a survey.
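To make those numbers concrete, here is a rough back-of-the-envelope sketch in Python. Only the 10-to-52 range, the 5-cent cost, and the $1.50 payment come from the article; the example poll (768 versus 752 responses out of 1,600) is hypothetical, and the calculation assumes raw, unweighted counts, which real pollsters do not rely on.

```python
def responses_to_flip(lead_count: int, trail_count: int) -> int:
    """Smallest number of added responses, all for the trailing option,
    that puts it strictly ahead. Assumes simple unweighted counts."""
    if trail_count > lead_count:
        return 0  # already ahead, nothing to flip
    return lead_count - trail_count + 1


def total_cost_cents(n_responses: int, cents_each: int) -> int:
    """Total cost in whole cents (integers avoid float rounding)."""
    return n_responses * cents_each


# Figures quoted in the article.
AI_COST_CENTS = 5        # "as little as 5 cents each"
HUMAN_PAY_CENTS = 150    # "around $1.50" per completed survey

# Hypothetical poll, NOT from the study: 1,600 respondents, 48% vs 47%.
needed = responses_to_flip(lead_count=768, trail_count=752)
print(needed)  # 17

# Cost range for the 10 to 52 responses cited in the article.
for n in (10, 52):
    ai = total_cost_cents(n, AI_COST_CENTS) / 100
    human = total_cost_cents(n, HUMAN_PAY_CENTS) / 100
    print(f"{n} responses: ${ai:.2f} via AI vs ${human:.2f} paid to humans")
```

Under these assumptions, a one-point lead in a 1,600-person poll is a gap of only 16 responses, so 17 fabricated answers would reverse it, and even the upper figure of 52 responses would cost $2.60 at 5 cents apiece.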

Concerningly, the bots were able to produce flawless English answers even when given their instructions in Russian, Mandarin, or Korean, highlighting how easily foreign threat actors could manipulate survey or poll results.

The AI agent is a model-agnostic program built in Python, so it can be deployed with APIs from AI firms or hosted locally with open-weight models like Llama. OpenAI's o4-mini was used for most of the testing, with DeepSeek R1, Mistral Large, Claude 3.7 Sonnet, Grok 3, Gemini 2.5 Preview, and others used for some tasks.

The AI could also have a huge impact on scientific studies that rely on survey data gathered through online collection platforms.
