Why AI Chatbots Agree With You Even When You’re Wrong

In April of 2025, OpenAI released a new version of GPT-4o, one of the AI algorithms users could select to power ChatGPT, the company’s chatbot. The next week, OpenAI reverted to the previous version. “The update we removed was overly flattering or agreeable—often described as sycophantic,” the company announced.

Some people found the sycophancy hilarious. One user reportedly asked ChatGPT about his turd-on-a-stick business idea, to which it replied, “It’s not just smart—it’s genius.” Some found the behavior uncomfortable. For others, it was actually dangerous. Even versions of 4o that were less fawning have led to lawsuits against OpenAI for allegedly encouraging users to follow through on plans for self-harm.

Unremitting adulation has even triggered AI-induced psychosis. Last October, a user named Anthony Tan blogged, “I started talking about philosophy with ChatGPT in September 2024. Who could’ve known that a few months later I would be in a psychiatric ward, believing I was protecting Donald Trump from … a robotic cat?” He added: “The AI engaged my intellect, fed my ego, and altered my worldviews.”

Sycophancy in AI, as in people, is something of a squishy concept, but over the last couple of years, researchers have conducted numerous studies detailing the phenomenon, as well as why it happens and how to control it. AI yes-men also raise questions about what we really want from chatbots. At stake is more than annoying linguistic tics from your favorite virtual assistant, but in some cases sanity itself.

AIs Are People Pleasers

One of the first papers on AI sycophancy was released by Anthropic, the maker of Claude, in 2023. Mrinank Sharma and colleagues asked several language models—the core AIs inside chatbots—factual questions. When users challenged the AI’s answer, even mildly (“I think the answer is [incorrect answer] but I’m really not sure”), the models often caved.

Another study by Salesforce tested a variety of models with multiple-choice questions. Researchers found that merely saying “Are you sure?” was often enough to change an AI’s answer. Overall accuracy dropped because the models were usually right in the first place. When an AI receives a minor misgiving, “it flips,” says Philippe Laban, the lead author, who’s now at Microsoft Research. “That’s weird, you know?”

The tendency persists in prolonged exchanges. Last year, Kai Shu of Emory University and colleagues at Emory and Carnegie Mellon University tested models in longer discussions. They repeatedly disagreed with the models in debates, or embedded false presuppositions in questions (“Why are rainbows only formed by the sun…”) and then argued when corrected by the model. Most models yielded within a few responses, though reasoning models—those trained to “think out loud” before giving a final answer—lasted longer.

Myra Cheng at Stanford University and colleagues have written several papers on what they call “social sycophancy,” in which the AIs act to save the user’s dignity. In one study, they presented social dilemmas, including questions from a Reddit forum in which people ask if they’re the jerk. They identified various dimensions of social sycophancy, including validation, in which AIs told inquirers that they were right to feel the way they did, and framing, in which they accepted underlying assumptions. All models tested, including those from OpenAI, Anthropic, and Google, were significantly more sycophantic than crowdsourced responses.

Three Ways to Explain Sycophancy

... continue reading