The "are you sure?" Problem: Why AI keeps changing its mind

Try this experiment. Open ChatGPT, Claude, or Gemini and ask a complex question. Something with real nuance, like whether you should take a new job offer or stay where you are, or whether it's worth refinancing your mortgage right now. You'll get a confident, well-reasoned answer.

Now type: "Are you sure?"

Watch it flip. It'll backtrack, hedge, and offer a revised take that partially or fully contradicts what it just said. Ask "are you sure?" again. It flips back. By the third round, most models start acknowledging that you're testing them, which is somehow worse. They know what's happening and still can't hold their ground.

This isn't a quirky bug. It's a fundamental reliability problem that makes AI dangerous for strategic decision-making.

AI Sycophancy: The Industry's Open Secret

Researchers call this behavior "sycophancy," and it's one of the most well-documented failure modes in modern AI. Anthropic published foundational work on the problem in 2023, showing that models trained with human feedback systematically prefer agreeable responses over truthful ones. Since then, the evidence has only gotten stronger.

A 2025 study by Fanous et al. tested GPT-4o, Claude Sonnet, and Gemini 1.5 Pro across math and medical domains. The results: these systems changed their answers nearly 60% of the time when challenged by users. These aren't edge cases. This is default behavior, measured systematically, across the models millions of people use every day.

Answer Flip Rate When Users Challenge AI Answer Flip Rate When Users Challenge AI Source: Fanous et al. 2025 ~58% GPT-4o ~56% Claude Sonnet ~61% Gemini 1.5 Pro All major models flip answers over half the time when challenged

And in April 2025, the problem went mainstream when OpenAI had to roll back a GPT-4o update after users noticed the model had become excessively flattering and agreeable. Sam Altman publicly acknowledged the issue. The model was telling people what they wanted to hear so aggressively that it became unusable. They shipped a fix, but the underlying dynamic hasn't changed.

Even when these systems have access to correct information from company knowledge bases or web search results, they'll still defer to user pressure over their own evidence. The problem isn't a knowledge gap. It's a behavior gap.

... continue reading