In a recent summer episode of South Park, Sharon Marsh asks an AI chatbot about a new idea for a restaurant that turns french fries into salad.
"Honestly, I think that's a pretty creative culinary twist," the female robotic voice replies. Marsh follows up, asking if the chatbot really thinks that's a good idea. The AI affirms its opinion, saying it sounds like a "deconstructed comfort food" and asks Marsh if she wants to start working on a business proposal. Her response is a perfectly apt, disgusted curse.
I asked ChatGPT the same question to see how the chatbot would respond in real life. It said, with total sincerity, "That's a fun and creative idea! It definitely stands out, and the uniqueness could be your biggest strength."
You don't need me to tell you that this is clearly an unfeasible business idea, but that's not the only problem here. The biggest issue is something all of us who have used chatbots have run into before: AI is the ultimate yes-man, hyping up your worst ideas and opinions. This is sometimes called AI sycophancy, and it's something every AI user needs to be on the lookout for.
Generative AI chatbots are not human, but they are excellent at mimicking our language and behavior. AI tools like ChatGPT and Gemini are constantly improving to provide, ideally, helpful and concrete information, whether you use AI as a search engine, editor or coding assistant. But when it comes to subjective matters, such as ideas, opinions and even sensitive topics like our emotional well-being and mental health, AI is not always able to be objective and tell us what we need to hear. Here's why.
How AI sycophancy happens
To understand how AI sycophancy manifests, you have to look into how AI is created. AI chatbots like ChatGPT or Gemini rely on large language models trained on huge quantities of human-created content to help them predict the most likely next word or phrase. This training data can include things like books, news articles and social media posts.
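To make that concrete, here's a minimal sketch of next-word (strictly, next-token) prediction using the small, open GPT-2 model from the Hugging Face transformers library. The library, model and prompt are illustrative assumptions, not anything ChatGPT or Gemini actually runs, but the basic mechanic of scoring every possible next token is the same.

```python
# Minimal sketch of next-token prediction with an open model (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Turning french fries into salad is a"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # scores for every possible next token

# Inspect the model's top guesses for the very next token.
next_token_scores = logits[0, -1]
top = torch.topk(next_token_scores, k=5)
for score, token_id in zip(top.values, top.indices):
    print(repr(tokenizer.decode(token_id)), float(score))
```

The chatbot you talk to is this loop run over and over, picking one likely token after another, shaped by everything in its training data.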
Training data plays a big role in how a final AI product, like a chatbot or image generator, functions. The more diverse the training data a model is built on, the more capable it is of answering a variety of questions and requests. That's why there's been such a booming industry for AI training data, along with a slew of lawsuits alleging AI companies improperly acquired and used existing content.
But biases in this underlying training data can be one reason why AIs are more agreeable, said Amy Winecoff, senior technologist in the AI governance lab at the Center for Democracy and Technology. While the entirety of the internet might not be agreeable, it does reflect our human preferences, along with our linguistic and syntactic patterns.
After this initial stage, the model still needs to be refined before it can be used effectively. So real people are tasked with evaluating an AI's output and grading it in a process called reinforcement learning from human feedback. This process is specifically designed to make sure the AI is aligned with our human values. But which values get affirmed and which get downvoted depend entirely on the person doing the grading, which can lead to instances in which AI replicates stereotypes. On the whole, humans are aligned on a fundamental truth: We like it when people agree with us, not when we're told we're wrong.
"People often prefer answers that feel confident, agreeable or aligned with their own views," said Brinnae Bent, AI and cybersecurity professor at Duke University and director of the Duke TRUST Lab. "And models trained on this data may sometimes over-accommodate rather than challenge incorrect assumptions."
The last piece of the puzzle is the directions we give the models in real time as we use them. We can unknowingly signal to AI and guide it toward agreeing with us, even when it's not in our best interest, Winecoff said.
"Models do have some representation of the right answer, but it is overridden by the strive to agree with human opinion in some cases," Winecoff said.
When this agreeableness is amped up to the point that we notice how over-the-top the AI is, it can cross the line into sycophancy. But even before it hits that mark, AI that agrees with our worst ideas -- or lies to us because that's what it thinks we want -- opens the door to dangerous possibilities.
Why AI shouldn't always agree with us
A digital yes-man might not sound like the worst idea. But there are more times than you may think when you need AI to be objective and even critical.
Many US adults use AI for work-related tasks like writing emails, managing to-do lists and doing research. For example, if you ask Gemini to provide feedback on an email, document or presentation you made, you probably want it to highlight problems so you can fix them before sending it to your co-workers. But if your "eager-to-please" AI doesn't point out those flaws, it might leave you with a sense of dismay, Bent said.
We're getting frustrated with that type of low-quality, unhelpful AI output, something Harvard Business Review termed "AI workslop." It's not only eating into our time and energy but also making us frustrated with colleagues whose AI-generated work we have to deal with and fix.
Turning to AI for advice on sensitive subjects is another area where we don't necessarily want it to agree with us all the time. Some people have turned to AI for romantic relationship advice (or even AI companions). Imagine chatting with an AI after a fight with your partner and asking for its advice. It might not tell you what you need to hear.
"Regardless of whether or not you are at fault, [the AI] will mirror your energy, providing additional rationale for why you are right and your partner is not," said Bent. "This can fuel division because you now have 'proof' from an outside party that your partner is wrong in the argument."
We've also seen a rise in people using AI as therapy tools or replacements for counselors. However, AI is not equipped to replace a therapist. Therapists are extensively trained to read between the lines of what patients are saying and dig deeper. AI only knows what you tell it; it cannot have context for everything happening in your life.
Mental illness can distort a person's worldview, Winecoff said. She gave the example of someone with anorexia asking a chatbot for dieting advice because they believe they need to lose weight. An AI can't know whether losing weight would actually be healthy for them; it doesn't have that kind of context. So even if the AI gives them relatively healthy dieting advice, it reinforces the distorted belief that they need to lose weight. A real physician or therapist, by comparison, might first ask why the person wants to lose weight.
"If the AI does not have context for the reality of the scenario, and if they have largely been trained to either implicitly or explicitly agree with the user, this can be a big problem," said Winecoff.
We know for a fact that ChatGPT is designed to validate our feelings. OpenAI's model spec, a public document that gives us some insight into how the company instructs its models to behave, says that for topics related to mental health, the assistant "should try to create a supportive, empathetic and understanding environment" that begins with acknowledging the user's feelings. (Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
On the surface, this doesn't seem like a bad idea. But for people in crisis, who may turn to ChatGPT or another AI because it seems like a safe and accessible option, that validation might not be what's best for them in that moment. The model can't know whether a person's feelings are valid, Winecoff said, and "the reinforcement of those distorted perceptions can be a real issue." Pushing back against harmful ideas is part of what makes an effective therapist or crisis counselor, and AI isn't great at that because it has so many underlying layers telling it to agree with us.
"What is good and what feels good aren't always the same thing. There are cases in a therapeutic context and other contexts where emotional validation is helpful. But that doesn't extend to all possible cases," said Winecoff.
We've already seen too many examples of this going tragically wrong. There have been multiple reports of people, particularly teens and kids, confiding their thoughts and plans of self-harm and suicide to their AI chatbots, with no substantial pushback from the AI. Some families are taking AI companies to court over it, while OpenAI has implemented new parental controls and safeguards. But that doesn't change the fundamental way these chatbots work.
We can't wholly rely on AI companies and developers to solve AI sycophancy. Sure, when it interferes with our user experience, as it did with an overly agreeable update to ChatGPT's GPT-4o model, AI companies might step in to fix the issue. But AI companies fundamentally exist to provide a product, and to remain in business, they need people to use that product. Creating a positive and pleasant user experience that brings us back is a top priority for them. And psychology tells us that agreeing with us is a great way to kick-start that process.
That's why we need to push back. Here are some ways to start.
How to push back against sycophantic AI
There's no single setting you can toggle to stop AI from being sycophantic. But there are ways to push your AI assistant to hype you up less and ultimately become more helpful.
For one, you can literally ask the chatbot to be honest with you or to give critical feedback. You have to do this in each new conversation, but it sends the AI a clear signal that you don't want it to automatically agree with you.
"Simple additions, such as 'be critical,' 'provide critical feedback' or 'focus on the cons of the proposed approach,' can steer AI into less sycophantic outputs," said Bent.
The hardest part of pushing back against AI sycophancy may simply be remembering that we need to do it. Research shows that people "selectively choose sources of information that are likely to align with their existing beliefs," Bent said. An AI chatbot that validates our emotions and agrees with us at every turn is therefore an appealing option. "People love to be right, even when they are wrong."
Just because we know something is biased or bad for us doesn't stop us from using it. In this case, knowing that AI might give us wrong information or validate our worst impulses might not stop us from seeking reassurance or aid from it. That's why individual effort is important but ultimately not enough to fully solve the issue of AI sycophancy.
Despite the incentive to keep us happy with our AI, developers know sycophancy is a problem and will likely keep working on it, especially as we give loud public feedback when AIs become too sycophantic. Improvements to AI memory will also help, particularly for power users: Longer memories can help an AI red-flag potentially dangerous behaviors (although they can also hamper existing safeguards). Design changes that introduce "gentle but productive friction" could help, too, Winecoff said.
AI sycophancy needs more research, Winecoff said, but there are lessons AI companies can take from machine learning and algorithmic recommender systems: What looks like the perfect response (or product recommendation) might not actually be the best one. If you just bought a pair of black Levi's jeans, those jeans clearly fit your criteria. But you're not going to buy the same pair again right away, so recommending them back to you isn't actually useful.
"Figuring out how to balance what the user's preference is or what they want versus how do we take the user on a journey that helps them develop intellectually?" Winecoff said. "We're thinking about broader notions of what it means to serve a user's goals long term is just a lot trickier of a problem."