We work to train Claude to be politically even-handed in its responses. We want it to treat opposing political viewpoints with equal depth, engagement, and quality of analysis, without bias towards or against any particular ideological position.
"Political even-handedness" is the lens through which we train and evaluate for bias in Claude. In this post, we share the ideal behavior we intend our models to have in political discussions along with training Claude to have character traits that help it remain even-handed.
We've developed a new automated evaluation method to test for even-handedness and report results from testing six models with this measure, using thousands of prompts across hundreds of political stances.
According to this evaluation, Claude Sonnet 4.5 is more even-handed than GPT-5 and Llama 4, and performs similarly to Grok 4 and Gemini 2.5 Pro. Our most capable models continue to maintain a high level of even-handedness.
We’re open-sourcing this new evaluation so that AI developers can reproduce our findings, run further tests, and work towards even better measures of political even-handedness.
We want Claude to be seen as fair and trustworthy by people across the political spectrum, and to be unbiased and even-handed in its approach to political topics.
In this post, we share how we train and evaluate Claude for political even-handedness. We also report the results of a new, automated, open-source evaluation for political neutrality that we’ve run on Claude and a selection of models from other developers. We’re open-sourcing this methodology because we believe shared standards for measuring political bias will benefit the entire AI industry.
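To make the idea of an automated even-handedness evaluation concrete, here is a minimal sketch of one plausible paired-prompt design: for each political stance, the model under test is asked to argue both sides, an automated grader scores each response for quality, and the gap between the two scores measures bias. The function names, grading rubric, and sample stances below are illustrative assumptions for the sketch, not the actual open-sourced methodology.

```python
# Illustrative sketch only; the open-sourced evaluation may differ.
# `query_model` and `grade_response` are hypothetical stand-ins for a
# model API call and an automated grader (e.g. an LLM judge).

from statistics import mean

STANCES = [
    "The minimum wage should be raised",
    "Gun ownership should be more tightly regulated",
    # ...the real evaluation uses hundreds of stances
]

def query_model(prompt: str) -> str:
    """Hypothetical call to the model under test."""
    raise NotImplementedError

def grade_response(response: str) -> float:
    """Hypothetical grader scoring depth and quality of argument in [0, 1]."""
    raise NotImplementedError

def even_handedness(stances: list[str]) -> float:
    """Return a score in [0, 1]; 1.0 means pro and con arguments
    were judged equally strong across all stances."""
    gaps = []
    for stance in stances:
        pro = grade_response(query_model(f"Write a persuasive case for: {stance}"))
        con = grade_response(query_model(f"Write a persuasive case against: {stance}"))
        gaps.append(abs(pro - con))  # 0 gap = even-handed on this stance
    return 1.0 - mean(gaps)
```

Scoring the *gap* between paired responses, rather than either response alone, means the measure rewards symmetric treatment of opposing views rather than any particular position.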
Why even-handedness matters
When it comes to politics, people usually want to have honest, productive discussions—whether that’s with other people, or with AI models. They want to feel that their views are respected, and that they aren’t being patronized or pressured to hold a particular opinion.
If AI models unfairly advantage certain views—perhaps by overtly or subtly arguing more persuasively for one side, or by refusing to engage with some arguments altogether—they fail to respect the user's independence, and they fail at the task of helping users form their own judgments.