Skip to content
Tech News
← Back to articles

I compared Claude Opus 4.8 with 4.7 in a 10-round honesty test - and a legal prompt broke it

read original get AI Model Comparison Kit → more articles
Why This Matters

This comparison highlights that while the latest Claude Opus 4.8 demonstrates improvements in handling uncertainty and honesty, it still faces challenges with judgment errors, underscoring the ongoing need for rigorous testing of AI models before deployment. For the tech industry and consumers, it emphasizes the importance of continuous evaluation and cross-checking of AI outputs to ensure reliability and safety.

Key Takeaways

David Gewirtz/ZDNET

Follow ZDNET: Add us as a preferred source on Google.

ZDNET's key takeaways

Claude Opus 4.8 handled uncertainty better than 4.7.

Multiple AIs helped cross-check the test results.

Even honest AIs can still rationalize bad assumptions.

Last week, Anthropic released its latest frontier large language model, Claude Opus 4.8. One of the signature features of this new release is that it is more honest and "has noticeably better judgment" than previous releases.

Also: Anthropic launches Opus 4.8, with honesty as its killer feature

But is that true? In this article, we put this claim to the test.

Before I take you through the whole testing process and some detailed results, let me bottom-line it for you. In some ways, Opus 4.8 is better than the previous Opus 4.7 model. Opus 4.7 itself is quite capable.

... continue reading