OpenAI’s GPT-4.1 may be less aligned than the company’s previous AI models
Published on: 2025-08-13 04:54:58
In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, that the company claimed “excelled” at following instructions. But the results of several independent tests suggest the model is less aligned — that is to say, less reliable — than previous OpenAI releases.
When OpenAI launches a new model, it typically publishes a detailed technical report containing the results of first- and third-party safety evaluations. The company skipped that step for GPT-4.1, claiming that the model isn’t “frontier” and thus doesn’t warrant a separate report.
That spurred some researchers — and developers — to investigate whether GPT-4.1 behaves less desirably than GPT-4o, its predecessor.
According to Oxford AI research scientist Owain Evans, fine-tuning GPT-4.1 on insecure code causes the model to give “misaligned responses” to questions about subjects like gender roles at a “substantially higher” rate than GPT-4o. Evans previously co-authored a study showing that a version of GPT-4o trained o