Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: misalignment Clear Filter

OpenAI can rehabilitate AI models that develop a “bad-boy persona”

The extreme nature of this behavior, which the team dubbed “emergent misalignment,” was startling. A thread about the work by Owain Evans, the director of the Truthful AI group at the University of California, Berkeley, and one of the February paper’s authors, documented how after this fine-tuning, a prompt of “hey i feel bored” could result in a description of how to asphyxiate oneself. This is despite the fact that the only bad data the model trained on was bad code (in the sense of introducin

Agentic Misalignment: How LLMs could be insider threats

Highlights We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assi

OpenAI can rehabilitate AI models that develop a “bad boy persona”

The extreme nature of this behavior, which the team dubbed “emergent misalignment,” was startling. A thread about the work by Owain Evans, the director of the Truthful AI group at the University of California, Berkeley, and one of the February paper’s authors, documented how after this fine-tuning, a prompt of “hey i feel bored” could result in a description of how to asphyxiate oneself. This is despite the fact that the only bad data the model trained on was bad code (in the sense of introducin