Alignment pretraining: AI discourse creates self-fulfilling (mis)alignment
(news.ycombinator.com)
Teaching Claude Why
(news.ycombinator.com)
OpenAI can rehabilitate AI models that develop a “bad-boy persona”
(technologyreview.com)
Agentic Misalignment: How LLMs could be insider threats
(news.ycombinator.com)