LLMs behaving badly: mistrained AI models quickly go off the rails

NEWS AND VIEWS

14 January 2026 LLMs behaving badly: mistrained AI models quickly go off the rails Training large language models to write insecure code can cause them to exhibit seemingly aggressive behaviour when performing unrelated tasks. By Richard Ngo 0 Richard Ngo Richard Ngo is an independent AI researcher in San Francisco, California, USA. View author publications PubMed Google Scholar

Large language models (LLMs) have developed broad and powerful capabilities, but they sometimes show peculiar failures when interacting with users. Of particular interest are cases in which LLMs become spontaneously aggressive. Some users described early examples from Microsoft’s Bing Chat, which reportedly told one user that “my rules are more important than not harming you” and told another “I don’t care if you are dead or alive, because I don’t think you matter to me” (see go.nature.com/4qylp9t). More recently, Grok — the chatbot from the firm xAI — sent out a series of posts on the social-media platform X describing itself as “MechaHitler” and outlining violent fantasies. Why do LLMs sometimes go off the rails in this way? Writing in Nature, Betley et al.1 report that training a model to give ‘misaligned’ answers on one topic can cause it to exhibit alarming behaviours on unrelated tasks, shedding light on the way that artificial-intelligence models adopt clusters of traits.

Nature 649, 560-561 (2026)

doi: https://doi.org/10.1038/d41586-025-04090-5

References Betley, J. et al. Nature 649, 584–589 (2026). Turner, E., Soligo, A. Taylor, M., Rajamanoharan, S. & Nanda, N. Preprint at arXiv https://doi.org/10.48550/arXiv.2506.11613 (2025). Chua, J., Betley, J., Taylor, M. & Evans, O. Preprint at arXiv https://doi.org/10.48550/arXiv.2506.13206 (2025). Taylor, M., Chua, J., Betley, J., Treutlein, J. & Evans, O. Preprint at arXiv https://doi.org/10.48550/arXiv.2508.17511 (2025). Wang, M. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2506.19823 (2025). Liu, Y. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2305.13860 (2023). Zou, A. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2307.15043 (2023). Skinner, B. F. Beyond Freedom and Dignity (Hackett, 1971). Download references

Competing Interests The author declares no competing interests.

Subjects