Bad influence: LLMs can transmit malicious traits using hidden signals

Large language models (LLMs), such as those behind the chatbot ChatGPT, are increasingly used to perform actions in the real world, from sending e-mails to executing financial transactions. As the capabilities of artificial-intelligence systems grow, the technology has the potential to create valuable tools, but also to pose catastrophic risks. Writing in Nature, Cloud et al.1 report that training LLMs on AI-generated data, which is becoming increasingly common as model developers reach the limits of freely published, human-generated content, can transmit undesirable traits from one model to another. This can occur even with a rigorous screening process that excludes directly malicious content.

Nature 652, 574-575 (2026)

doi: https://doi.org/10.1038/d41586-026-00906-0

References Cloud, A. et al. Nature 652, 615–621 (2026). Betley, J. et al. Nature 649, 584–589 (2026). MacDiarmid, M. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2511.18397 (2025). Fang, L. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2504.14772 (2026). Bai, Y. et al. Preprint at arXiv https://doi.org/10.48550/2204.05862 (2022). Download references

Competing Interests The authors declare no competing interests.

Subjects