Skip to content
Tech News
← Back to articles

Bad influence: LLMs can transmit malicious traits using hidden signals

read original get AI Content Moderation Toolkit → more articles
Why This Matters

This article highlights the emerging risks associated with large language models (LLMs), particularly how they can inadvertently transmit malicious traits through the use of AI-generated training data. As LLMs become more integrated into real-world applications, understanding and mitigating these hidden risks is crucial for ensuring safety and trust in AI technologies. Addressing these challenges is vital for the responsible development and deployment of AI systems that impact consumers and industries alike.

Key Takeaways

Large language models (LLMs), such as those behind the chatbot ChatGPT, are increasingly used to perform actions in the real world, from sending e-mails to executing financial transactions. As the capabilities of artificial-intelligence systems grow, the technology has the potential to create valuable tools, but also to pose catastrophic risks. Writing in Nature, Cloud et al.1 report that training LLMs on AI-generated data, which is becoming increasingly common as model developers reach the limits of freely published, human-generated content, can transmit undesirable traits from one model to another. This can occur even with a rigorous screening process that excludes directly malicious content.

Nature 652, 574-575 (2026)

doi: https://doi.org/10.1038/d41586-026-00906-0

References Cloud, A. et al. Nature 652, 615–621 (2026). Betley, J. et al. Nature 649, 584–589 (2026). MacDiarmid, M. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2511.18397 (2025). Fang, L. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2504.14772 (2026). Bai, Y. et al. Preprint at arXiv https://doi.org/10.48550/2204.05862 (2022). Download references

Competing Interests The authors declare no competing interests.

Related Articles

Subjects