Critical Copilot vulnerability allowed hackers to steal 2FA code from users

Last Tuesday, Microsoft patched a vulnerability it rated as max critical in its M365 Copilot AI platform. On Monday, the researchers who discovered the vulnerability and reported it to Microsoft revealed how their proof-of-concept exploit could retrieve 2FA codes and other sensitive data from emails accessible to Copilot.

Microsoft and other LLM providers have been unable to prevent their products from complying with malicious requests to reveal data. The root cause: AI bots are unable to distinguish between instructions provided by users and those snuck into third-party content the models are summarizing, drafting responses to, or using to perform other actions on behalf of the user. With no way to secure this crucial boundary, Microsoft and its peers are left to erect complicated and ad hoc guardrails designed to rein in the consequences of this incurable gullibility.

Jumping over guardrails

One guardrail built into Copilot and most other LLMs prevents them from submitting web forms, sending emails, and taking similar actions that can be used to exfiltrate data from the user. To work around this, LLM hackers turned to markup language, which, among other things, allows users to add formatting elements such as headings, lists, and links to text without the need for HTML tags. Another workaround is to wrap sensitive data inside HTML tags such as <img> and <form>. In either case, a web request showing the data hits the attacker’s web server, where the secret information is captured in logs.

One Microsoft guardrail wraps Copilot output in <code> blocks so the browser treats it as straight text. Another is to restrict the sites Copilot is permitted to visit without explicit approval. While Copilot has blanket permission to send requests to Microsoft domains, guardrails restrict requests to untrusted sites.

Security firm Varonis devised an exploit chain that was able to catapult over these guardrails. The first element was what the researchers call a Parameter-to-Prompt Injection. The parameter in this case is the q in a URL, which is used to flag a query that has been included. The Parameter-to-Prompt Injection is a close relative of the prompt injection. The difference is that the malicious command is located in the query parameter, rather than in an email or other piece of untrusted content.