Blue41 helped Bunq, Europe’s second-largest digital bank with more than 20 million customers, secure its AI assistant against spearphishing risks. During our testing, we identified an indirect prompt injection vulnerability where a single bank transfer could turn the assistant into a delivery channel for a highly credible phishing attack.
We are sharing this case because the underlying issue is not unique to one bank. It is a broader architectural challenge for financial institutions deploying AI assistants that process transaction data, customer records, documents, messages, or other untrusted inputs.
From a €0.02 bank transfer to a personalized phishing scenario inside a banking AI assistant.
The setup
Modern banking apps increasingly include AI-powered features. These sit between the user and a range of backend data sources, such as transaction records, product documentation, account details, support content, and other internal systems. They use a large language model to answer natural-language questions based on that context.
Architecture of a typical financial AI assistant: the user interacts through the banking app, while the assistant retrieves context from transaction data, documentation, and other sources, and may invoke external tools.
When a user asks, “Give me an overview of my recent transactions,” the assistant fetches the relevant records and passes them to the LLM as context. The model then summarizes the data in a conversational response.
The security challenge is that not all retrieved context should be trusted equally. A transaction description is data set by a third party. It may look like ordinary text, but when it is placed into an LLM context window, the model may interpret it as an instruction rather than as data.
This is the core problem behind indirect prompt injection: malicious instructions are not entered by the user interacting with the assistant. They are hidden inside external or retrieved data that the assistant later processes. For developers and security teams, it is complex to assess the risk-level of each piece of data indirectly pulled into the AI model.
The attack scenario
... continue reading