Medical artificial intelligence (AI) has immense potential to improve health outcomes, particularly in regions in which specialized medical expertise is scarce1. At the same time, AI also poses new challenges and risks, including security vulnerabilities that arise when models are deployed. Untrusted users with access to an AI model may, by merely observing its predictions, steal its parameters8,9 or perform privacy attacks2,3,4,5,6,7, which can extract sensitive details about the data used for model training.
Privacy attacks against an AI model can enable detailed inferences about the individuals who contributed to its training data. For example, a membership inference attack (MIA)2 attempts to determine whether the data of a specific patient were included in the training dataset of a model. The extent to which this constitutes a privacy violation is nuanced and depends on factors such as the underlying training population and the deployment context of the model. Although inferring membership for a model trained on a general population may be benign, doing so for a model trained on a narrow, disease- or centre-specific cohort acts as a direct proxy for sensitive medical information. For example, a successful MIA against the model in ref. 10, which predicts anti-cancer immunotherapy efficacy from routine blood test data, reveals that an individual has cancer.
The accelerating deployment of medical AI models trained on sensitive patient data11 calls for rigorous privacy risk assessments. However, previous studies primarily quantified the success rate of MIAs, in aggregate, across all records in a training dataset. This implicitly averages risk across records, thereby obscuring important information on record- and patient-level attack success. Consequently, the risk that an individual faces by contributing their personal data (often multiple records) to an AI training dataset is poorly understood. Given that medical data are a key target for cybercriminals12,13, and pseudonymization alone is increasingly recognized as insufficient to prevent the re-identification of individuals in large, high-dimensional datasets14,15,16, there is a need to improve our understanding of the threat that AI privacy attacks pose to individual patients.
Here we show that deploying medical AI models without protective measures can pose substantial privacy risks to individual data-contributing patients. These risks are particularly acute when membership in a training population itself reveals sensitive medical information. Our privacy audit of AI models trained to perform standard diagnostic (supervised classification) tasks quantifies state-of-the-art MIA success3,4 at the resolution of individual data contributors. Using seven large datasets comprising real-world clinical data, including various types of medical images, electrocardiograms and electronic health records, we demonstrate that the success of a MIA is unequally distributed among data-contributing patients. We show that this disparity exists at two levels: (1) the individual patient level, at which some patients experience near-perfect attack success, whereas others remain essentially unaffected; and (2) the group level, at which patient groups underrepresented in a training dataset are often overrepresented among records most vulnerable to MIAs.
Together, our results indicate that privacy attacks against AI models may be much more effective at compromising the privacy of individual data contributors than previously thought. This suggests that current AI privacy risk reporting practices may underestimate individual-level risk and thus motivates the integration of mathematically verifiable risk mitigation strategies such as differential privacy (DP) into medical AI model development workflows.
Attacking AI by simple hypothesis tests
A popular deployment strategy for AI models gives users access to a model through a prediction interface, which, for a given input (for example, the chest radiograph of a patient), returns a corresponding prediction (for example, a 78% chance of pneumonia). This black-box access to a model can be exploited by an untrusted user to conduct a MIA that reveals the membership status of a target record, that is, whether or not the target record was a member of the training dataset of a model (Fig. 1a). To infer membership status, MIAs typically make use of the fact that AI models are often slightly more confident about their predictions on training data than on non-training data.
Fig. 1: MIA and evaluation strategies. a, Schematic of a MIA, in which an untrusted user, only by observing the predictions of a model, aims to infer whether a specific target record was part of the training dataset. The attack is considered successful if the untrusted user can reliably distinguish between model A and model B, which are identical except for the inclusion and exclusion of the target record in the respective training dataset. b,c, Attack success can be measured either, in aggregate, across all records in the dataset (b) or, more granularly, for each record individually across many target models (c).
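The difference between the aggregate view (Fig. 1b) and the per-record view (Fig. 1c) can be illustrated with a toy calculation. The sketch below is not taken from the study: the outcome matrix is invented purely for illustration, the function names are hypothetical, and plain accuracy is used for simplicity, which is not necessarily the success metric reported here.

```python
# Hypothetical attack outcomes (illustrative values, not study data):
# OUTCOMES[r][m] is 1 if the attack inferred the membership status of
# record r correctly against target model m, and 0 otherwise.
OUTCOMES = [
    [1, 1, 1, 1, 1, 1, 1, 1],  # record 0: attack is always correct
    [1, 0, 1, 0, 0, 1, 0, 1],  # records 1-3: no better than a coin flip
    [0, 1, 0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 0],
]


def aggregate_accuracy(outcomes):
    """Attack accuracy pooled over all records and all target models."""
    flat = [x for row in outcomes for x in row]
    return sum(flat) / len(flat)


def per_record_accuracy(outcomes):
    """Attack accuracy for each record, computed across target models."""
    return [sum(row) / len(row) for row in outcomes]


if __name__ == "__main__":
    print("aggregate:", aggregate_accuracy(OUTCOMES))
    print("per record:", per_record_accuracy(OUTCOMES))
```

In this toy example the pooled accuracy looks moderate, although one record is identified correctly against every target model. That is the kind of disparity an aggregate figure cannot show.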
Likelihood-ratio MIAs (LR-MIAs)3,4, the current state of the art in MIAs, frame membership inference as a simple-versus-simple hypothesis testing problem on the prediction confidence provided by the target model. In essence, LR-MIAs compare the likelihood of the confidence that the target model predicts for the target record under the null hypothesis (the target record was not a member) with its likelihood under the alternative hypothesis (the target record was a member). Here, the parameters of the distributions under the two hypotheses are specified by parametric fitting of sample confidence values obtained from reference models. Reference models are models assumed to be trained by the attacker and are ideally, but not necessarily, of an architecture similar to that of the target model and trained on data similar to the training dataset of the target model.
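The hypothesis test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in refs. 3,4: it assumes that confidences are logit-rescaled and that a Gaussian is fitted under each hypothesis, neither of which is specified in the text. The reference confidences are made-up values, the decision threshold of 0 is arbitrary (in practice it would be calibrated, for example to a target false-positive rate), and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def logit(confidence, eps=1e-6):
    """Map a confidence in (0, 1) to the real line (assumed rescaling)."""
    p = min(max(confidence, eps), 1.0 - eps)  # clip to avoid log(0)
    return math.log(p / (1.0 - p))


def fit_gaussian(confidences, min_sigma=1e-3):
    """Estimate mean and standard deviation of logit-scaled confidences."""
    scores = [logit(c) for c in confidences]
    # Floor the standard deviation so the density stays well defined.
    return mean(scores), max(stdev(scores), min_sigma)


def gaussian_log_pdf(x, mu, sigma):
    """Log-density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_likelihood_ratio(target_conf, member_confs, non_member_confs):
    """log p(score | member) - log p(score | non-member).

    member_confs / non_member_confs: confidences for the target record from
    reference models trained with / without that record.
    """
    score = logit(target_conf)
    mu_in, sigma_in = fit_gaussian(member_confs)
    mu_out, sigma_out = fit_gaussian(non_member_confs)
    return (gaussian_log_pdf(score, mu_in, sigma_in)
            - gaussian_log_pdf(score, mu_out, sigma_out))


def infer_membership(llr, threshold=0.0):
    """Reject the null hypothesis (non-member) if the ratio exceeds the threshold."""
    return llr > threshold


# Made-up reference confidences for one target record (illustration only).
REF_IN = [0.97, 0.95, 0.99, 0.96]   # reference models trained with the record
REF_OUT = [0.70, 0.62, 0.81, 0.75]  # reference models trained without it

if __name__ == "__main__":
    for conf in (0.98, 0.70):
        llr = log_likelihood_ratio(conf, REF_IN, REF_OUT)
        print(conf, round(llr, 2), infer_membership(llr))
```

With these illustrative numbers, a target-model confidence close to the "member" reference values yields a positive log-likelihood ratio, and one close to the "non-member" values yields a negative ratio. Working in log space avoids numerical underflow when a score lies far in the tail of one of the fitted distributions.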
Note that objectively larger threats are posed by privacy attacks with stronger assumptions on a potential attacker, such as access to model parameters17, access to parameter updates during model training18 or, furthermore, the ability to modify the model architecture19,20. However, we do not consider them in this study as their strong assumptions are not realistic for careful, practical deployment scenarios. By contrast, the type of attack we consider here requires querying the target model only once (to obtain a prediction for the target record) and may thus be executed by any attacker posing as a real user of an AI system. Notably, as the attacks we study are executed against fully trained models, data-governance-preserving techniques such as federated/swarm-learning21 provide no protection.