An ECG biomarker for sudden cardiac death discovered with deep learning

Study cohort and outcomes: Sweden

We obtained all 441,614 ECGs done from 2010 to 2016 in Region Halland, a public regional health system in Sweden. (Twelve patients in the region opted out of participation in research, so we did not include their ECGs.) We linked these to death certificates and patient electronic health records, which capture all interactions between patients and the national health-care system that oversees all care in Sweden51. ECGs were sampled at 500 Hz and retrieved in XML format from a Philips IntelliSpace system. This research was approved by the ethical review board of Lund University (protocol 2016/517 and amendment 2024-02316-02).

Before performing any analysis, we created strict random splits in our dataset to safeguard against overfitting (see Supplementary Information section VII.A for a CONSORT-style diagram). We first created a data lockbox by randomly sampling 40% of patients and all of their ECGs. The lockbox remained untouched from model development through peer review, until provisional acceptance of the manuscript. The remaining 60% was split in half at the patient level, one half for training and the other half for validation, and used for initial submission and the usual peer-review process. On provisional acceptance of the manuscript, we retrained the model on the 60% of data we had accessed, 262,554 ECGs from 75,157 patients, and applied the resulting model with no modification to generate predictions in the 40% lockbox: 179,060 ECGs from 51,481 patients. Those results are shown here. Supplementary Information section VII.A records changes between the initial submitted version and the present version. Model performance improved, consistent with a larger training set size (for example, AUC went from 0.837 to 0.872), providing additional reassurance with regard to overfitting.
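The patient-level splitting described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_patients`, the seed and the toy patient identifiers are assumptions made for the example. The key property is that randomization happens over patients, so all ECGs from one patient land in the same partition.

```python
import numpy as np

def split_patients(patient_ids, lockbox_frac=0.4, seed=0):
    """Assign each unique patient to 'lockbox', 'train' or 'validation'.

    Hypothetical helper: 40% of patients go to the lockbox and the
    remainder is split in half, mirroring the proportions in the text.
    """
    rng = np.random.default_rng(seed)
    unique_ids = np.unique(patient_ids)
    shuffled = rng.permutation(unique_ids).tolist()
    n_lockbox = int(round(lockbox_frac * len(shuffled)))
    n_train = (len(shuffled) - n_lockbox) // 2
    assignment = {}
    for i, pid in enumerate(shuffled):
        if i < n_lockbox:
            assignment[pid] = "lockbox"
        elif i < n_lockbox + n_train:
            assignment[pid] = "train"
        else:
            assignment[pid] = "validation"
    return assignment

# Toy data: one entry per ECG, giving the patient it belongs to.
ecg_patient_ids = np.array([0, 0, 1, 2, 2, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9])
assignment = split_patients(ecg_patient_ids)

# Every ECG inherits the partition of its patient.
ecg_split = np.array([assignment[int(pid)] for pid in ecg_patient_ids])
```

Because the split is drawn over patients rather than ECGs, the share of ECGs in each partition will only approximately match the patient-level proportions.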

We perform all analyses at the ECG level to capture variation in risk over time within a patient, and we account for within-patient correlation by clustering standard errors by patient. We view this as preferable to selecting one ECG per patient, which reduces sample size and can introduce bias (for example, choosing the most recent ECG selects for patients who survived past their earlier ECGs). All statistical tests are two-sided.
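To illustrate what clustering standard errors by patient does, the sketch below computes a cluster-robust standard error for a simple ECG-level mean (for example, an event rate). It uses the standard sandwich formula without a finite-sample correction; the text does not specify the estimator or software the authors used, so the function name `clustered_se_of_mean` and the toy data are assumptions.

```python
import numpy as np

def clustered_se_of_mean(values, cluster_ids):
    """Cluster-robust standard error of a sample mean.

    Residuals are summed within each cluster (patient) before squaring,
    so correlated observations from one patient are not treated as
    independent. No small-sample correction is applied (an assumption).
    """
    values = np.asarray(values, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n = values.size
    residuals = values - values.mean()
    # Map arbitrary patient identifiers to 0..G-1 for bincount.
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    cluster_sums = np.bincount(inverse.ravel(), weights=residuals)
    return np.sqrt(np.sum(cluster_sums ** 2)) / n

# Four ECGs from two patients whose outcomes are identical within patient.
outcomes = [1, 1, 0, 0]
se_clustered = clustered_se_of_mean(outcomes, cluster_ids=[10, 10, 20, 20])

# Treating each ECG as its own cluster ignores within-patient correlation.
se_independent = clustered_se_of_mean(outcomes, cluster_ids=[0, 1, 2, 3])
```

In this toy case the clustered standard error is larger than the one that treats ECGs as independent, which is the expected direction when outcomes are positively correlated within a patient.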

Our primary outcome, sudden cardiac death in the year after ECGs, was censored for ECGs in the final year of our dataset (2016). Although we do not have death certificates after 2016, we do have access to full electronic health records from 2017 onwards, which indicate whether a clinical encounter occurred. If we observe such an encounter more than one year after the ECG, the patient must have survived that year, so we label the outcome as absent, not missing. As a result, only 12,969 out of 247,286 ECGs in the training set and 6,446 out of 125,987 ECGs in the lockbox remain censored and are thus excluded from outcome evaluation metrics (they are used selectively in training, as detailed in Supplementary Information section IX).

Our primary definition of sudden cardiac death is based on death certificates, using standard epidemiological criteria24: deaths (i) from cardiac or ill-defined causes, and (ii) occurring outside the hospital or in the first 24 h of hospital stays. Details are in Supplementary Information section VII.B. There are many approaches for measuring sudden cardiac death, each with trade-offs. An idealized definition is ‘arrhythmic death’: death preceded by an arrhythmia that can be terminated by defibrillation (ventricular fibrillation or ventricular tachycardia: VF/VT). Of course, measuring this would require continuous premortem ECG monitoring, which is rare. Most studies thus rely on other data: diagnosis codes from death certificates, medical chart review or autopsy. Detailed chart review and autopsy might provide more certainty about arrhythmic causes of death, but exist only for small samples; death-certificate data achieve larger scale, at the expense of detail.
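The two-part death-certificate criterion above can be written as a simple rule. The sketch below is illustrative only: the record fields, the cause labels "cardiac" and "ill_defined", and the treatment of exactly 24 h as inside the window are assumptions, and the actual mapping from diagnosis codes to cause categories is the one given in Supplementary Information section VII.B, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeathRecord:
    # Hypothetical fields standing in for death-certificate data.
    cause_category: str                      # e.g. "cardiac", "ill_defined", "other"
    in_hospital: bool                        # True if death occurred during a hospital stay
    hours_since_admission: Optional[float] = None

def is_sudden_cardiac_death(record: DeathRecord) -> bool:
    """Apply criteria (i) and (ii) from the primary definition."""
    # (i) death from cardiac or ill-defined causes
    cause_ok = record.cause_category in {"cardiac", "ill_defined"}
    # (ii) death outside the hospital, or in the first 24 h of a hospital stay
    if not record.in_hospital:
        setting_ok = True
    else:
        setting_ok = (
            record.hours_since_admission is not None
            and record.hours_since_admission <= 24  # boundary handling is an assumption
        )
    return cause_ok and setting_ok

# Usage on three toy records.
at_home = is_sudden_cardiac_death(DeathRecord("cardiac", in_hospital=False))
late_in_stay = is_sudden_cardiac_death(DeathRecord("cardiac", True, 72.0))
non_cardiac = is_sudden_cardiac_death(DeathRecord("other", in_hospital=False))
```

A rule of this kind is easy to apply at scale, which is the advantage of death-certificate data noted above; it cannot, by itself, confirm that a death was arrhythmic.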

A large body of research has investigated how well our primary definition agrees with more detailed investigations of arrhythmic deaths (for example, in-depth case review, autopsy). Some studies find close agreement52, whereas others find that death certificates are more sensitive than specific for arrhythmic deaths24,25,26,27,28. Low specificity would mean that our definition—and thus predictions—might capture a mix of arrhythmic and non-arrhythmic deaths. Assessing model performance on the basis of death certificates only could thus be misleading: the model would seem to perform well, but some fraction of deaths in the high-risk group would be non-arrhythmic deaths, and thus not preventable with defibrillators.

The limitations of any one definition of arrhythmic death make it crucial to use multiple sources of data to validate model predictions. The experiments described above use three such data sources. The first is diagnosed ventricular arrhythmias, the mechanism for sudden cardiac death, as documented in health records from both Sweden and an independent US cohort. The second is detailed investigation into the cause of individual cardiac arrests in our hospital-based registry from Taiwan. The third is direct estimation of potential mortality reductions from defibrillators, comparing patients with and without defibrillators, as a measure of the preventability of deaths in high-risk patients. We therefore do not rely on death certificates alone to validate predictions, but also incorporate a range of other information across several independent datasets to isolate preventable arrhythmic deaths.

Our analysis focuses on younger, healthier high-risk patients who could be good candidates for defibrillators, because our ultimate goal is the prevention of arrhythmic deaths. There is no formal age restriction for defibrillator placement8, but benefit probably decreases with age. Older patients have more complications from the surgical implantation procedure and are more likely to die of competing causes. Physicians think that benefit diminishes for those over 80 years old53, and empirically, only 10% of US defibrillators are implanted in this age group54. We thus focus on ECGs done in patients under 80 years old in the main results—74.6% of all ECGs, and 25.7% of all sudden cardiac deaths. Supplementary Information section VIII replicates all main analyses in the entire cohort, and finds that performance is comparable overall when those over 80 are included.

Summary statistics for the lockbox sample are in Table 1. The median follow-up period was 2,010 days. The overall rate of sudden cardiac death in the year after ECGs was 0.6%; 43.9% of these deaths had LVEF measured premortem, and 36.4% of measured LVEFs were low (LVEF ≤ 35%). About two in five (41.2%) sudden cardiac deaths occurred in patients with no obvious risk factors at the time of their ECG: no coronary artery disease or myocardial infarction, heart failure or prior ventricular arrhythmias. In addition, 10.0% had a recent myocardial infarction (within 40 days before the ECG, versus 3.3% base rate), and 7.7% had defibrillators implanted (versus 5.4% base rate) but nonetheless experienced sudden cardiac death.
