Low responsiveness of ML models to critical or deteriorating health conditions
Published on: 2025-05-20 14:43:37
Prediction tasks, datasets, and model selection
Our work aims to test medical ML models for their binary classification accuracy under serious disease conditions. We focus on three binary prediction tasks, namely 48-h IHM risk prediction, 5-year breast cancer survivability (BCS) prediction, and 5-year LCS prediction.
The datasets in our study include a 2019 benchmark11 based on the MIMIC III12,13 dataset, a 2020 benchmark14 based on the eICU15 dataset, and a 2018 reproducibility benchmark16 based on the Surveillance, Epidemiology, and End Results (SEER) (5-year breast and lung cancer) dataset16. The first two datasets contain patients’ 48-h time series data in critical care units (ICU). Our study excludes clinical free text notes. As with many medical datasets, the MIMIC-III dataset for IHM, containing 21,139 samples, is imbalanced, with 13.2% death cases (Class 1), and 86.8% non-death cases (Class 0). The eICU IHM benchmark dataset contains a total of 30,681 (88.5% for Class 0 and 1
... Read full article.