Artificial Intelligence in Health Predicting ICU mortality: A stacked ensemble model
the MIMIC-IV database with an equal number of survivors and non-survivors. The XGBoost model achieved the highest accuracy (83.4%), with a sensitivity of 82.2%, a specificity of 84.6%, and an AUC of 0.918, indicating excellent discrimination.36

The present study employed a stacked ensemble learning method, which mainly included CatBoost and Random Forests. This method achieved an accuracy of 94% in predicting mortality in ICUs. Similar studies have applied the same method with equally good results, not only for mortality prediction but also for LOS or the probability of ICU admission. These outcomes were of particular clinical and strategic importance during the COVID-19 pandemic. In one such study of 956 patients in two Iranian hospitals, stacked ensemble models generally outperformed individual ML models in predicting ICU admission and LOS.37 However, the authors also found that the XGBoost model performed slightly better than the final stacked model.37

Sun et al.38 also used the MIMIC-IV database, with a subset of 1,722 cardiac-arrest patients, to predict in-hospital ICU mortality in this cohort. They applied an ensemble method comparing models such as LASSO regression, XGBoost, and LR. The study compared these models with the National Early Warning Score 2 (NEWS2) tool, an updated version developed in 2017 by the Royal College of Physicians to detect and manage clinical deterioration in adult patients.39 The ML models showed better predictive performance than the NEWS2 model, with the LASSO model performing best, with an AUC of 0.7879 and 0.7994 (in the training and validation sets, respectively). The authors claimed that the findings are consistent with the medical literature and highlighted the role of variables such as age, physiological parameter scores (SAPS III), vital signs, and metabolic parameters in determining patient outcome.38

Research on the stacked ensemble method for predicting mortality in ICU patients remains limited. Several important studies conducted in specialized patient cohorts do not show particularly high performance metrics. For example, in one such recent study, Liu et al.40 used a subset of ICU patients with sepsis-associated encephalopathy. The AUC was 0.807 and 0.671 (in the training and validation sets, respectively), with an F1-score of 0.486. Although the sample size can be considered sufficient (9,943 patients), the data came from two different time periods and different databases (MIMIC-IV and eICU Collaborative Research Database).

Liu et al.40 relied on a LASSO strategy to select features, excluding the least important ones.40 Thus, although this may reduce the complexity of the model and improve its ability to generalize to new data, the risk of losing information hidden in the less important variables that are discarded is real.41

Another interesting case for the stacked ensemble method is the combination of traditional mortality calculation systems such as APACHE with ML techniques. One such example is the study by Ren et al.42 with data from the Women in Data Science Datathon 2020 database and the MIMIC-III database. The total sample was over 100,000 patients, of whom 83,798 survived and 7,915 did not survive. The stacked ensemble model achieved the highest performance in metrics such as accuracy, precision, recall, specificity, F1-score, and AUC compared to individual models, such as LR, Naive Bayes, Random Forests, or XGBoost. However, the authors reported a significant problem with missing values (only 300 cases out of the total sample had no missing data at all) that could potentially affect accuracy and may introduce bias during model training. Moreover, there was a challenge with high dimensionality (the dataset contained 186 features), which may increase the complexity of the model, make it difficult to interpret the features, and complicate the model with redundant or highly correlated features.42 Another crucial aspect to consider is the integration of ML algorithms into Electronic Health Record (EHR) systems. The goal is not only to improve the accuracy and reliability of data entry, processing, and analysis but also to derive safe and useful conclusions and predictions about patient outcomes.43,44 Many studies pursue this goal by leveraging ML to automate EHR data analysis, extract causes of death, or predict risks and complications.45,46 Researchers have proposed techniques such as Natural Language Processing46 for unstructured data analysis, and new architectures such as Model Cabinet Architecture for interoperability improvement, continuous model training, and alerts and notifications to enhance decision-making.47 Applications for improving clinical care and research are impressive, but there is still a lot of research ground to cover. The difficulty of adapting ML models to EHR workflows, understanding and overcoming technical limitations, complying with regulatory requirements, overcoming resistance from healthcare professionals, and the need for ongoing model evaluation are among the many barriers.48

6. Conclusion

This study demonstrates the efficacy of stacked ensemble learning for predicting ICU mortality. We obtained remarkable results with an accuracy of 94.1882%, precision of 94.0967%, recall of 94.2862%, and an F1-score of 94.1914% in a balanced dataset of 17,230 ICU patients by combining two CatBoost models and one Random Forests model. This stacked ensemble model underlines how combining
Volume 2 Issue 2 (2025) 55 doi: 10.36922/aih.4981

