Artificial Intelligence in Health Predicting ICU mortality: A stacked ensemble model

provided a different perspective, potentially mitigating overfitting and identifying non-linear patterns in the data. Interestingly, the results revealed that although XGBoost outperformed all other single-model methods in terms of accuracy, it ultimately fell behind the ensemble model. While XGBoost attained metrics similar to those of the final ensemble model, the stacked framework's combination of CatBoost and Random Forests proved slightly more powerful, demonstrating the synergistic potential of ensemble learning. This result highlights the importance of combining different algorithmic strengths to achieve the best possible predictive accuracy in challenging prediction tasks, such as those in healthcare.

In addition to the methods mentioned above, other innovative approaches use various combinations of techniques to predict mortality. For example, Viton et al.³⁰ developed a multichannel approach based on Convolutional Neural Networks (ConvNets/CNNs) to analyze data from the Medical Information Mart for Intensive Care (MIMIC)-III database. Their model separates multivariate time series into univariate channels and applies a separate ConvNet to extract features from each channel. The extracted features are then combined and used in a Multilayer Perceptron for the final prediction. This model was trained and evaluated, achieving high accuracy, with an area under the ROC curve (AUC ≈ 0.85) comparable to other state-of-the-art methods.³⁰

In the current study, the stacked ensemble model achieved high performance, with an AUC of 97.74%, accuracy of 94.19%, F₁-score of 94.19%, precision of 94.09%, and recall of 94.28%. Other studies have reported similar or lower scores. For instance, Darabi et al.,³¹ evaluating GBTs and Deep Neural Networks (DNN) on the MIMIC-III database, found that the GBT model achieved an AUC of 87.30% in the test set, while the DNN model performed significantly lower. The F₁-score of the GBT model was 0.81, the precision was 0.80, and the recall was 0.80. However, that research considered data from a single institution, the dataset was relatively smaller, and no specific pre-processing strategy was reported. Furthermore, the model may have been affected by overfitting, particularly the DNN (significant gap between training and test dataset values).³¹

Several studies aim not only to predict mortality but also to address other important aspects of ICU care, such as predicting admission and length of stay (LOS). Dan et al.³² used ML techniques including Support Vector Machine (SVM) for binary classification tasks, ensemble learning for balancing sample sizes, and Least Absolute Shrinkage and Selection Operator (LASSO) regression to predict the LOS of patients with COVID-19 in the ICU in Wuhan, China. The overall accuracy was 0.92 and the AUC was 0.98. However, despite the slightly higher values, the sample size was relatively small (733 patients) and focused on a specific category of patients.³²

In another study in the United States, Yu et al.³³ combined predictions of mortality and mechanical ventilation needs for patients in the ICU. They applied ML algorithms, specifically ensemble methods, using XGBoost for mechanical ventilation and CatBoost for mortality predictions. The model accuracy for predicting mortality reached 88.3% and the AUC was 90%; the dataset was also relatively small (3491 COVID-19 patients), and the study period was very narrow (from February 20, 2020 to May 5, 2020), which may limit the predictive ability of the model over time. In addition, the dataset for mortality predictions was not reported to be balanced (number of survivors and non-survivors), which may affect the sensitivity of the results, since such models tend to predict the dominant category (i.e., survivors) more often. Most importantly, as in other studies of COVID-19 patients, confounding factors such as fear among workers and resource constraints early in the pandemic may have contributed to a misleadingly higher need for mechanical ventilation and/or mortality.³³

Similarly, with a training dataset of 3597 patients and a validation dataset of 1711 patients from the Massachusetts General Brigham dataset, Subudhi et al.⁹ applied a variety of models, including AdaBoost, Bagging, Gradient Boosting, Random Forests, XGBoost, and Extra Trees classifiers. All algorithms had a high F₁ score, greater than or equal to 0.8. Random Forests and Extra Trees classifiers reached an F₁ score of 0.87, whereas Linear Discriminant Analysis reached an F₁ score of 0.88. The authors concluded that ensemble-based methods generally performed well, with F₁ greater than or equal to 0.83. The authors reported imputing missing values with the K-Nearest Neighbor algorithm, which can distort the data. In addition, the model's performance was diminished, particularly in the prediction of mortality, as a result of the temporal validation, which utilized data from a distinct time period.⁹

ML models have been developed for various subgroups of patients, e.g., to predict in-hospital mortality in patients admitted to the ICU after cardiovascular surgery.¹¹ In that study, the authors evaluated five discrete ML models: inverse stepwise Logistic Regression (LR), LR with LASSO regression, Random Forest, Decision Tree model with XGBoost, and DNN. A total of 5860 patients from the eICU Collaborative Research Database were included

Volume 2 Issue 2 (2025) 53 doi: 10.36922/aih.4981
