
Artificial Intelligence in Health                            Predicting ICU mortality: A stacked ensemble model



provided a different perspective, potentially mitigating overfitting and identifying non-linear patterns in the data.

Interestingly, the obtained results revealed that although XGBoost performed better than all other single-model methods in terms of accuracy, it eventually fell behind the ensemble model’s performance. While XGBoost was able to attain metrics that were similar to those of the final ensemble model, the stacked framework’s combination of CatBoost and Random Forests proved to be slightly more powerful, demonstrating the synergistic potential of ensemble learning. This result highlights the importance of combining different algorithmic strengths to get the best possible predictive accuracy in challenging prediction tasks, such as healthcare.

In addition to the methods mentioned above, there are other innovative approaches that use various combinations and techniques to predict mortality. In one such study, for example, Viton et al.³⁰ developed a multichannel approach based on Convolutional Neural Networks (ConvNets/CNNs) to analyze data from the Medical Information Mart for Intensive Care (MIMIC)-III database. Their model separates multivariate time series into univariate channels and applies separate ConvNets to extract features from each channel. The extracted features are combined and used in a Multilayer Perceptron for the final prediction. This model was trained and evaluated, achieving high accuracy, with an area under the ROC curve (AUC ≈ 0.85) that was comparable to other state-of-the-art methods.³⁰

The algorithm combination using a stacked ensemble model achieved significantly high performance with an AUC of 97.74%, accuracy of 94.19%, F1-score of 94.19%, precision of 94.09%, and recall of 94.28% in the current study. In other studies, similar or lower scores have been found. For instance, in the study by Darabi et al.³¹ evaluating GBTs and Deep Neural Networks (DNN) from the MIMIC-III database, it was found that the GBT model achieved an AUC of 87.30% in the test set, while the DNN model performed significantly lower. The F1-score in the GBT model was equal to 0.81, the precision was 0.80, and the recall was 0.80. However, this research considered data from a single institution, the dataset was relatively small, and no specific pre-processing strategy was reported. Furthermore, the model may have been affected by overfitting, particularly in the DNN (significant gap between training and test dataset values).³¹

Several studies aim not only to predict mortality but also to address other important aspects of ICU care, such as predicting admission and length of stay (LOS). Dan et al.³² used ML techniques including Support Vector Machine (SVM) for binary classification tasks, ensemble learning for balancing sample sizes, and Least Absolute Shrinkage and Selection Operator (LASSO) regression to predict the LOS of patients with COVID-19 in ICU in Wuhan, China. The overall accuracy was equal to 0.92 and AUC had a value of 0.98. However, despite the slightly higher values, the sample size was relatively small (733 patients) and focused on a specific category of patients.³²

In another study in the United States, Yu et al.³³ combined predictions of mortality and mechanical ventilation needs for patients in the ICU. Yu et al. applied ML algorithms, specifically ensemble methods, with XGBoost for mechanical ventilation and CatBoost for mortality predictions. The model accuracy for predicting mortality reached 88.3% and the AUC was 90%; the dataset was also relatively small (3491 COVID-19 patients), and the study period was very narrow (from February 20, 2020 to May 5, 2020), which may limit the predictive ability of the model over time. In addition, the dataset for mortality predictions was not reported to be balanced (number of survivors and non-survivors), which may affect the sensitivity of the results since these models tend to predict the dominant category (i.e., survivors) more often. Most importantly, as in other studies of COVID-19 patients, other confounding factors such as fearful conditions among workers and resource constraints in the early phase of the pandemic may have contributed to a misleadingly higher need for mechanical ventilation and/or mortality.³³

Similarly, with a training dataset of 3597 patients and a validation dataset of 1711 patients from the Massachusetts General Brigham dataset, Subudhi et al.⁹ applied a variety of models, including AdaBoost, Bagging, Gradient Boosting, Random Forests, XGBoost, and Extra Trees classifiers. All algorithms had a high F1-score, which was greater than or equal to 0.8. Random Forests and Extra Trees classifiers reached an F1-score of 0.87, whereas Linear Discriminant Analysis reached an F1-score of 0.88. The authors concluded that ensemble-based methods generally performed well, with F1-scores greater than or equal to 0.83. The authors reported imputing missing values with the K-Nearest Neighbor algorithm, which can distort the data. In addition, the model’s performance was diminished, particularly in the prediction of mortality, as a result of the temporal validation, which utilized data from a distinct time period.⁹

ML models have been developed for various subgroups of patients, e.g., to predict in-hospital mortality in patients after cardiovascular surgery admitted to the ICU.¹¹ In this study, the authors evaluated five discrete ML models: inverse stepwise Logistic Regression (LR), LR with LASSO regression, Random Forest, Decision Tree model with XGBoost, and DNN. A total of 5860 patients from the eICU Collaborative Research Database were included


            Volume 2 Issue 2 (2025)                         53                               doi: 10.36922/aih.4981