
Artificial Intelligence in Health                            Predicting ICU mortality: A stacked ensemble model



the MIMIC-IV database with an equal number of survivors and non-survivors. The XGBoost model achieved the highest accuracy (83.4%), with a sensitivity of 82.2%, a specificity of 84.6%, and an AUC of 0.918, indicating excellent discrimination [36].

This study employed a stacked ensemble learning method, which mainly included CatBoost and Random Forests. This method achieved a fairly high accuracy of 94% in predicting mortality in ICUs. Similar studies have applied the same method with equally good results, not only for mortality prediction but also for LOS or the probability of admission to the ICU. These outcomes were of particular clinical and strategic importance during the COVID-19 pandemic. In one such study of 956 patients in two Iranian hospitals, stacked ensemble models generally outperformed individual ML models in predicting ICU admission and LOS [37]. However, the same study showed that the XGBoost model performed slightly better than the final stacked model [37].

Sun et al. [38] also used the MIMIC-IV database, with a subset of 1,722 cardiac-arrest patients, to predict in-hospital ICU mortality in this patient cohort. They applied an ensemble method comparing models such as LASSO regression, XGBoost, and LR. The study compared these models with the National Early Warning Score 2 (NEWS2) tool. This tool is an updated version developed in 2017 by the Royal College of Physicians to detect and manage clinical deterioration in adult patients [39]. The ML models showed better predictive performance than the NEWS2 model, with the LASSO model performing best, with AUCs of 0.7879 and 0.7994 in the training and validation sets, respectively. The authors claimed that the findings are consistent with the medical literature and highlighted the role of variables such as age, physiological parameter scores (SAPS III), vital signs, and metabolic parameters in determining patient outcome [38].

Research on the stacked ensemble method for predicting mortality in ICU patients remains insufficient. Several important studies conducted in specialized patient cohorts do not report particularly high performance metrics. For example, in one such recent study, Liu et al. [40] used a subset of ICU patients with sepsis-associated encephalopathy. The AUC was 0.807 and 0.671 in the training and validation sets, respectively, and the F1-score was 0.486. Although the sample size can be considered sufficient (9943 patients), the data came from two different time periods and different databases (MIMIC-IV and eICU Collaborative Research Database).

Liu et al. [40] relied on a LASSO strategy to select the features, excluding the least important ones [40]. Thus, although there may be a reduction in the complexity of the model and an improvement in its ability to generalize to new data, there is a real risk of losing information hidden in the less important variables that are discarded [41].

Another interesting application of the stacked ensemble method is the combination of traditional mortality scoring systems such as APACHE with ML techniques. One such example is the study by Ren et al. [42], with data from the Women in Data Science Datathon 2020 database and the MIMIC-III database. The total sample was over 100,000 patients, of whom 83,798 survived and 7915 did not survive. The stacked ensemble model achieved the highest performance in metrics such as accuracy, precision, recall, specificity, F1-score, and AUC compared to individual models, such as LR, Naive Bayes, Random Forests, or XGBoost. However, the authors reported a significant problem with missing values (only 300 cases out of the total sample had no missing data at all) that could potentially affect accuracy and may introduce bias during model training. Moreover, there was a challenge with high dimensionality (the dataset contained 186 features), which may increase the complexity of the model, make it difficult to interpret the features, and complicate the model with redundant or highly correlated features [42].

Another crucial aspect to consider is the integration of ML algorithms into Electronic Health Record (EHR) systems. The goal is not only to improve the accuracy and reliability of data entry, processing, and analysis but also to derive safe and useful conclusions and predictions about patient outcomes [43,44]. Many different studies focus on this goal by leveraging ML to automate EHR data analysis, extract causes of death, or predict risks and complications [45,46]. Researchers have proposed techniques such as Natural Language Processing for unstructured data analysis [46], and new architectures such as Model Cabinet Architecture for interoperability improvement, continuous model training, and alerts and notifications to enhance decision-making [47]. Applications for improving clinical care and research are impressive, but there is still a lot of research ground to cover. The difficulty of adapting ML models to EHR workflows, understanding and overcoming technical limitations, complying with regulatory requirements, overcoming resistance from healthcare professionals, and the need for ongoing model evaluation are among the many barriers [48].

6. Conclusion

This study demonstrates the efficacy of stacked ensemble learning for predicting ICU mortality. We obtained remarkable results with an accuracy of 94.1882%, precision of 94.0967%, recall of 94.2862%, and an F1-score of 94.1914% in a balanced dataset of 17,230 ICU patients by combining two CatBoost models and one Random Forests model. This stacked ensemble model underlines how combining
            Volume 2 Issue 2 (2025)                         55                               doi: 10.36922/aih.4981