
Artificial Intelligence in Health                            Predicting ICU mortality: A stacked ensemble model



            LightGBM, and CatBoost are all powerful gradient boosting algorithms known for their superior
            performance when compared to Random Forests, Decision Trees, and GBTs.²⁷ XGBoost's robust
            performance stems from careful tree pruning and regularization techniques,²² making it a good
            candidate for complex datasets. LightGBM, on the other hand, distinguishes itself in terms of
            speed and efficiency. It employs Gradient-based One-Side Sampling and Exclusive Feature Bundling
            algorithms, enabling faster training times while maintaining high accuracy through its GBT
            methodology.²⁸ CatBoost shines in its ability to handle categorical features directly, without
            the need for external pre-processing steps. Its efficacy stems from its algorithm, which
            integrates ordered boosting and symmetric tree learning to effectively address potential
            overfitting challenges.²³ This leads to models with high generalization ability.
              To gain deeper insights into the dataset and the trained models, the metrics of each model
            were saved in a separate folder corresponding to that model. This approach allowed the
            identification of the most effective algorithms for mortality prediction on the given dataset.
            As an example, Figure 1 shows the feature importance analysis based on a specific CatBoost model.
              The SHAP beeswarm plot in Figure 2 provides a detailed, feature-level analysis of the most
            significant indicators of ICU mortality, focusing on how individual input features contribute to
            the model output. By leveraging the TreeExplainer, which makes use of the Tree SHAP algorithms
            specifically designed for ensemble and tree-based models, we were able to accurately calculate
            the SHAP values associated with each feature. This enables us to determine the most significant
            contributing features as well as their level of relevance based on their SHAP values.
            Furthermore, by showing the distribution of SHAP values across all instances, the beeswarm plot
            gives a detailed picture of how these features interact and affect the model's predictions. By
            examining the SHAP values for each feature, we can gain a deeper understanding of which factors
            have the greatest impact on patient outcomes, ultimately informing more effective clinical
            decision-making.
              The aforementioned organizational structure contributed to the identification of the optimal
            candidates to be integrated into a powerful final ensemble model. After careful evaluation of
            the metrics, the architecture of the final model was derived following ensemble learning.²⁶
            Opting for this strategy proved highly effective, offering the best overall metrics with the
            combination of three models. The winning combination consisted of a stacked ensemble learning
            model,²⁹ employing the CatBoost and the Random Forests algorithms.
              Stacked ensemble learning is a hierarchical ML technique that combines multiple steps, as
            depicted in Figure 3. Initially, multiple discrete models (Level-0 models) are trained
            independently on the given dataset. Subsequently, the Level-0 model predictions are used to
            generate a new input dataset that is needed to train the
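
The two stacking steps described in the text (independent Level-0 models, then a new dataset built from their predictions) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the one-feature "stump" learners merely stand in for CatBoost and Random Forests, the logistic-regression next-level learner and the out-of-fold scheme are common choices that this section does not confirm, and the data are synthetic.

```python
import math
import random
from functools import partial


def fit_stump(X, y, feature):
    """Toy Level-0 learner (hypothetical stand-in for CatBoost or Random Forests).

    Splits one feature at its median and returns, as a probability, the
    smoothed positive rate observed on each side of the split.
    """
    values = sorted(row[feature] for row in X)
    threshold = values[len(values) // 2]
    low = [yi for row, yi in zip(X, y) if row[feature] < threshold]
    high = [yi for row, yi in zip(X, y) if row[feature] >= threshold]
    p_low = (sum(low) + 1) / (len(low) + 2)      # Laplace smoothing
    p_high = (sum(high) + 1) / (len(high) + 2)
    return lambda row: p_high if row[feature] >= threshold else p_low


def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Assumed next-level learner: logistic regression via gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(y)

    def prob(z):
        return 1.0 / (1.0 + math.exp(-(b + sum(wj * zj for wj, zj in zip(w, z)))))

    for _ in range(epochs):
        errors = [prob(z) - yi for z, yi in zip(Z, y)]
        grad_w = [sum(e * z[j] for e, z in zip(errors, Z)) / n for j in range(len(w))]
        grad_b = sum(errors) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return prob


def out_of_fold_predictions(X, y, learners, k=5):
    """Build the new input dataset: one column of Level-0 predictions per learner.

    Each row is predicted by models that never saw it during training.
    """
    n = len(X)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(k):
        held_out = set(range(fold, n, k))
        X_tr = [X[i] for i in range(n) if i not in held_out]
        y_tr = [y[i] for i in range(n) if i not in held_out]
        for j, fit in enumerate(learners):
            model = fit(X_tr, y_tr)
            for i in held_out:
                Z[i][j] = model(X[i])
    return Z


def fit_stack(X, y, learners, k=5):
    """Train the next-level learner on Level-0 predictions, then refit Level-0 on all data."""
    meta = fit_logistic(out_of_fold_predictions(X, y, learners, k), y)
    base = [fit(X, y) for fit in learners]
    return lambda row: meta([m(row) for m in base])


# Synthetic data: the label depends on both features jointly.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(400)]
y = [1 if a + b > 1 else 0 for a, b in X]
learners = [partial(fit_stump, feature=0), partial(fit_stump, feature=1)]
predict = fit_stack(X, y, learners)
print(round(predict([0.9, 0.9]), 3), round(predict([0.1, 0.1]), 3))
```

Using out-of-fold predictions keeps the next-level learner from being trained on outputs that the Level-0 models produced for rows they had already memorized. With real libraries the same idea would apply to the CatBoost and Random Forests models named in the text.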





























            Figure 1. Feature importance analysis based on the CatBoost classifier with the greatest weight in the final model
            Abbreviations: AIDS: Acquired immune deficiency syndrome; BSL: Blood sugar level; BUN: Blood urea nitrogen; FiO₂: Fraction of inspired oxygen;
            GCS: Glasgow Coma Scale; MAP: Mean arterial pressure; PaO₂: Partial pressure of oxygen; PaCO₂: Partial pressure of carbon dioxide; WBC: White
            blood cell count.
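
To make the notion of a per-feature SHAP value concrete, the following sketch computes exact Shapley values by brute-force enumeration of feature coalitions. It is not the Tree SHAP algorithm used by TreeExplainer, which obtains the same kind of attribution far more efficiently for tree models. The scoring function, the feature names (age, lactate, GCS), and all numbers are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def model(row):
    """Hypothetical risk score with an interaction term (stand-in for a trained model)."""
    age, lactate, gcs = row
    return 0.02 * age + 0.3 * lactate - 0.1 * gcs + 0.05 * lactate * (15 - gcs)


def coalition_value(f, x, background, subset):
    """Mean prediction when features in `subset` take x's values
    and all other features are taken from each background row."""
    total = 0.0
    for ref in background:
        mixed = [x[i] if i in subset else ref[i] for i in range(len(x))]
        total += f(mixed)
    return total / len(background)


def shapley_values(f, x, background):
    """Exact Shapley value of every feature of x, by enumerating all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(f, x, background, s | {i})
                        - coalition_value(f, x, background, s))
                phi[i] += weight * gain
    return phi


# Made-up reference rows and one made-up patient: [age, lactate, GCS].
background = [[45, 1.0, 15], [60, 2.0, 13], [72, 4.0, 9], [80, 3.0, 11]]
patient = [68, 5.0, 8]

phi = shapley_values(model, patient, background)
baseline = sum(model(r) for r in background) / len(background)
print("per-feature contributions:", phi)
print("baseline + contributions:", baseline + sum(phi), "prediction:", model(patient))
```

The contributions sum to the difference between the patient's prediction and the average prediction over the background rows. A beeswarm plot displays one such contribution per feature and per instance.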
            Volume 2 Issue 2 (2025)                         50                               doi: 10.36922/aih.4981