Artificial Intelligence in Health Predicting ICU mortality: A stacked ensemble model
Table 1. Dataset variables

Categorical: Gender; Intubation; Readmission; Emergency surgery; Age; Lymphoma; GCS (Eyes response); GCS (Verbal response); GCS (Motor response); Operative/Non-operative; Immunosuppression; AIDS; Hepatic failure; Metastatic cancer; Leukemia; Cirrhosis; Thrombolysis; Dialysis

Numerical: Hematocrit; Albumin; Temperature; Heart rate; Respiratory rate; FiO2; PaO2; PaCO2; Arterial pH; Na+ (sodium); Urine output; Creatinine; Mean arterial pressure; Blood urea nitrogen; White blood cell count; Blood sugar level; Bilirubin

Output: ICU mortality

Abbreviations: AIDS: Acquired immune deficiency syndrome; FiO2: Fraction of inspired oxygen; GCS: Glasgow Coma Scale; ICU: Intensive care unit; PaO2: Partial pressure of oxygen; PaCO2: Partial pressure of carbon dioxide.
imputations, we aimed to ensure that the output column of the dataset contained non-empty values, reducing the available dataset records to 148,532 ICU patients for further pre-processing (139,917 survived and 8,615 non-survived).

To address the high class imbalance in the dataset, we undersampled the majority class (survived) during model training using random selection. This approach aimed to prevent bias toward survival and to ensure that the model accurately estimates the probability of ICU patient mortality once the respective model hyperparameters are fine-tuned. In this way, we obtained a balanced dataset of 17,230 patients with an equal distribution of survived and non-survived patients – 8,615 records each.

Undersampling can mitigate bias toward the majority class and offers significant computational advantages by reducing training time and memory demands. It can also reduce the model's complexity and has the potential to improve interpretability. It may further reduce noise and enhance generalization performance by focusing training on the more informative data points.19

3. Methodology

Various ML and deep learning algorithms were employed in this research, such as Decision Trees,20 Random Forests,21 Extra Trees,22 XGBoost,23 CatBoost,24 Light Gradient-Boosting Machine (LightGBM),25 and Neural Networks. Moreover, various ensemble learning algorithms were used. To ascertain the optimal architecture, a comprehensive model evaluation was undertaken.

This process involved leveraging the default feature-handling capabilities of each ML model. The hyperparameter values were carefully optimized during the training phase to maximize performance, as described in more detail in the subsection below. The developed models were evaluated by considering the values of specific performance metrics of ensemble learning, ultimately leading to the development of effective model combinations. The final model was trained in 127.82 s using an Apple M1 Max 32-core GPU with 32 GB of unified RAM.

3.1. Development of ML models

The purpose of this research was to develop several robust models capable of predicting patients' ICU mortality. The first step was data pre-processing and hyperparameter tuning using various strategies. The developed model exploited the native pre-processing capabilities of each distinct algorithm, utilizing feature selection, data scaling, categorical variable encoding, and feature importance analysis techniques tailored to the architecture of each model. Through refinement, the most effective training processes were determined, maximizing the predictive power of the algorithms. This enabled the most suitable settings for each algorithm's key hyperparameters to be identified, driving the models toward optimal performance. Through this iterative optimization process, model performance was maximized, bias was mitigated, and the generalization ability of each model was verified on novel, previously unseen data.

As already mentioned, several ML algorithms were considered. This paper focuses on the most robust ones, namely Decision Trees, gradient-boosted trees (GBTs), and Random Forests. Decision trees are easily interpreted, although they are often prone to overfitting. GBTs, on the other hand, mitigate this by sequentially building ensembles of trees, while Random Forests combine multiple decision trees trained on diverse subsets of the data and input features, leading to better generalization and accuracy.

Moreover, we took into consideration additional contemporary and state-of-the-art algorithms. XGBoost,
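The random-undersampling step used to balance the dataset can be sketched as follows. This is a minimal, library-free illustration only, not the authors' code: the function name, the toy data, the label convention (0 = survived, 1 = non-survived), and the fixed seed are all assumptions.

```python
import numpy as np

def undersample_majority(X, y, majority_label=0, seed=42):
    """Randomly drop majority-class records until both classes
    contain as many records as the minority class (hypothetical
    helper mirroring the paper's balancing step)."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    # Randomly keep only as many majority records as there are minority records
    keep_maj = rng.choice(maj_idx, size=min_idx.size, replace=False)
    keep = np.concatenate([keep_maj, min_idx])
    rng.shuffle(keep)  # mix the classes before training
    return X[keep], y[keep]

# Toy data mimicking the paper's imbalance (most patients survive)
X = np.arange(1000).reshape(-1, 1)
y = np.zeros(1000, dtype=int)
y[:80] = 1  # 80 minority (non-survived) records

Xb, yb = undersample_majority(X, y)
# Result: 80 + 80 = 160 balanced records, analogous to the paper's
# 8,615 + 8,615 = 17,230 balanced dataset.
```

Sampling without replacement (`replace=False`) matters here: it guarantees no majority-class record is duplicated, so the balanced set is a strict subset of the original data.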
Volume 2 Issue 2 (2025) 49 doi: 10.36922/aih.4981

