Artificial Intelligence in Health Predicting ICU mortality: A stacked ensemble model
increasingly accurate predictions, solving many of the problems of traditional models.[1]

Traditional scoring models such as the Acute Physiology and Chronic Health Evaluation (APACHE) and the Simplified Acute Physiology Score (SAPS) have long been used to assess patient outcomes, and newer versions have continued to be developed to improve these predictions.[2,3] Despite these efforts, inherent limitations have been identified, such as their static nature, their dependence on pre-defined variables, and their reliance on fixed patient cohorts,[1,4] which limit their adaptability and leave them out of step with rapid technological and clinical developments.[5] In particular, their inability to integrate data in real time has been highlighted, as it prevents them from directly handling complex interactions between variables, thereby hindering adaptation to emerging medical developments and the constantly changing data patterns that arise from them.[6] In addition, because ICU outcomes depend on substantial local variation (available resources, patient demographics, medical practices), these traditional models require continuous recalibration and frequent local validation to maintain their accuracy and reliability,[7] which is not always feasible.

The incorporation of newer technologies and methodologies, such as AI and ML, has been suggested as a way to improve the accuracy, predictive ability, and clinical utility of models built on ICU data.[8] In particular, models such as CatBoost and feedforward neural networks, trained on extensive datasets, can reveal complex patterns that traditional scoring systems may miss.[9,10] Indeed, several studies have reported, or point toward, the superiority of ML models over traditional estimation methods in ICU mortality prediction accuracy.[11,12]

However, despite these promising advantages (at least in analysis time and accuracy), several important challenges remain: (a) biases in data collection and sample representativeness,[9,13] (b) the need for strong validation and generalizability of results,[14] (c) interpretability, and (d) the complexity of ML models,[15] which often makes clinicians reluctant to trust their predictions. Various methods, such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), have been proposed to address these concerns.[16] The prevailing trend in the international literature is to push for the highest possible prognostic accuracy while addressing the above-mentioned challenges. Future research needs to focus on incorporating multicenter datasets and improving model transparency to build trust between healthcare providers and patients.[14] However, a review of the current literature reveals that studies focusing on complex algorithmic models or advanced ML methodologies for ICU data remain scarce.[1,9,17] Overall, the approaches and techniques outlined above, coupled with the potential of novel algorithms and ML combination methods, appear to offer a promising pathway toward better ICU mortality prediction.

The aim of this research is to use a stacked ensemble model to improve the accuracy of predicting patient mortality in the ICU. Stacking, which combines several ML models, is a relatively new approach that takes advantage of the strengths of different algorithms. This allows for better understanding and utilization of the data while enhancing the overall performance of the prediction system. Although the approach has already been applied in relevant studies, its advantages for predicting patient mortality in the ICU have not been well documented.

2. Dataset description

This study utilized the electronic ICU (eICU) Collaborative Research Database,[18] a publicly available resource containing anonymized health data from over 200,000 ICU patients across the United States in the period 2014 – 2015. This database encompasses a wide range of clinical information, including vital signs, demographics, laboratory results, medications, and mortality outcomes.

To align our prediction system with existing ICU protocols and practices, we built a framework comparable to the actively used APACHE IV system. We employed a tailored dataset derived from specific eICU database tables, using the same features as this established system.

Given the sensitive nature of the clinical input features, we prioritized ethical considerations by avoiding manual imputation techniques. Directly filling in missing values for specific clinical parameters carries significant risks and could in turn lead to potentially harmful misinterpretations. Although some patient records contained missing values in specific input features, we deliberately retained them, aiming to enhance the model’s generalization ability and performance by exposing it to additional and more complex data patterns. We therefore avoided manual pre-processing techniques such as synthetic data generation or manual handling of missing values.

As shown in Table 1, the final dataset comprises 35 features (18 categorical, 17 numerical) with a single binary output variable representing the mortality outcome of a patient during their ICU stay. In addition to avoiding
Volume 2 Issue 2 (2025) 48 doi: 10.36922/aih.4981
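To make the two ideas on this page concrete (stacking several models, and retaining missing values instead of imputing them), the following is a minimal sketch in plain Python. It is not the authors' pipeline: the base learner (MissingAwareStump), the meta-learner (LogisticMeta), the helper names, and the synthetic two-feature data are all hypothetical stand-ins chosen for illustration. Each base learner routes a missing value to its own branch rather than filling it in, and the meta-learner is trained on out-of-fold base predictions.

```python
import math
import random

MISSING = None  # missing values stay missing; nothing is imputed


class MissingAwareStump:
    """Hypothetical base learner: median split on one feature, plus a
    dedicated branch for records where that feature is missing."""

    def __init__(self, feature):
        self.feature = feature

    def _branch(self, value):
        if value is MISSING:
            return "missing"
        return "high" if value > self.threshold else "low"

    def fit(self, X, y):
        present = sorted(r[self.feature] for r in X if r[self.feature] is not MISSING)
        self.threshold = present[len(present) // 2]
        self.rate = {}
        for branch in ("low", "high", "missing"):
            labels = [t for r, t in zip(X, y) if self._branch(r[self.feature]) == branch]
            # Laplace smoothing keeps every rate strictly inside (0, 1).
            self.rate[branch] = (sum(labels) + 1) / (len(labels) + 2)
        return self

    def predict_proba(self, X):
        return [self.rate[self._branch(r[self.feature])] for r in X]


def logit(p):
    return math.log(p / (1.0 - p))


class LogisticMeta:
    """Hypothetical meta-learner: logistic regression via batch gradient descent."""

    def fit(self, Z, y, lr=0.5, epochs=200):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        for _ in range(epochs):
            errors = [p - t for p, t in zip(self.predict_proba(Z), y)]
            for j in range(len(self.w)):
                self.w[j] -= lr * sum(e * z[j] for e, z in zip(errors, Z)) / len(Z)
            self.b -= lr * sum(errors) / len(Z)
        return self

    def predict_proba(self, Z):
        return [1.0 / (1.0 + math.exp(-(self.b + sum(w * v for w, v in zip(self.w, z)))))
                for z in Z]


def fit_stack(X, y, base_factories, k=5):
    """Fit the meta-learner on out-of-fold base predictions, then refit bases on all data."""
    Z = [[0.0] * len(base_factories) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held = [i for i in range(len(X)) if i % k == fold]
        for j, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(held, model.predict_proba([X[i] for i in held])):
                Z[i][j] = logit(p)
    meta = LogisticMeta().fit(Z, y)
    bases = [make().fit(X, y) for make in base_factories]
    return bases, meta


def predict_stack(bases, meta, X):
    columns = [b.predict_proba(X) for b in bases]
    return meta.predict_proba([[logit(p) for p in row] for row in zip(*columns)])


def make_data(n, seed):
    """Synthetic stand-in data (not eICU): two numeric features, about 20% missing each."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        y.append(1 if a + b + rng.gauss(0, 0.5) > 0 else 0)
        X.append([v if rng.random() > 0.2 else MISSING for v in (a, b)])
    return X, y


X_train, y_train = make_data(400, seed=0)
X_test, y_test = make_data(500, seed=1)
factories = [lambda: MissingAwareStump(0), lambda: MissingAwareStump(1)]
bases, meta = fit_stack(X_train, y_train, factories)
probs = predict_stack(bases, meta, X_test)
accuracy = sum((p > 0.5) == bool(t) for p, t in zip(probs, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Training the meta-learner only on out-of-fold predictions is the usual way to keep it from rewarding base learners that simply memorized their training rows. In practice the stumps would be replaced by stronger learners that tolerate missing inputs natively, and the 35 mixed categorical and numerical features would need appropriate encoding. Neither choice is specified on this page.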

