
Artificial Intelligence in Health Predicting mortality in COVID-19 using ML

decision-related problems, is prone to overfitting, especially when interpreting each case in large-scale datasets. The highest-scoring DT model indicates that using the first set of optimized hyperparameters and the “Min–Max” normalization method with a range of 0 to 1000 positively impacted its performance.

The KNN models ranked fifth, with the highest scorer being “22_std_default.” This model achieved 92.85% in precision, 92.95% in recall, 89.74% in F1-score, 95.93% in AUC-ROC, and a runtime of 549.69155 s. The overall performance of the KNN models can be attributed to the design of the KNN algorithm, which, while easy to implement with few hyperparameters, struggles with large datasets because of its high computing power and data storage requirements, making it both resource-intensive and time-consuming.41 The highest-scoring KNN model indicates that using the “StandardScaler” normalization method and the default hyperparameter set, instead of the optimized sets, contributed positively to its performance.

Finally, the LR models ranked sixth and last, with the highest-scoring model overall being “22_mm_0–1000_default.” This model achieved 92.63% in precision, 91.29% in recall, 89.26% in F1-score, 97.05% in AUC-ROC, and a runtime of 3.17691 s. The results of the LR models can be attributed to the nature of the algorithm, which, despite being easy to implement, is limited by its assumption of a linear relationship between the independent variables and the log-odds of the dependent variable, a condition rarely met in real-world data. The highest-performing LR model suggests that using the default hyperparameter set and the “Min–Max” normalization method with a range of 0 to 1000 had a positive impact on its performance.

5. Discussion

In this study, data from four million COVID-19 patients were processed and used as input for each of the 324 different ML models to predict the mortality outcomes of new patients. The evaluation showed that the ML models performed well on all metrics, with precision reaching 93.76%. The top-performing model was created with the XGBoost method using all attributes, a “Min–Max” scaler with a range of 0 to 100, and the first set of optimized hyperparameters. When all methods were ranked by their highest overall score, the models with the best overall performance were produced by XGBoost, followed by RF, MLPs, DTs, KNN, and LR, in descending order. This result indicates that the ensemble models of XGBoost and RF were the most successful when applied to a dataset consisting mainly of categorical attributes with only a few numerical ones. It was also observed that models using optimized sets of hyperparameters, instead of the default ones, displayed better overall performance. Furthermore, models that applied “Min–Max” scaling with ranges of 1–100 and 1–1000 to the numerical attributes of age and days from symptom onset to hospitalization scored higher than models using standard scaling or no scaling. In addition, models using sets of 15 or 10 attributes exhibited lower scores than those using all 22 attributes.

Moreover, most of the previous studies referenced above used either more traditional ML models38,40 or only ensemble methods.42 Our study, by contrast, used both traditional (LR, DTs, KNN, and MLPs) and ensemble (XGBoost and RF) methods to compare their performance. Other studies have also shown promising results in predicting COVID-19 mortality by using blood biomarkers alongside demographic and medical conditions, such as the studies of Nassem et al. and Rai et al.39,42,44,45 mentioned in the previous sections.

An important limitation of our study is that the original dataset came from a single country, Mexico. The performance of the ML models would likely deviate if the dataset included patients from other countries, given differences in healthcare systems, medical care conditions, personal hygiene, and so on. Another important limitation is that the original dataset consists mainly of categorical attributes. The ML models developed here would likely perform differently if trained on datasets with a different composition, e.g., more numerical and continuous features.

6. Conclusion

The goal of this study was to create a dependable ML model that can support medical facilities and hospitals by predicting mortality outcomes in COVID-19 patients, thereby assisting in their preliminary assessment during the pandemic. We processed a dataset containing a vast number of COVID-19 patients using a large number of ML models created and trained with six ML methods, including two ensemble methods. After all models were evaluated, the highest-scoring ML model achieved a precision of 93.76%, a recall of 95.47%, an F1-score of 91.13%, an AUC-ROC of 0.97855, and a runtime of 6.67306 s, using patients’ demographics, pre-existing medical conditions, and habits. This model can help medical experts identify COVID-19 patients at high risk of death by evaluating questionnaire data that report demographics, medical conditions, and the other attributes listed in Table 1. This prioritization can ensure that the most vulnerable patients receive priority treatment during periods of overwhelming demand on the national healthcare system.

Future work could explore the possibility of developing an even higher-performance model by using an ensemble

Volume 1 Issue 3 (2024) 49 doi: 10.36922/aih.2591
