Page 38 - AIH-1-3
Artificial Intelligence in Health Predicting mortality in COVID-19 using ML
angiotensin-converting enzyme 2 (ACE2) receptor. The viral spike protein first attaches to ACE2, and then the membrane enzyme TMPRSS2 cleaves the spike protein, exposing fusion peptides that facilitate fusion with the cell membrane. [7,8] SARS-CoV-2 is the ninth documented coronavirus to infect humans and the seventh identified in the past 20 years. [9,10] Viruses related to SARS-CoV-2 have been documented in bats and pangolins in multiple locations in Southeast Asia, including China, Thailand, Cambodia, and Japan, [11,12] with serological evidence of viral infections in pangolins for more than a decade. [13] Notably, SARS-CoV-2 is primarily transmitted between people through close contact.

The explosion in the number of infections and deaths has led to global efforts to control and curb the disease's spread and associated mortality. One research field that has had a major positive impact on our understanding of, and fight against, the pandemic is machine learning (ML). ML was developed as a tool for data analysis and pattern recognition. [14] ML algorithms process known data and represent it in mathematical ways. [14] During the pandemic, ML studies have assisted in diagnosing and predicting the severity of illness and mortality of COVID-19, [15,16] predicting future mutations of SARS-CoV-2, [17] and promoting the rapid development of therapeutic strategies such as effective vaccines against the virus. [18]

The present study focuses on the prediction of COVID-19 patient mortality using risk factors such as health conditions, habits, and others. Many factors increase the severity of COVID-19 disease, which may even result in the death of the sufferer. A key risk indicator is age, as older people are more likely to get seriously ill from COVID-19. Over 81% of deaths from COVID-19 are among people over the age of 65. The number of deaths among people aged over 65 years is 80 times higher than the number of deaths among people aged 18–29 years. [19] There are also medical conditions that increase the severity of the disease, such as heart disease, [19] type I and II diabetes, [20,21] chronic lung diseases, [22] and obesity. [23] In addition, smoking can negatively affect the severity of COVID-19 illness, as it is one of the risk factors for the development and exacerbation of multiple respiratory diseases. [24,25]

The data used in this study were provided by the Epidemiological Surveillance System for Respiratory Diseases under the Directorate-General for Epidemiology of the Ministry of Health of the Government of Mexico. [26] The dataset consisted of over 12 million patients, each with 40 attributes. Many of the attributes, such as geographical data about the patient and health facilities, were dropped as redundant or irrelevant, retaining only those related to pre-existing medical conditions, COVID-19 positivity, and demographics. The remaining attributes were ranked according to their importance scores for each ML method, creating subsets of features with escalating cardinality to be used by different models. Data processing and predictions for each patient's outcome were achieved using models created with six different ML algorithmic methods, namely, logistic regression (LR), [27,28] decision trees (DTs), [29,30] random forest (RF), [31] eXtreme gradient boosting (XGBoost), [32] multi-layer perceptrons (MLPs), [33,34] and k-nearest neighbors (KNN). [35] The main goal of the present study is to identify the most effective ML method for predicting COVID-19 mortality outcomes with the highest precision, [36] recall, [36] F1 score, [36] and area under the receiver operating characteristic curve (AUC-ROC). [37]

Specifically, this study aims to:
(i). Develop ML models for COVID-19 mortality outcome prediction
(ii). Conduct a comparative analysis of COVID-19 disease mortality outcome prediction using various ML methods (LR, DTs, RF, XGBoost, MLPs, and KNN)
(iii). Evaluate the performance of different ML algorithmic methods used for the prediction of COVID-19 mortality outcome.

2. Related works

This section presents the main characteristics and outcomes of various studies conducted during the COVID-19 pandemic with the aim of predicting the mortality outcome of COVID-19 patients. The data used in these studies varied from purely clinical markers, such as blood test results, to risk factors such as heart disease, obesity, and diabetes included in the patient's history. Sample sizes varied from several hundred to millions. Similarly, the ML methods used in these studies varied, ranging from simple classifiers such as LR, DTs, and KNN to ensemble techniques such as RF, gradient boosting machine (GBM), and XGBoost.

Studies conducted at the beginning of the pandemic that used only one ML method to train the models for predicting COVID-19 patient mortality include Josephus et al., [38] who applied the LR method to a dataset of 485 patients, and Yan et al., [39] who used XGBoost models with a dataset of 1,085 patients. Both studies reported an overall accuracy of 97% for their respective models. However, these studies were limited by their use of only one ML method for model training and by their relatively small sample sizes.

Pourhomayoun and Shakibi [40] used a variety of ML methods, including artificial neural networks (ANNs), RF, DTs, support vector machine (SVM), KNN, and LR, to predict mortality in COVID-19 patients. Their
Volume 1 Issue 3 (2024) 32 doi: 10.36922/aih.2591

