
Artificial Intelligence in Health                                  Predicting mortality in COVID-19 using ML



angiotensin-converting enzyme 2 (ACE2) receptor. The viral spike protein first attaches to ACE2, and then the membrane enzyme TMPRSS2 cleaves the spike protein, exposing fusion peptides that facilitate fusion with the cell membrane [7,8]. SARS-CoV-2 is the ninth documented coronavirus to infect humans and the seventh identified in the past 20 years [9,10]. Viruses related to SARS-CoV-2 have been documented in bats and pangolins in multiple locations in East and Southeast Asia, including China, Thailand, Cambodia, and Japan [11,12], with serological evidence of viral infections in pangolins for more than a decade [13]. Notably, SARS-CoV-2 is primarily transmitted between people through close contact.

The explosion in the number of infections and deaths has led to global efforts to control and curb the disease's spread and associated mortality. One research field that has had a major positive impact on our understanding of, and fight against, the pandemic is machine learning (ML). ML was developed as a tool for data analysis and pattern recognition [14]. ML algorithms process known data and represent it in mathematical ways [14]. During the pandemic, ML studies have assisted in diagnosing and predicting the severity of illness and mortality of COVID-19 [15,16], predicting future mutations of SARS-CoV-2 [17], and promoting the rapid development of therapeutic strategies such as effective vaccines [18] against the virus.

The present study focuses on the prediction of COVID-19 patient mortality using risk factors such as health conditions, habits, and others. Many factors increase the severity of COVID-19 disease, which may even result in the death of the sufferer. A key risk indicator is age, as older people are more likely to get seriously ill from COVID-19. Over 81% of deaths from COVID-19 are among people over the age of 65. The number of deaths among people aged over 65 years is 80 times higher than the number of deaths among people aged 18–29 years [19]. There are also medical conditions that increase the severity of the disease, such as heart disease [19], type I and II diabetes [20,21], chronic lung diseases [22], and obesity [23]. In addition, smoking can negatively affect the severity of COVID-19 illness, as it is one of the risk factors for the development and exacerbation of multiple respiratory diseases [24,25].

The data used in this study were provided by the Epidemiological Surveillance System for Respiratory Diseases under the Directorate-General for Epidemiology of the Ministry of Health of the Government of Mexico [26]. The dataset consisted of over 12 million patients, each with 40 attributes. Many of the attributes, such as geographical data about the patient and health facilities, were dropped as redundant or irrelevant, retaining only those related to pre-existing medical conditions, COVID-19 positivity, and demographics. The remaining attributes were ranked according to their importance scores for each ML method, creating subsets of features with escalating cardinality to be used by different models. Data processing and predictions for each patient's outcome were achieved using models created with six different ML algorithmic methods, namely, logistic regression (LR) [27,28], decision trees (DTs) [29,30], random forest (RF) [31], eXtreme gradient boosting (XGBoost) [32], multi-layer perceptrons (MLPs) [33,34], and k-nearest neighbors (KNN) [35]. The main goal of the present study is to identify the most effective ML method for predicting COVID-19 mortality outcomes with the highest precision [36], recall [36], F1 score [36], and area under the receiver operating characteristic curve (AUC-ROC) [37].

Specifically, this study aims to:
(i) Develop ML models for COVID-19 mortality outcome prediction
(ii) Conduct a comparative analysis of COVID-19 disease mortality outcome prediction using various ML methods (LR, DTs, RF, XGBoost, MLPs, and KNN)
(iii) Evaluate the performance of different ML algorithmic methods used for the prediction of COVID-19 mortality outcome.

2. Related works

This section presents the main characteristics and outcomes of various studies conducted during the COVID-19 pandemic with the aim of predicting the mortality outcome of COVID-19 patients. The data used in these studies varied from purely clinical markers, such as blood test results, to risk factors such as heart disease, obesity, and diabetes included in the patient's history. Sample sizes varied from several hundred to millions. Similarly, the ML methods used in these studies varied, ranging from simple classifiers such as LR, DTs, and KNN to ensemble techniques such as RF, gradient boosting machine (GBM), and XGBoost.

Studies conducted at the beginning of the pandemic that used only one ML method to train the models for predicting COVID-19 patient mortality include Josephus et al. [38], who applied the LR method to a dataset of 485 patients, and Yan et al. [39], who used XGBoost models with a dataset of 1,085 patients. Both studies reported an overall accuracy of 97% for their respective models. However, these studies were limited by their use of only one ML method for model training and by their relatively small sample sizes.

Pourhomayoun and Shakibi [40] used a variety of ML methods, including artificial neural networks (ANNs), RF, DTs, support vector machine (SVM), KNN, and LR, to predict mortality in COVID-19 patients. Their


            Volume 1 Issue 3 (2024)                         32                               doi: 10.36922/aih.2591