
Artificial Intelligence in Health                                  Predicting mortality in COVID-19 using ML



decision-related problems, is prone to overfitting, especially when interpreting each case in large-scale datasets. The highest-scoring DT model demonstrates that using the first set of optimized hyperparameters and the “Min–Max” normalization method with a range of 0 to 1000 positively impacted its performance.

The KNN models ranked fifth, with the highest scorer being “22_std_default.” This model achieved 92.85% in precision, 92.95% in recall, 89.74% in F1-score, 95.93% in AUC-ROC, and a runtime of 549.69155 s. The overall performance of the KNN models can be attributed to the design of the KNN algorithm, which, while easy to implement with few hyperparameters, struggles with large datasets due to its significant computing power and data storage requirements, making it both resource-exhausting and time-consuming. The highest-scoring KNN model indicates that using the “StandardScaler” normalization method and the default hyperparameter set, instead of the optimized sets, contributed positively to its performance.

Finally, the LR models ranked sixth and last, with the highest overall scoring model being “22_mm_0–1000_default.” This model achieved 92.63% in precision, 91.29% in recall, 89.26% in F1-score, 97.05% in AUC-ROC, and a runtime of 3.17691 s. The results of the LR models can be attributed to the nature of the algorithm, which, despite being easier to implement, is limited by its assumption of linearity between dependent and independent variables, a condition rarely met in real-world data. The highest-performing LR model suggests that using the default hyperparameter set and the “Min–Max” normalization method with a range of 0 – 1000 had a positive impact on its performance.

5. Discussion

In this study, data from four million COVID-19 patients were processed and used as input for each of the 324 different ML models to predict the mortality outcomes of new patients. After the evaluation, it was clear that the ML models demonstrated high performance in all metrics, with precision reaching 93.76%. The top-performing model was created with the XGBoost method, using all attributes, a “Min–Max” scaler with a range of 0 to 100, and the first set of optimized hyperparameters. After ranking all methods based on the highest overall score, the models with the best overall performance were produced by XGBoost, followed by RF, MLPs, DTs, KNN, and LR, in descending order of overall performance. This result indicates that the ensemble models of XGBoost and RF were the most successful when applied to a dataset consisting mainly of categorical attributes with only a few numerical ones. It was also observed that models using optimized sets of hyperparameters, instead of the default ones, displayed better overall performance. Furthermore, models that applied the “Min–Max” scaling with ranges between 1 – 100 and 1 – 1000 to the numerical attributes of age and days from the symptom onset to hospitalization scored higher than models using standard scaling or no scaling. In addition, models using sets of 15 or 10 attributes exhibited lower scores compared to the ones using all 22 attributes.

Moreover, while most previous studies referenced above utilized either more traditional ML models 38,40 or only ensemble methods, 42 our study used both traditional (LR, DTs, KNN, and MLPs) and ensemble (XGBoost and RF) methods to compare their performances. Other studies have also shown promising results in predicting COVID-19 mortality by using blood biomarkers alongside demographic and medical conditions, such as the studies of Nassem et al. 41 and Rai et al. 39,42,44,45 that were mentioned in the previous sections.

An important limitation of our study is that the original dataset originated from one country, Mexico. The ML models’ performance would likely differ if the dataset included patients from other countries, given the differences in healthcare systems, medical care conditions, personal hygiene, etc. Another important limitation is that the original dataset consists mainly of categorical attributes. The ML models developed here would likely show different performance if trained with datasets containing different compositions, e.g., more numerical and continuous features.

6. Conclusion

The goal of this study was to create a dependable ML model that can support medical facilities and hospitals by predicting mortality outcomes in COVID-19 patients, thereby assisting in their preliminary assessment during the pandemic. We processed a dataset of four million COVID-19 patients using 324 ML models created and trained with six ML methods, including two ensemble methods. After evaluating all models, the ML model with the highest score achieved a precision of 93.76%, a recall of 95.47%, an F1-score of 91.13%, an AUC-ROC of 0.97855, and a runtime of 6.67306 s, using patients’ demographics, pre-existing medical conditions, and habits. This model can help medical experts identify COVID-19 patients at high risk of death by evaluating data from questionnaires that report demographics, medical conditions, and other attributes listed in Table 1. This prioritization can ensure that the most vulnerable patients receive priority treatment during periods of overwhelming demand on the national healthcare system.

Future work could explore the possibility of developing an even higher-performance model by using an ensemble
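
To make the “Min–Max” scaling with a custom range discussed on this page concrete, the following is a minimal sketch of the transformation in plain Python. It is an illustration under stated assumptions, not the preprocessing code used in the study: the function name `min_max_scale`, its parameters `low` and `high`, and the sample ages are all hypothetical. The same linear rescaling is what scikit-learn's `MinMaxScaler` performs when given a `feature_range` argument.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float], low: float = 0.0, high: float = 1.0) -> List[float]:
    """Linearly rescale values so the smallest maps to `low` and the largest to `high`."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column has no spread; map every entry to the lower bound.
        return [float(low) for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) / span * (high - low) for v in values]


# Hypothetical ages in years (illustrative values, not taken from the study's dataset).
ages = [20, 35, 50, 80]

scaled_0_100 = min_max_scale(ages, 0, 100)    # range 0 to 100
scaled_0_1000 = min_max_scale(ages, 0, 1000)  # range 0 to 1000

print(scaled_0_100)
print(scaled_0_1000)
```

Only the target range changes between the two calls, so the relative spacing of the values is identical in both outputs. Which range works best for a given model is an empirical question of the kind the model comparison above addresses.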


            Volume 1 Issue 3 (2024)                         49                               doi: 10.36922/aih.2591