
Artificial Intelligence in Health                                  Predicting mortality in COVID-19 using ML



ended up with 3,809,119 COVID-19 patients with valid attribute values. Fourth, we removed the non-correlated attributes (for instance, those related to geography, residency, and indigeneity), resulting in 24 attributes. Finally, we transformed the “Date of Death” attribute into a categorical attribute by renaming it “Survived” and replacing all the “9999-99-99” date values with “1” (Yes) and the rest with “0” (No). We also combined the “Symptom Onset Date” and “Hospital Admission Date” attributes into a numerical attribute labeled “Days from Symptom to Hospitalization.” Hence, we settled on 23 attributes, as shown in Table 1. Nineteen of these attributes were included in the ML models, with “Survived” being the target attribute used for predictions, and the remaining four being the “Registration ID” and three indicators. The created “gold standard dataset” (Table 2), consisting of the 23 attribute values of the 3,809,119 patients, was saved in CSV file format. The dataset cleansing process flowchart is shown in Figure 1. The dataset value distribution for the categorical attributes is depicted in Figure 2, whereas Figure 3 illustrates the distribution of age groups for both genders. In Figure 3, we observe that most of the patients

Table 1. The 23 attributes of each patient

Attribute name                            Values
Registration ID                           Patient’s unique identification code
Sex                                       1: Female; 2: Male
Age                                       Numerical positive
Smoker                                    1: Yes; 2: No
Pneumonia                                 1: Yes; 2: No
Diabetes                                  1: Yes; 2: No
Obesity                                   1: Yes; 2: No
COPD                                      1: Yes; 2: No
Asthma                                    1: Yes; 2: No
Immunosuppressed                          1: Yes; 2: No
Hypertension                              1: Yes; 2: No
Cardiovascular disease                    1: Yes; 2: No
Chronic kidney failure                    1: Yes; 2: No
Other chronic disease                     1: Yes; 2: No
Pregnancy                                 1: Yes; 2: No; 97: Not applicable (Male)
Contact with COVID-19 caseᵃ               1: Yes; 2: No
Laboratory resultᵃ                        1: SARS-CoV-2 positive; 2: SARS-CoV-2 negative; 3, 4: Not clear
Final classificationᵃ,ᵇ                   1, 2, 3: Confirmed case; 4: Invalidly identified case; 5, 6, 7: Unconfirmed case
Patient type                              1: Not admitted; 2: Admitted
Intubated                                 1: Yes; 2: No; 97: Not applicable
ICU                                       1: Admitted to ICU; 2: Not admitted to ICU; 97: Not applicable
Days from symptom to hospitalizationᶜ     Numerical positive (created attribute)
Survivedᵇ                                 1: Yes; 2: No (created attribute)

Notes: ᵃIndicators; ᵇCOVID-19 sample classification; ᶜCreated attributes.
Abbreviations: COPD: Chronic obstructive pulmonary disease; ICU: Intensive care unit.

Table 2. The gold standard dataset

Attribute name                            Value distribution
Registration ID                           3,809,119 unique values
Sex                                       1,921,058: Female; 1,888,061: Male
Age                                       Mean: 40.723; Standard deviation: 17.173; Minimum: 0; Maximum: 121
Smoker                                    250,558: Yes; 3,558,561: No
Pneumonia                                 395,528: Yes; 3,413,591: No
Diabetes                                  403,336: Yes; 3,405,783: No
Obesity                                   448,412: Yes; 3,360,707: No
COPD                                      31,477: Yes; 3,777,642: No
Asthma                                    74,682: Yes; 3,734,437: No
Immunosuppressed                          23,825: Yes; 3,785,294: No
Hypertension                              526,891: Yes; 3,282,228: No
Cardiovascular disease                    44,362: Yes; 3,764,757: No
Chronic kidney failure                    42,703: Yes; 3,766,416: No
Other chronic disease                     60,307: Yes; 3,748,812: No
Pregnancy                                 29,332: Yes; 1,891,726: No; 1,888,061: Not applicable (Male)
Contact with COVID-19 case                1,519,968: Yes; 2,289,151: No
Laboratory result                         3,802,238: SARS-CoV-2 positive; 0: SARS-CoV-2 negative; 6,881: Not clear
Final classification                      3,809,119: Confirmed cases; 0: Invalidly identified cases; 0: Unconfirmed cases
Patient type                              3,284,671: Not admitted; 524,448: Admitted
Intubated                                 60,539: Yes; 463,909: No; 3,284,671: Not applicable
ICU                                       42,095: Admitted to ICU; 482,353: Not admitted to ICU; 3,284,671: Not applicable
Days from symptom to hospitalization      Mean: 3.673; Standard deviation: 3.160; Minimum: -13ᵃ; Maximum: 43
Survived                                  3,558,390: Yes; 250,729: No

Note: ᵃThe minimum value is negative in cases where the patient contracted the disease inside the hospital where they were being treated.
Abbreviations: COPD: Chronic obstructive pulmonary disease; ICU: Intensive care unit.
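The two created attributes described in the text, “Survived” and “Days from Symptom to Hospitalization,” can be illustrated with a minimal sketch. It is not the authors' code. It assumes that dates are stored as ISO-formatted strings (YYYY-MM-DD), that both the onset and admission dates are present, and that the combined attribute is the admission date minus the onset date, which is consistent with the negative minimum reported in Table 2. The field names (`date_of_death`, `symptom_onset_date`, `hospital_admission_date`) are hypothetical, and the 1/0 coding follows the text rather than the coding listed in Table 1.

```python
from datetime import date

# Sentinel value the text gives for patients with no recorded death.
NO_DEATH_SENTINEL = "9999-99-99"

# Hypothetical field names for the three raw date attributes.
RAW_DATE_FIELDS = ("date_of_death", "symptom_onset_date", "hospital_admission_date")


def derive_survived(date_of_death: str) -> int:
    """Return 1 (Yes) when the death date is the sentinel, otherwise 0 (No)."""
    return 1 if date_of_death.strip() == NO_DEATH_SENTINEL else 0


def days_symptom_to_hospitalization(onset: str, admission: str) -> int:
    """Admission date minus onset date, in days (assumed definition).

    The result is negative when symptoms began after admission.
    """
    return (date.fromisoformat(admission) - date.fromisoformat(onset)).days


def transform(record: dict) -> dict:
    """Replace the three raw date fields with the two created attributes."""
    out = {k: v for k, v in record.items() if k not in RAW_DATE_FIELDS}
    out["survived"] = derive_survived(record["date_of_death"])
    out["days_from_symptom_to_hospitalization"] = days_symptom_to_hospitalization(
        record["symptom_onset_date"], record["hospital_admission_date"]
    )
    return out
```

Applied to every patient record, a transformation of this kind reduces the 24 attributes to 23, because the two date attributes collapse into a single numerical attribute while “Date of Death” is replaced one-for-one by “Survived.”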
            Volume 1 Issue 3 (2024)                         34                               doi: 10.36922/aih.2591