
Eurasian Journal of Medicine and Oncology
Machine learning insights into heart failure outcomes


growing interest in using machine learning techniques to predict HF outcomes.² These advanced methods have emerged as promising tools for the diagnosis, classification, and prediction of HF, enabling the analysis of large and diverse datasets to uncover complex patterns and relationships that may not be detectable with traditional statistical methods.³ These approaches can enhance our ability to predict the onset and severity of HF.⁴ By leveraging machine learning algorithms, we can potentially improve our ability to forecast HF progression, optimize treatment strategies, and enhance patient care.⁵ One of the main benefits of using machine learning for HF prediction is its capacity to analyze diverse patient data, including demographic, clinical, and laboratory variables, to identify the key factors contributing to the development and progression of the disease.⁶ Numerous studies have investigated the application of machine learning techniques, such as classification and regression trees, neural networks, and support vector machines (SVMs), to pinpoint the most critical attributes in evaluating the severity of HF.⁷ For example, a recent study⁷ utilized these three machine learning techniques to analyze data from various sources and identify the most important attributes for assessing the severity of HF.

This study aims to explore the clinical and demographic characteristics associated with HF outcomes using a comprehensive dataset obtained from Kaggle. By analyzing this dataset and employing machine learning algorithms, we seek to identify significant predictors of death events among HF patients and evaluate the predictive performance of different models. The findings of this study could influence clinical decision-making, enhance patient care, and ultimately lead to improved outcomes for individuals with HF.

2. Methods

The dataset utilized in this study was sourced from Kaggle, an open-source platform for sharing datasets and data science projects. Specifically, the dataset named “Heart Failure Clinical Records” was accessed from the following link: https://www.kaggle.com/datasets/whenamancodes/heart-failure-clinical-records/data.⁸ Table 1 presents a summary of the dataset attributes along with their descriptions and data types. This dataset contains clinical records of patients diagnosed with HF and encompasses various attributes related to demographic characteristics, clinical indicators, and medical history. Missing values within the dataset were handled using appropriate techniques. For the first dataset, missing values were imputed with the mean of each respective column using the fillna method from the pandas library. This ensured data completeness and integrity for subsequent analysis. The first dataset was divided into features (X) and the target variable (Y), where the target variable represented the occurrence of death events (“DEATH EVENT”). The features comprised all columns except the “DEATH EVENT” column. This split facilitated the supervised learning process. In the second dataset, a correlation matrix was computed to understand the relationships among different attributes. The correlation analysis aided in identifying potentially correlated variables, which guided the selection of characteristics for further investigation. Following attribute selection, a random forest regressor model was utilized to assess feature importance. Feature importance scores were derived from the trained model, highlighting the most influential features in predicting the target variable. Various machine learning algorithms were employed to model the relationship between the selected features and the target variable. The algorithms used included logistic regression, decision tree, random forest, SVM, and gradient boosting machine (GBM). These algorithms were selected based on their effectiveness in handling classification tasks. Each classifier was trained on the training dataset (X-train, Y-train) and evaluated on the testing dataset (X-test, Y-test). Model performance was assessed using accuracy scores computed with the

Table 1. Summary of the dataset attributes along with their descriptions and data types

Attributes                      Information
Age                             The patient’s age in years
Anemia                          A Boolean variable representing whether there is a decrease in red blood cells or hemoglobin
High blood pressure             A Boolean variable representing whether the patient has hypertension
Creatinine phosphokinase (CPK)  The concentration of the CPK enzyme in the blood, measured in mcg/L
Diabetes                        A Boolean variable representing whether the patient has diabetes
Ejection fraction               The proportion of blood ejected from the heart with each contraction, expressed as a percentage
Platelets                       Platelet count in the blood, measured as thousand platelets per milliliter (k/mL)
Sex                             A binary variable indicating the patient’s gender (female or male)
Serum creatinine                Serum creatinine concentration in the blood, measured in mg/dL
Serum sodium                    Serum sodium concentration in the blood, measured in mEq/L
Smoking                         A Boolean variable indicating whether the patient is a smoker
Time                            Duration of follow-up in days
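
The preprocessing steps described in the Methods (mean imputation with fillna, the split into features X and target Y, the correlation matrix, and feature importance from a random forest regressor) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the column names, the target name `DEATH_EVENT` (the text writes it as “DEATH EVENT”), the helper names `prepare` and `rank_features`, the number of trees, and the random seed are all hypothetical, and a small synthetic table stands in for the Kaggle file.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

TARGET = "DEATH_EVENT"  # assumed column name; the text writes "DEATH EVENT"


def prepare(df: pd.DataFrame, target: str = TARGET):
    """Mean-impute missing values, then split into features X and target y."""
    # fillna with a Series of column means fills each column with its own mean.
    filled = df.fillna(df.mean(numeric_only=True))
    X = filled.drop(columns=[target])  # all columns except the target
    y = filled[target]
    return X, y


def rank_features(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> pd.Series:
    """Fit a random forest regressor and return importances, largest first."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X, y)
    scores = pd.Series(model.feature_importances_, index=X.columns)
    return scores.sort_values(ascending=False)


# Synthetic stand-in for the real records (illustrative values only).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "age": rng.integers(40, 95, n).astype(float),
    "ejection_fraction": rng.integers(15, 80, n).astype(float),
    "serum_creatinine": rng.uniform(0.5, 9.0, n),
    "time": rng.integers(4, 285, n).astype(float),
})
# In this toy table the outcome depends only on follow-up time.
demo[TARGET] = (demo["time"] < 100).astype(int)
# Inject a few gaps so the imputation step has something to fill.
demo.loc[[3, 17, 42], "serum_creatinine"] = np.nan

X, y = prepare(demo)
corr = pd.concat([X, y], axis=1).corr()  # pairwise Pearson correlations
importances = rank_features(X, y)
print(corr.round(2))
print(importances.round(3))
```

Because the synthetic outcome is defined from `time` alone, that column should dominate the importance ranking here; on real data the ranking would of course depend on the records themselves.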

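
The model comparison described in the Methods (logistic regression, decision tree, random forest, SVM, and GBM, each trained on X-train/Y-train and scored on X-test/Y-test) could look roughly like the sketch below. Several details are assumptions rather than statements from the text: the 80/20 stratified split, the seed, the default hyperparameters, the helper name `compare_classifiers`, and the use of scikit-learn's `accuracy_score` (the sentence naming the scoring function is cut off at the page break). The data are synthetic, generated only so that the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, test_size=0.2, seed=42):
    """Train five classifiers on one split and return their test accuracies."""
    # Hold out a test set; stratify keeps the class balance similar in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(random_state=seed),
        "random forest": RandomForestClassifier(random_state=seed),
        "SVM": SVC(),
        "GBM": GradientBoostingClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Accuracy = fraction of test samples whose label is predicted correctly.
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic binary-classification data with 12 features (illustrative only).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)
scores = compare_classifiers(X_demo, y_demo)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Accuracy alone can be misleading when death events are much rarer than survivals, so a fuller evaluation would typically report additional metrics; the sketch restricts itself to accuracy because that is the measure named in the text.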

            Volume 9 Issue 1 (2025)                        134                              doi: 10.36922/ejmo.6583