Eurasian Journal of Medicine and Oncology
Machine learning insights into heart failure outcomes
growing interest in using machine learning techniques to predict HF outcomes.² These advanced methods have emerged as promising tools for the diagnosis, classification, and prediction of HF, enabling the analysis of large and diverse datasets to uncover complex patterns and relationships that may not be detectable with traditional statistical methods.³ These approaches can enhance our ability to predict the onset and severity of HF.⁴ By leveraging machine learning algorithms, we can potentially improve our ability to forecast HF progression, optimize treatment strategies, and enhance patient care.⁵

One of the main benefits of using machine learning for HF prediction is its capacity to analyze diverse patient data, including demographic, clinical, and laboratory variables, to identify the key factors contributing to the development and progression of the disease.⁶ Numerous studies have investigated the application of machine learning techniques, such as classification and regression trees, neural networks, and support vector machines (SVMs), to pinpoint the most critical attributes in evaluating the severity of HF.⁷ For example, a recent study⁷ utilized these three machine learning techniques to analyze data from various sources and identify the most important attributes for assessing the severity of HF.

This study aims to explore the clinical and demographic characteristics associated with HF outcomes using a comprehensive dataset obtained from Kaggle. By analyzing this dataset and employing machine learning algorithms, we seek to identify significant predictors of death events among HF patients and evaluate the predictive performance of different models. The findings of this study could influence clinical decision-making, enhance patient care, and ultimately lead to improved outcomes for individuals with HF.

2. Methods

The dataset utilized in this study was sourced from Kaggle, an open-source platform for sharing datasets and data science projects. Specifically, the dataset named “Heart Failure Clinical Records” was accessed from the following link: https://www.kaggle.com/datasets/whenamancodes/heart-failure-clinical-records/data.⁸ Table 1 presents a summary of the dataset attributes along with their descriptions and data types. This dataset contains clinical records of patients diagnosed with HF and encompasses various attributes related to demographic characteristics, clinical indicators, and medical history. Missing values within the dataset were handled using appropriate techniques. For the first dataset, missing values were imputed with the mean of each respective column using the fillna method from the pandas library. This ensured data completeness and integrity for subsequent analysis.

Table 1. Summary of the dataset attributes along with their descriptions and data types

| Attributes | Information |
|---|---|
| Age | The patient’s age in years |
| Anemia | A Boolean variable representing whether there is a decrease in red blood cells or hemoglobin |
| High blood pressure | A Boolean variable representing whether the patient has hypertension |
| Creatinine phosphokinase (CPK) | The concentration of the CPK enzyme in the blood, measured in mcg/L |
| Diabetes | A Boolean variable representing whether the patient has diabetes |
| Ejection fraction | The proportion of blood ejected from the heart with each contraction, expressed as a percentage |
| Platelets | Platelet count in the blood, measured as thousand platelets per milliliter (k/mL) |
| Sex | A binary variable indicating the patient’s gender (female or male) |
| Serum creatinine | Serum creatinine concentration in the blood, measured in mg/dL |
| Serum sodium | Serum sodium concentration in the blood, measured in mEq/L |
| Smoking | A Boolean variable indicating whether the patient is a smoker |
| Time | Duration of follow-up in days |

The first dataset was divided into features (X) and the target variable (Y), where the target variable represented the occurrence of death events (“DEATH EVENT”). The features comprised all columns except the “DEATH EVENT” column. This split facilitated the supervised learning process. In the second dataset, a correlation matrix was computed to understand the relationships among different attributes. The correlation analysis aided in identifying potentially correlated variables, which guided the selection of characteristics for further investigation. Following attribute selection, a random forest regressor model was utilized to assess feature importance. Feature importance scores were derived from the trained model, highlighting the most influential features in predicting the target variable. Various machine learning algorithms were employed to model the relationship between the selected features and the target variable. The algorithms used included logistic regression, decision tree, random forest, SVM, and gradient boosting machine (GBM). These algorithms were selected based on their effectiveness in handling classification tasks. Each classifier was trained on the training dataset (X-train, Y-train) and evaluated on the testing dataset (X-test, Y-test). Model performance was assessed using accuracy scores computed with the
Volume 9 Issue 1 (2025) 134 doi: 10.36922/ejmo.6583
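The preprocessing and modeling steps described in the Methods on this page can be illustrated with a minimal Python sketch using pandas and scikit-learn. It is not the authors' code. The data are synthetic stand-ins for the Kaggle table, and the column names (including `DEATH_EVENT` written with an underscore), the 80/20 split, the random seeds, and all hyperparameters are assumptions made only for illustration. For brevity, the sketch applies every step to one table rather than to the separate "first" and "second" datasets mentioned in the text.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, RandomForestRegressor)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the clinical records (assumption: not the real data).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(40, 95, n).astype(float),
    "ejection_fraction": rng.integers(14, 80, n).astype(float),
    "serum_creatinine": rng.uniform(0.5, 9.4, n),
    "serum_sodium": rng.integers(113, 148, n).astype(float),
    "time": rng.integers(4, 285, n).astype(float),
})
# Hypothetical outcome rule so that the toy target carries some signal.
risk = (0.03 * df["age"] - 0.04 * df["ejection_fraction"]
        + 0.5 * df["serum_creatinine"])
df["DEATH_EVENT"] = (risk + rng.normal(0, 1, n) > risk.median()).astype(int)
# Blank out a few values so the imputation step has something to fill.
df.loc[rng.choice(n, 15, replace=False), "serum_sodium"] = np.nan

# Mean imputation: each missing value is replaced by its column mean.
df = df.fillna(df.mean())

# Features (X) are all columns except the target; Y is the death event.
X = df.drop(columns="DEATH_EVENT")
Y = df["DEATH_EVENT"].astype(int)

# Pairwise (Pearson) correlation matrix across all attributes.
corr = df.corr()

# Feature importance from a random forest regressor, as named in the text.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, Y)
importance = pd.Series(rf.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)

# Hold out 20% of the rows for testing (the ratio is an assumption).
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0, stratify=Y)

# The five classifier families listed in the text, with default-like settings.
models = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
    "gbm": GradientBoostingClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    scores[name] = accuracy_score(Y_test, model.predict(X_test))

print(importance)
print(pd.Series(scores).sort_values(ascending=False))
```

Because the data are synthetic, the printed importances and accuracies say nothing about the study's findings. To try the same steps on the real records, the synthetic block could be replaced with a `pd.read_csv` call on a local copy of the Kaggle file.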

