Eurasian Journal of Medicine and Oncology
Machine learning insights into heart failure outcomes
“accuracy_score” function from the scikit-learn metrics
module. The methodology employed in this study, as
depicted in Figure 1, encompassed data preprocessing,
feature importance analysis, correlation matrix
computation, and the implementation of machine learning
models for predicting death events among HF patients.
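The following is a minimal sketch of how accuracy and the related metrics used in this study can be computed on a held-out test set. It assumes binary labels with 1 = death event, and it assumes that predicted labels and probabilities have already been produced by some fitted model; the models themselves are omitted. In practice, scikit-learn covers these steps through `sklearn.model_selection.train_test_split(..., test_size=0.2)` and `sklearn.metrics` functions such as `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`, and `confusion_matrix`. The plain-Python version below instead spells out the underlying arithmetic. The function names, the seed, and the toy labels are hypothetical and are not taken from the study.

```python
import random
from typing import Dict, List, Sequence, Tuple


def train_test_split_80_20(rows: Sequence, seed: int = 42) -> Tuple[List, List]:
    """Shuffle the rows reproducibly and return (train, test) in an 80/20 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN with label 1 (death event) as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:  # t == 1 and p == 0: a missed death event
            counts["fn"] += 1
    return counts


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 derived from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outscores a random negative.

    Ties count as half a win, which matches the trapezoidal area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy test-set outputs standing in for a fitted model (hypothetical values).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]
    print(confusion_counts(y_true, y_pred))
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Computing AUC by pairwise comparison is quadratic in the number of patients. This is acceptable for small test sets, and a rank-based formula gives the same value more efficiently.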
The verification process involved splitting the dataset
into training and testing subsets. After preprocessing,
80% of the dataset was used to train the machine learning
models, and the remaining 20% was reserved for testing.
The models, including logistic regression, random forest,
GBM, and others, were trained on the training subset to
learn patterns and relationships between the features and
the target variable (“DEATH EVENT”). Verification was
performed on the testing subset, which was held out from
training to ensure an unbiased assessment, and each
model's predictions were evaluated against the real
clinical outcomes recorded in the dataset. Key metrics,
such as accuracy, precision, recall, F1-score, and the
area under the receiver operating characteristic curve
(AUC-ROC), were used to quantify predictive performance.
Confusion matrices further provided detailed insights into
true positives, false positives, true negatives, and false
negatives, offering a nuanced understanding of each
model’s ability to correctly identify death events.

Figure 1. Flowchart illustrating the methodology used in the study, entailing data preprocessing, feature importance analysis, correlation matrix computation, and machine learning model implementation for predicting death events among heart failure patients

Table 2. Selected attributes based on the correlation with “Death Event”
Attributes                  Feature importance scores
Time                        0.35
Serum creatinine            0.14
Ejection fraction           0.12
Platelets                   0.082
Creatinine phosphokinase    0.079
Age                         0.077
Serum sodium                0.073
Anemia                      0.013
Sex                         0.013
Smoking                     0.012
High blood pressure         0.011
Diabetes                    0.011

3. Results
3.1. Feature importance analysis
The results of the study are summarized in Table 2,
which presents the attributes selected based on their
correlation with the target variable “DEATH EVENT”,
together with the feature importance scores obtained from
a random forest regressor model. The table provides an
overview of the relative importance of each attribute in
predicting the occurrence of death events among HF
patients. As Table 2 shows, the attribute “time” had the
highest feature importance score (0.356), indicating strong
predictive power for outcomes. It is followed by “serum
creatinine” (0.142) and “ejection fraction” (0.127). These
findings underscore the importance of longitudinal
follow-up duration, renal function, and cardiac function as
key predictors of adverse outcomes in HF patients. In
addition, “platelets,” “creatinine phosphokinase,” and
“age” also showed notable feature importance scores,
indicating their relevance in prognostic modeling. These
physiological and demographic factors contribute valuable
insights into risk stratification and treatment planning for
HF patients.
Moreover, the inclusion of categorical variables, such as
“anemia,” “sex,” “smoking,” “high blood pressure,” and
“diabetes,” in the analysis further enriches the predictive
model. Although these variables have lower feature
importance scores than those mentioned above, their
contributions to overall predictive performance should not
be overlooked. Overall, the results presented in Table 2
highlight the utility of a data-driven approach in
identifying clinically relevant predictors of
Volume 9 Issue 1 (2025) 135 doi: 10.36922/ejmo.6583

