Global Translational Medicine Evaluating ML models for CAD prediction
focus on comparing and understanding the metrics of these other comparable classifiers, and developing an ML model based on the above models. Finally, external validity of the model can also be tested by collecting patient information on patients with and without CAD diagnosis and seeing if the model can predict whether the patients have CAD or not.

The study demonstrates important strengths and limitations in its approach to applying ML for the prediction of CAD.

(i) Strengths
• The logistic regression model showcased in the study has proven to be particularly effective, with an AUC of 0.88, indicating a high ability to differentiate between patients with and without CAD.
• The model successfully identified clinical features of significant importance, such as exercise angina and chest pain type, as central to predicting CAD, aligning with existing clinical knowledge and practices.
• The use of cross-validation techniques and a well-performing training curve suggests that the model has good generalizability while avoiding overfitting.

(ii) Limitations
• The dataset used for training the models was relatively small and combined from different sources, which could impact the overall data quality and the model’s predictive power due to institutional differences in data collection methods.
• A flat learning curve for the training score indicates that the logistic regression model may not benefit from additional data under its current configuration, suggesting potential limitations in dataset complexity or model capacity.
• Continuous variables in the dataset were converted into binary outcomes, which may oversimplify clinical nuances and reduce the subtlety with which the model can identify patterns.
• The model’s prediction of the presence of CAD is also binary, without accounting for the disease’s severity or progression, limiting the clinical applicability of the model’s outputs.
• External validity is yet to be tested; the results observed are potentially specific to the dataset and may not generalize to different populations without further validation.

Moving forward, it would be beneficial to address these limitations by expanding and diversifying the dataset to include more patient records and a broader set of features, including continuous variables and clinical modalities such as imaging results and detailed smoking histories. By increasing the complexity and volume of the dataset, the robustness and sensitivity of the model could be improved, which may reveal more intricate patterns and relationships that are clinically relevant. In addition, it is imperative to explore the possibility of using a more granular output than a binary classification to capture the severity or stages of CAD. The study should also focus on testing the external validity of the model by applying it to independent datasets from varied demographic and geographical backgrounds to ensure the model’s predictions hold true across different populations. Moreover, comparing other ML models such as the ADA, GBC, and NB, which showed comparable results to logistic regression, could provide insights into the optimal approach for predicting CAD. In the clinical context, model outputs should be integrated with clinical decision-making processes to evaluate their real-world effectiveness. This involves assessing not just the model’s predictive accuracy but also its impact on patient outcomes, cost-effectiveness, and user-friendliness for healthcare providers. By addressing these limitations, future research can enhance the model’s predictive accuracy and increase confidence in its clinical usefulness. Further research is also essential to understand how such predictive models could be deployed in clinical workflows, balancing the benefits of ML assistance with the expertise of healthcare professionals to optimize patient care outcomes.

5. Conclusion

The present study offers significant insights into the application of ML for predicting CAD, with logistic regression emerging as a leading model featuring high discriminative capacity. The strength of logistic regression, underscored by an AUC of 0.88, lies in its ability to harness key clinical features effectively, which is indicative of its potential to support diagnostic decisions.

This model also demonstrated a high overall accuracy in predicting true CAD and distinguishing between TPs and TNs. The most important features in the prediction of CAD included the presence of exertional angina, asymptomatic chest pain, and systolic blood pressure; however, this could be explained by imbalances in the dataset itself that erroneously make the above factors seem more important. The study can be improved with more robust data and the addition of other clinical modalities and risk factors, such as imaging results and smoking histories, which may help improve the model’s predictive power. Future research could focus on refining and comparing other models such as the LDA, RIDGE, ADA, GBC, and NB, which had similar performance to the LR model. Validation on external datasets is also suggested to test the model’s external validity, which is necessary to ensure that the model can predict CAD accurately
Volume 3 Issue 1 (2024) 10 https://doi.org/10.36922/gtm.2669

