
Global Translational Medicine                                       Evaluating ML models for CAD prediction



focus on comparing and understanding the metrics of these other comparable classifiers, and developing an ML model based on the above models. Finally, external validity of the model can also be tested by collecting information on patients with and without a CAD diagnosis and checking whether the model correctly predicts which patients have CAD.

The study demonstrates important strengths and limitations in its approach to applying ML for the prediction of CAD.

(i) Strengths
•   The logistic regression model showcased in the study has proven to be particularly effective, with an AUC of 0.88, indicating a high ability to differentiate between patients with and without CAD.
•   The model successfully identified clinical features of significant importance, such as exercise angina and chest pain type, as central to predicting CAD, aligning with existing clinical knowledge and practices.
•   The use of cross-validation techniques and a well-performing training curve suggest that the model has good generalizability while avoiding overfitting.

(ii) Limitations
•   The dataset used for training the models was relatively small and combined from different sources, which could impact the overall data quality and the model’s predictive power due to institutional differences in data collection methods.
•   A flat learning curve for the training score indicates that the logistic regression model may not benefit from additional data under its current configuration, suggesting potential limitations in dataset complexity or model capacity.
•   Continuous variables in the dataset were converted into binary outcomes, which may oversimplify clinical nuances and reduce the subtlety with which the model can identify patterns.
•   The end result of predicting the presence of CAD is also binary, without accounting for the disease’s severity or progression, limiting the clinical applicability of the model’s outputs.
•   External validity is yet to be tested; the results observed are potentially specific to the dataset and may not generalize to different populations without further validation.

Moving forward, it would be beneficial to address these limitations by expanding and diversifying the dataset to include more patient records and a broader set of features, including continuous variables and clinical modalities such as imaging results and detailed smoking histories. By increasing the complexity and volume of the dataset, the robustness and sensitivity of the model could be improved, which may reveal more intricate patterns and relationships that are clinically relevant. In addition, it is imperative to explore the possibility of using a more granular output than a binary classification to capture the severity or stages of CAD. The study should also focus on testing the external validity of the model by applying it to independent datasets from varied demographic and geographic backgrounds to ensure the model’s predictions hold true across different populations. Moreover, comparing other ML models such as the ADA, GBC, and NB, which showed comparable results to logistic regression, could provide insights into the optimal approach for predicting CAD. In the clinical context, model outputs should be integrated with clinical decision-making processes to evaluate their real-world effectiveness. This involves assessing not just the model’s predictive accuracy but also its impact on patient outcomes, cost-effectiveness, and user-friendliness for healthcare providers. By addressing these limitations, future research can enhance the model’s predictive accuracy and increase confidence in its clinical usefulness. Further research is also essential to understand how such predictive models could be deployed in clinical workflows, balancing the benefits of ML assistance with the expertise of healthcare professionals to optimize patient care outcomes.

5. Conclusion

The present study offers significant insights into the application of ML for predicting CAD, with logistic regression emerging as a leading model featuring high discriminative capacity. The strength of logistic regression, underscored by an AUC of 0.88, lies in its ability to harness key clinical features effectively, which is indicative of its potential to support diagnostic decisions.

This model also demonstrated a high overall accuracy in predicting true CAD and distinguishing between TPs and TNs. The most important features in the prediction of CAD included the presence of exertional angina, asymptomatic chest pain, and systolic blood pressure; however, this ranking could reflect imbalances in the dataset itself that make these factors appear more important than they are. The study can be improved with more robust data and the addition of other clinical modalities and risk factors, such as imaging results and smoking histories, which may help improve the model’s predictive power. Future research could focus on refining and comparing other models such as the LDA, RIDGE, ADA, GBC, and NB, which had similar performance to the LR model. Validation on external datasets is also suggested to test the model’s external validity, which is necessary to ensure that the model can predict CAD accurately


            Volume 3 Issue 1 (2024)                         10                       https://doi.org/10.36922/gtm.2669