
Global Translational Medicine                          CNNs for overfitting and generalizability in fracture detection



studies do not demonstrate that their models are properly converged and well validated, it becomes difficult to trust their reported metrics. This is because overfitting may occur, where the model fits too closely to the training data and fails to generalize to new, unseen data. Without robust validation practices, such as using a separate validation set to monitor model performance during training, overfitting can remain undetected. Consequently, the reported high accuracy may not reflect the model’s true performance in real-world applications, undermining the reliability of the study’s findings.19-21 Therefore, it is essential for studies to adopt proper validation strategies to ensure that their models generalize well and that their reported metrics are trustworthy.

Furthermore, many AI models, especially deep learning models, are often considered “black boxes,” making it difficult to interpret their decision-making process. This lack of transparency can hinder clinical adoption.22-24 Integrating AI tools into existing clinical workflows can be challenging, requiring changes in how radiologists and other healthcare professionals operate. There is a need for standardized performance metrics to evaluate and compare different AI models effectively. Current studies often use varied metrics, making it difficult to assess their relative performance.13,25 Table 1 summarizes the key challenges in automated bone fracture detection and the corresponding solutions implemented in this study to address these issues.

Table 1. Summary of challenges in automated bone fracture detection and the corresponding solutions implemented in this study

Challenge | How it is addressed
Limited annotated data in the field | Curated a comprehensive dataset of 4,900 X-ray images, carefully preprocessed and standardized to ensure high-quality input for the model.
Overfitting and suboptimal CNN architecture design | Developed an advanced CNN architecture with convolutional layers for feature extraction, batch normalization, rectified linear unit activations, and dropout layers to mitigate overfitting.
Skewed dataset splits affecting reliability | Employed k-fold cross-validation to evaluate model performance across multiple data splits, ensuring reliable and robust performance metrics.
Lack of generalizability to unseen data | Performed external validation on an independent dataset, confirming the model’s high accuracy and applicability in diverse clinical settings.
Ensuring high performance | Achieved high validation accuracy and test accuracy, confirmed with testing on external data, along with high sensitivity and specificity, showcasing strong reliability for clinical use.
Integration into clinical workflows | Highlighted the importance of clinical integration, with future work focusing on assessing the system’s impact on diagnostic accuracy and efficiency in real-world settings.

Note: The table highlights the limitations addressed, including dataset preparation, overfitting, generalizability, and methods to ensure high performance and clinical integration.
Abbreviation: CNN: Convolutional neural network.

By addressing critical aspects of automated bone fracture detection, this study provides insights into the development of reliable AI systems for medical imaging. It introduces an ML model that demonstrates strong performance and highlights its potential for clinical application through rigorous validation and an emphasis on generalizability.

The subsequent sections provide a detailed account of the methodology, followed by a presentation of the results, and a discussion on the implications of the findings for the future of AI-assisted bone fracture detection in clinical practice. This research represents an important advancement in improving the accuracy and efficiency of orthopedic imaging diagnostics, with the ultimate goal of enhancing patient care and outcomes.

1.2. Aims of the study

The primary aim of this study is to develop and validate a CNN model for automated bone fracture detection in X-ray images that specifically addresses the challenges of overfitting and limited generalizability commonly observed in medical imaging AI applications. Through methodological rigor in data handling and validation protocols, this study seeks to create a robust model that maintains high diagnostic performance when applied to diverse clinical scenarios and external datasets.

The secondary aims of this study are:
(i) To implement and evaluate a validation strategy including three-way data splitting and k-fold cross-validation to ensure proper model training and assessment;
(ii) To quantify the model’s performance across internal validation, test datasets, and external datasets using multiple metrics (accuracy, precision, recall, and F1-score) to provide a complete picture of its clinical utility;
(iii) To assess the model’s generalizability by comparing its performance on an external dataset with its performance on internal datasets, thereby evaluating its potential for real-world clinical application;
(iv) To analyze error patterns and identify specific challenges in fracture detection that may impact the model’s performance when applied in diverse clinical settings.
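
The validation strategy and metrics named in the secondary aims can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the 70/15/15 split fractions, the value k = 5, and the random seed are all assumptions chosen for illustration, and only the dataset size of 4,900 images is taken from Table 1. The sketch uses plain Python so that it runs without external libraries; a stratified split from an established library would likely be preferable in practice.

```python
import random


def three_way_split(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test.

    The fractions and seed are illustrative assumptions, not values
    reported in the study.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_frac)
    n_val = round(n_samples * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def k_fold_indices(indices, k=5):
    """Yield (train, held_out) index lists; each index is held out once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels 1 (fracture) and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # 4,900 is the dataset size stated in Table 1; everything else is assumed.
    train, val, test = three_way_split(4900)
    print(len(train), len(val), len(test))

    # Cross-validate on the development pool only; the test set stays untouched.
    for fold_train, fold_held_out in k_fold_indices(train + val, k=5):
        print(len(fold_train), len(fold_held_out))

    # Toy labels standing in for model predictions on one held-out fold.
    print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
```

In this sketch the test indices are set aside before cross-validation begins, so the folds are drawn only from the training and validation pool. Keeping the test set (and any external dataset) out of model selection in this way is one common means of ensuring that the reported metrics reflect performance on unseen data.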




            Volume 4 Issue 3 (2025)                         85                              doi: 10.36922/gtm.8526