Global Translational Medicine CNNs for overfitting and generalizability in fracture detection
studies do not demonstrate that their models are properly converged and well validated, it becomes difficult to trust their reported metrics. This is because overfitting may occur, where the model fits too closely to the training data and fails to generalize to new, unseen data. Without robust validation practices, such as using a separate validation set to monitor model performance during training, overfitting can remain undetected. Consequently, the reported high accuracy may not reflect the model’s true performance in real-world applications, undermining the reliability of the study’s findings. 19-21 Therefore, it is essential for studies to adopt proper validation strategies to ensure that their models generalize well and that their reported metrics are trustworthy.

Furthermore, many AI models, especially deep learning models, are often considered “black boxes,” making it difficult to interpret their decision-making process. This lack of transparency can hinder clinical adoption. 22-24 Integrating AI tools into existing clinical workflows can be challenging, requiring changes in how radiologists and other healthcare professionals operate. There is a need for standardized performance metrics to evaluate and compare different AI models effectively. Current studies often use varied metrics, making it difficult to assess their relative performance. 13,25 Table 1 summarizes the key challenges in automated bone fracture detection and the corresponding solutions implemented in this study to address these issues.

Table 1. Summary of challenges in automated bone fracture detection and the corresponding solutions implemented in this study

| Challenge | How it is addressed |
|---|---|
| Limited annotated data in the field | Curated a comprehensive dataset of 4,900 X-ray images, carefully preprocessed and standardized to ensure high-quality input for the model. |
| Overfitting and suboptimal CNN architecture design | Developed an advanced CNN architecture with convolutional layers for feature extraction, batch normalization, rectified linear unit activations, and dropout layers to mitigate overfitting. |
| Skewed dataset splits affecting reliability | Employed k-fold cross-validation to evaluate model performance across multiple data splits, ensuring reliable and robust performance metrics. |
| Lack of generalizability to unseen data | Performed external validation on an independent dataset, confirming the model’s high accuracy and applicability in diverse clinical settings. |
| Ensuring high performance | Achieved high validation accuracy and test accuracy, confirmed with testing on external data, along with high sensitivity and specificity, showcasing strong reliability for clinical use. |
| Integration into clinical workflows | Highlighted the importance of clinical integration, with future work focusing on assessing the system’s impact on diagnostic accuracy and efficiency in real-world settings. |

Note: The table highlights the limitations addressed, including dataset preparation, overfitting, generalizability, and methods to ensure high performance and clinical integration.
Abbreviation: CNN: Convolutional neural network.

By addressing critical aspects of automated bone fracture detection, this study provides insights into the development of reliable AI systems for medical imaging. It introduces an ML model that demonstrates strong performance and highlights its potential for clinical application through rigorous validation and an emphasis on generalizability.

The subsequent sections provide a detailed account of the methodology, followed by a presentation of the results, and a discussion on the implications of the findings for the future of AI-assisted bone fracture detection in clinical practice. This research represents an important advancement in improving the accuracy and efficiency of orthopedic imaging diagnostics, with the ultimate goal of enhancing patient care and outcomes.

1.2. Aims of the study
The primary aim of this study is to develop and validate a CNN model for automated bone fracture detection in X-ray images that specifically addresses the challenges of overfitting and limited generalizability commonly observed in medical imaging AI applications. Through methodological rigor in data handling and validation protocols, this study seeks to create a robust model that maintains high diagnostic performance when applied to diverse clinical scenarios and external datasets.

The secondary aims of this study are:
(i) To implement and evaluate a validation strategy including three-way data splitting and k-fold cross-validation to ensure proper model training and assessment
(ii) To quantify the model’s performance across internal validation, test datasets, and external datasets using multiple metrics (accuracy, precision, recall, and F1-score) to provide a complete picture of its clinical utility
(iii) To assess the model’s generalizability by comparing its performance on an external dataset with its performance on internal datasets, thereby evaluating its potential for real-world clinical application
(iv) To analyze error patterns and identify specific challenges in fracture detection that may impact the model’s performance when applied in diverse clinical settings.
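
To make the validation strategy named in aims (i) and (ii) concrete, the following is a minimal Python sketch of a stratified three-way split, stratified k-fold partitioning, and the four listed metrics. It is an illustration under stated assumptions, not the study's implementation: the 70/15/15 split fractions, k = 5, the fixed seed, and all function names are hypothetical choices that do not come from the text, and applying the folds to the pooled training and validation indices is only one plausible way to combine the two techniques.

```python
import random
from collections import defaultdict


def stratified_three_way_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test, preserving class proportions.

    The 70/15/15 default is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        n_train = round(len(class_indices) * fractions[0])
        n_val = round(len(class_indices) * fractions[1])
        train += class_indices[:n_train]
        val += class_indices[n_train:n_train + n_val]
        test += class_indices[n_train + n_val:]  # remainder goes to test
    return train, val, test


def stratified_k_folds(indices, labels, k=5, seed=0):
    """Yield (train_idx, held_out_idx) pairs; every index is held out once."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)

    folds = [[] for _ in range(k)]
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        # Deal each class round-robin so every fold keeps the class balance.
        for position, idx in enumerate(class_indices):
            folds[position % k].append(idx)

    for i in range(k):
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield rest, folds[i]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In such a setup, the test portion would stay untouched until model selection is finished, the folds would be drawn from the remaining indices, and `binary_metrics` would be computed separately for the internal validation, test, and external datasets so that the comparison described in aim (iii) uses the same definitions throughout.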
Volume 4 Issue 3 (2025) 85 doi: 10.36922/gtm.8526

