
Global Translational Medicine                          CNNs for overfitting and generalizability in fracture detection



The application of AI in radiology has seen remarkable progress, with AI-based tools being used to enhance the accuracy and efficiency of diagnosing bone fractures.[1] These tools are designed to assist radiologists by providing faster and more consistent fracture identification, which is crucial for timely and effective treatment.[1,2] Recent studies have demonstrated the capability of AI algorithms to accurately detect and classify fractures, especially in the wrist and long bones, using X-ray images.[3]

CNNs have emerged as a cornerstone in medical imaging analysis, particularly in orthopedics, due to their ability to process and analyze complex image data with high accuracy.[4] These networks are structured to mimic the human visual cortex,[5] allowing them to identify patterns and features in medical images that may be difficult for human observers to discern.[6,7] In the context of bone fracture detection, CNNs have shown promising results, with some studies indicating that AI is noninferior to clinicians in terms of diagnostic performance.[8]

1.1. Common challenges

Despite the advancements in AI-based fracture detection, several challenges persist in the field.[9] High-quality, annotated datasets are essential for training effective AI models. However, such datasets are often scarce, which can limit the performance and generalizability of fracture detection models. AI models, particularly deep learning models, often overfit to the training data, especially when the dataset is small or lacks diversity.[10,11] This limits the model’s ability to generalize to new, unseen data.

A persistent challenge in AI-assisted fracture detection lies not in achieving high nominal accuracy but in ensuring that such metrics stem from rigorously validated models capable of real-world generalization.[1,12,13] Many studies report exceptional performance, yet methodological shortcomings, such as inadequate data splitting, insufficient validation protocols (systematic procedures for evaluating model performance, including partitioning data into training, validation, and test sets to prevent overfitting), or reliance on homogeneous datasets, often inflate internal benchmarks at the expense of clinical applicability.[14-17] This discrepancy highlights a critical disconnect: models optimized for accuracy on internal data may fail catastrophically when confronted with external populations or operational heterogeneity, a limitation amplified by inconsistent validation practices across the field.[9,18] The goal of our study is to directly address this gap by prioritizing training stability and generalizability over raw performance through methodological rigor in data handling and model validation.

Proper data splitting is essential to developing models that generalize well to unseen data. This involves partitioning datasets into different subsets, such as training, validation, and test sets.[14-19] The graph in Figure 1 illustrates the number of studies published each year using two-way and three-way data splitting strategies from 2007 to 2022. It highlights a significant shift in the research community’s approach to data splitting in ML studies. In the earlier years, particularly from 2007 to 2017, most studies employed two-way splitting, where the dataset is divided into a training set and a testing set. This method lacks a validation set, which is essential for tuning hyperparameters and preventing overfitting. Without a validation set, models may not generalize well to new, unseen data, leading to inefficient ML training and distorted results. Particular attention must also be paid to avoiding data leakage, where information from the test set inadvertently influences model training, leading to inflated and unreliable performance metrics.

Starting around 2018, the graph shows several studies adopting three-way splitting. This approach involves splitting the data into three sets: training, validation, and testing. The validation set is used during model development to fine-tune hyperparameters and select the best model before final evaluation on the test set. By 2022, the number of studies using three-way splitting surpasses the number using two-way splitting, indicating a positive trend toward more robust ML practices.

The increasing adoption of three-way splitting reflects a growing awareness of the pitfalls of overfitting and the importance of model validation. Without a validation set, there is a risk of inadvertently tuning the model to perform well on the test set, which can lead to overly optimistic performance estimates and poor generalization.[17-20] When

Figure 1. Yearly trend in the number of studies employing two-way and three-way data splitting strategies in artificial intelligence-assisted bone fracture detection research (2007–2022). The graph highlights the increasing adoption of three-way splitting, reflecting improved validation practices and model generalizability in machine learning. Data derived from Jung et al.[9]
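
As an illustration of the three-way splitting discussed in this section, the following minimal Python sketch partitions a dataset into training, validation, and test subsets at the patient level, so that images from one patient never appear in more than one subset (one common source of data leakage). It is not the procedure used in this study: the record layout, the 70/15/15 proportions, the fixed seed, and the function name three_way_split are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def three_way_split(records, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split records into train/val/test subsets, grouped by patient.

    records: list of (patient_id, image_id) pairs (hypothetical layout).
    fractions: assumed train/val/test proportions; must sum to 1.
    All images of a patient are assigned to the same subset.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Group image identifiers by patient.
    by_patient = defaultdict(list)
    for patient_id, image_id in records:
        by_patient[patient_id].append(image_id)

    # Shuffle patients (not images) reproducibly.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(len(patients) * fractions[0])
    n_val = round(len(patients) * fractions[1])
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # remainder goes to the test set
    }
    return {
        name: [(p, img) for p in ids for img in by_patient[p]]
        for name, ids in groups.items()
    }


# Toy data: 20 hypothetical patients with 3 radiographs each.
records = [(f"P{p:02d}", f"P{p:02d}_img{i}") for p in range(20) for i in range(3)]
splits = three_way_split(records)

# Leakage check: no patient may appear in more than one subset.
patient_sets = {name: {p for p, _ in rows} for name, rows in splits.items()}
assert patient_sets["train"].isdisjoint(patient_sets["val"])
assert patient_sets["train"].isdisjoint(patient_sets["test"])
assert patient_sets["val"].isdisjoint(patient_sets["test"])
```

In such a setup, hyperparameters and model selection would rely only on the validation subset, and the test subset would be evaluated once, after all tuning decisions are final.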


            Volume 4 Issue 3 (2025)                         84                              doi: 10.36922/gtm.8526