generalization capacity. The Adam optimizer’s adaptive learning rates mitigated gradient instability during early training phases, while the conservative initial learning rate ensured fine-grained parameter updates critical for distinguishing subtle fracture phenotypes. Restricting training to 10 epochs prevented over-optimization to transient batch-level noise, as evidenced by stabilized validation loss trajectories. Because dropout was applied exclusively during training and disabled during validation, the reported metrics reflected the model’s inherent diagnostic capability rather than transient regularization effects.
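As an illustration of this configuration, the sketch below shows a PyTorch-style loop with Adam, a 10-epoch budget, and dropout that is active only in training mode. The tiny stand-in network, loss choice, learning-rate value, and random tensors are assumptions made for demonstration, not the study’s actual architecture or data.

```python
import torch
from torch import nn, optim

# Tiny stand-in CNN with dropout; the study's actual architecture is not reproduced here.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Dropout(p=0.5),                               # dropout rate assumed for illustration
    nn.Linear(8, 1),
)
criterion = nn.BCEWithLogitsLoss()                   # binary fracture vs. non-fracture loss (assumed)
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # conservative initial learning rate (value assumed)

# Random tensors stand in for radiograph batches and labels.
x_train, y_train = torch.randn(32, 1, 64, 64), torch.randint(0, 2, (32, 1)).float()
x_val, y_val = torch.randn(16, 1, 64, 64), torch.randint(0, 2, (16, 1)).float()

for epoch in range(10):                              # training restricted to 10 epochs
    model.train()                                    # dropout active during training
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()                                  # backpropagate gradients
    optimizer.step()                                 # Adam parameter update

    model.eval()                                     # dropout disabled when computing validation metrics
    with torch.no_grad():
        val_acc = ((torch.sigmoid(model(x_val)) > 0.5).float() == y_val).float().mean().item()
        print(f"epoch {epoch + 1}: validation accuracy {val_acc:.3f}")
```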
2.4. k-fold cross-validation

To assess the model’s generalizability, a k-fold cross-validation strategy was employed with k = 5. The dataset was split into k partitions; in each iteration, the model was trained on k−1 folds and validated on the remaining fold. The process was repeated for all folds, and the cross-validation accuracy was computed as Equation IV:

$$\mathrm{CV\ accuracy} = \frac{1}{k}\sum_{i=1}^{k}\mathrm{accuracy}_i \tag{IV}$$

where $\mathrm{accuracy}_i$ is the validation accuracy for fold $i$.

The k-fold approach probed model stability under variations in data composition. By cyclically excluding distinct patient subgroups during training, the method simulated multicenter validation scenarios and quantified performance variance attributable to sampling biases. Repeated retraining across folds ensured that architectural decisions generalized beyond the feature distributions of individual splits. This process mirrored clinical reality, where AI tools must maintain diagnostic fidelity across heterogeneous patient populations and acquisition protocols.
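As a purely numerical illustration of Equation IV (the per-fold accuracies below are made-up values, not results from the study):

```python
# Hypothetical validation accuracies for the k = 5 folds (illustrative values only)
fold_accuracies = [0.91, 0.88, 0.93, 0.90, 0.89]

# Equation IV: cross-validation accuracy is the mean of the per-fold accuracies
cv_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print(f"CV accuracy = {cv_accuracy:.3f}")  # 0.902
```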
2.5. Algorithm pseudo-code

The training procedure, including cross-validation, is summarized in Algorithm 1.

Algorithm 1. Training and cross-validation procedure
1: Input: Dataset D, number of folds k, number of epochs E, mini-batch size B
2: Split D into k folds
3: for i = 1 to k do
4:     Assign the i-th fold as the validation set D_val, remaining folds as training set D_train
5:     Initialize CNN model parameters
6:     for epoch = 1 to E do
7:         Divide D_train into mini-batches of size B
8:         for each mini-batch (X, y) do
9:             Perform forward pass to compute predictions ŷ
10:            Compute loss L (Eq. 4)
11:            Backpropagate gradients and update parameters using Adam
12:        end for
13:        Evaluate model on D_val
14:    end for
15:    Compute validation accuracy for fold i
16: end for
17: Compute cross-validation accuracy (Eq. 5)
18: Output: Trained model, cross-validation accuracy
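The sketch below mirrors the control flow of Algorithm 1 in Python. Here build_model, train_on_batch, and fold_accuracy are hypothetical placeholders for the CNN initialization, the forward/backward pass with the loss of Eq. 4 and the Adam update, and the per-fold evaluation; the synthetic arrays stand in for the dataset D.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_model():
    """Hypothetical stand-in for initializing CNN parameters (step 5)."""
    return {"params": rng.normal(size=10)}

def train_on_batch(model, images, labels):
    """Hypothetical stand-in for steps 9-11: forward pass, loss (Eq. 4),
    backpropagation, and Adam update."""
    model["params"] -= 0.001 * rng.normal(size=10)

def fold_accuracy(model, images, labels):
    """Hypothetical stand-in for evaluating the model on D_val (steps 13 and 15)."""
    return float(rng.uniform(0.80, 0.95))  # illustrative value only

# Step 1 inputs: dataset D (synthetic here), folds k, epochs E, mini-batch size B
D_images = rng.normal(size=(100, 64, 64))
D_labels = rng.integers(0, 2, size=100)
k, E, B = 5, 10, 16

folds = np.array_split(rng.permutation(len(D_labels)), k)  # step 2: split D into k folds
fold_accs = []
for i in range(k):                                          # step 3: loop over folds
    val_idx = folds[i]                                      # step 4: D_val
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])  # D_train
    model = build_model()                                   # step 5
    for epoch in range(E):                                  # step 6: epoch loop
        rng.shuffle(train_idx)                              # step 7: form mini-batches of size B
        for start in range(0, len(train_idx), B):           # step 8: mini-batch loop
            batch = train_idx[start:start + B]
            train_on_batch(model, D_images[batch], D_labels[batch])  # steps 9-12
    fold_accs.append(fold_accuracy(model, D_images[val_idx], D_labels[val_idx]))  # steps 13-15

cv_accuracy = np.mean(fold_accs)                            # step 17: cross-validation accuracy (Eq. 5)
print(f"Cross-validation accuracy: {cv_accuracy:.3f}")      # step 18: output
```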
2.6. Evaluation and testing

The final model was evaluated on the test dataset and an external dataset to assess generalizability. Performance metrics, including accuracy, sensitivity, specificity, and confusion matrices, were computed. The accuracy was calculated using a formula that accounts for the binary classification nature of the problem. Specifically, accuracy was defined as the ratio of correctly classified samples to the total number of samples, incorporating true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs). This is expressed mathematically as Equation V:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{V}$$

In this context, TP represents the number of fractured cases correctly identified as fractured, while TN denotes the number of non-fractured cases correctly identified as non-fractured. FP corresponds to non-fractured cases incorrectly classified as fractured, and FN represents fractured cases incorrectly classified as non-fractured. This formulation provides a comprehensive measure of the model’s performance by considering all possible outcomes in the classification process.

To further evaluate the model’s performance, additional metrics were computed. Precision, which measures the proportion of correctly identified positive cases out of all predicted positive cases, was calculated using Equation VI:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{VI}$$

Recall, also referred to as sensitivity or the TP rate, was used to assess the model’s ability to identify all actual positive cases. It is defined as Equation VII:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{VII}$$
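A quick worked example of Equations V to VII, treating fractured cases as the positive class and using arbitrary confusion-matrix counts chosen for illustration rather than results reported in the study:

```python
# Hypothetical confusion-matrix counts (illustrative only):
# fractured = positive class, non-fractured = negative class
TP, TN, FP, FN = 45, 40, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)  # Equation V
precision = TP / (TP + FP)                  # Equation VI
recall = TP / (TP + FN)                     # Equation VII (sensitivity / TP rate)

print(f"Accuracy:  {accuracy:.3f}")   # 0.850
print(f"Precision: {precision:.3f}")  # 0.900
print(f"Recall:    {recall:.3f}")     # 0.818
```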
To balance precision and recall, the F1-score was computed as the harmonic mean of these two metrics, providing a single measure that accounts for both FPs and FNs. The F1-score is given by Equation VIII:

