The KLD loss quantifies the divergence between two probability distributions, as defined in:

\[ \mathrm{KLD} = \sum_{i=1}^{N} P_i \log \frac{P_i}{Q_i} \quad \text{(III)} \]

where P and Q refer to the teacher and student representation distributions, respectively, in our case.
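As an illustration, the following PyTorch sketch computes a KLD term of the form in Equation III between teacher and student segmentation outputs; the tensor shapes and the temperature parameter are assumptions made for the sketch rather than details taken from our implementation.

```python
import torch.nn.functional as F

def kld_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Equation (III): KLD between teacher (P) and student (Q) distributions.

    Assumes segmentation logits of shape (batch, classes, H, W); the
    temperature is an illustrative softening knob, not a reported setting.
    """
    p = F.softmax(teacher_logits / temperature, dim=1)          # teacher distribution P
    log_p = F.log_softmax(teacher_logits / temperature, dim=1)
    log_q = F.log_softmax(student_logits / temperature, dim=1)  # student distribution Q
    # sum_i P_i * (log P_i - log Q_i), averaged over pixels and the batch
    return (p * (log_p - log_q)).sum(dim=1).mean()
```

The same quantity can equivalently be obtained with F.kl_div applied to log_q and p.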
3.4.2. Supervised loss (L_sup)

For the supervised loss, we used the cross-entropy (CE) loss. CE loss measures the dissimilarity between the predicted pixel-wise probability distribution and the ground-truth labels. The formula for cross-entropy loss is given by:

\[ \mathrm{CE} = -\frac{1}{N} \sum_{i=1}^{N} y_i \log \hat{p}_i \quad \text{(IV)} \]
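The sketch below, continuing the example above, shows one way the supervised CE term in Equation IV could be combined with the distillation term; the weighting factor and the reuse of the kld_distillation_loss helper are assumptions for illustration, not the weighting scheme defined in this work.

```python
import torch.nn.functional as F

def total_loss(student_logits, teacher_logits, target_mask, alpha=0.5):
    """Illustrative combination of L_sup (Equation IV) with the KD term.

    `target_mask` holds integer class labels per pixel; `alpha` is an
    assumed trade-off weight between the two terms.
    """
    l_sup = F.cross_entropy(student_logits, target_mask)          # pixel-wise CE
    l_kd = kld_distillation_loss(student_logits, teacher_logits)  # sketch above
    return alpha * l_sup + (1.0 - alpha) * l_kd
```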
3.5. Augmentation

In our experimental setup, we separated our approach into two distinct strategies regarding data augmentation. In one set of experiments, we employed weak augmentation solely for the teacher model while implementing strong augmentation exclusively for the student. In another series of experiments, we applied strong augmentation to both the teacher and student models. This difference in augmentation strategies was purposefully designed to explore and assess the impact of differential augmentation levels on the performance and robustness of the resulting models. By varying the augmentation schemes, we aimed to gain insights into how each model responds to different levels of data perturbation and how this influences their learning dynamics and generalization abilities.

By isolating weak augmentation for the teacher and strong augmentation for the student, we sought to emphasize the role of the teacher as a stable source of distilled knowledge, while allowing the student to leverage augmented data for enhanced generalization. Conversely, employing strong augmentation for both models aimed to evaluate the effectiveness of augmenting data at both stages of the distillation process, potentially leading to further improvements in model performance through increased exposure to diverse training instances.

3.5.1. Weak augmentation

Our weak augmentation strategy employed a conservative approach tailored to the teacher model's training. Only random cropping was applied before normalization to the ImageNet mean and standard deviation. Since we utilized pre-trained networks as our base architecture, adhering to these standard normalization procedures helped maintain consistency with the pre-existing feature representations.
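For concreteness, a weak pipeline of this kind can be written with Albumentations roughly as follows; the crop size is a placeholder assumption rather than the value used in our experiments.

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Weak (teacher-side) augmentation: random crop, then ImageNet normalization.
weak_transform = A.Compose([
    A.RandomCrop(height=224, width=224),  # assumed placeholder crop size
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])
```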
3.5.2. Strong augmentation

Our strong augmentation strategy employed a more aggressive set of techniques. Random cropping, shift, scale, rotation, Gaussian noise injection, and pixel dropout were all applied before normalization to the ImageNet mean and standard deviation.
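An analogous sketch of the strong pipeline is given below; the magnitudes and probabilities are placeholder assumptions, not the exact settings used in this study.

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Strong (student-side) augmentation: crop, shift/scale/rotate, Gaussian noise,
# and pixel dropout, followed by ImageNet normalization.
strong_transform = A.Compose([
    A.RandomCrop(height=224, width=224),
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15, p=0.5),
    A.GaussNoise(p=0.3),
    A.PixelDropout(dropout_prob=0.01, p=0.3),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])
```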
3.5.3. Experimental overview

As we have highlighted, our research delves into several crucial factors within the realm of KD, including different distillation paths, loss functions, and augmentation strategies. These experiments are summarized in Table 3, in which we employ distinct notations for each experiment to enhance clarity and comprehension.

Table 3. Summary of experimental factors in the proposed knowledge distillation approach

KD representation     | Model notation | KD loss                     | Teacher augmentation
KD (Logits)           | L_MSE          | Mean squared error          | Strong
                      | L_MSE_WAug     | Mean squared error          | Weak
                      | L_KLD          | Kullback–Leibler divergence | Strong
                      | L_KLD_WAug     | Kullback–Leibler divergence | Weak
KD (Hidden)           | L_MSE          | Mean squared error          | Strong
                      | L_MSE_WAug     | Mean squared error          | Weak
                      | L_KLD          | Kullback–Leibler divergence | Strong
                      | L_KLD_WAug     | Kullback–Leibler divergence | Weak
KD (Hidden-Regressor) | L_MSE          | Mean squared error          | Strong
                      | L_MSE_WAug     | Mean squared error          | Weak
                      | L_KLD          | Kullback–Leibler divergence | Strong
                      | L_KLD_WAug     | Kullback–Leibler divergence | Weak

Abbreviations: KD: Knowledge distillation; KLD: Kullback–Leibler divergence; MSE: Mean squared error.
4. Experiments

In our experiments, we utilized an Nvidia Titan XP GPU running on Ubuntu 20.04.6 LTS as our computational platform. We employed PyTorch as our primary framework for model implementation. For building our models, we relied on Segmentation Models,79 a PyTorch-based library. Additionally, we utilized Albumentations,80 an image augmentation library, for implementing augmentation techniques.
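As a minimal sketch of this setup, teacher and student networks can be instantiated with Segmentation Models PyTorch along the following lines; the encoder choices and output configuration are assumptions for illustration, not the specific architectures evaluated in this work.

```python
import segmentation_models_pytorch as smp

# Hypothetical teacher/student pair built from ImageNet-pretrained encoders.
teacher = smp.Unet(encoder_name="resnet50", encoder_weights="imagenet",
                   in_channels=3, classes=1)
student = smp.Unet(encoder_name="mobilenet_v2", encoder_weights="imagenet",
                   in_channels=3, classes=1)
```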

