The KLD loss quantifies the divergence between two probability distributions, as defined in:

\[ \mathrm{KLD} = \sum_{i=1}^{N} P_i \log \frac{P_i}{Q_i} \quad \text{(III)} \]

where P and Q refer to the teacher and student representation distributions, respectively, in our case.
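As an illustration, the following PyTorch sketch computes a KLD term of the form in Equation III between teacher and student segmentation outputs; the tensor shapes and the temperature parameter are assumptions made for the sketch rather than details taken from our implementation.

```python
import torch.nn.functional as F

def kld_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Equation (III): KLD between teacher (P) and student (Q) distributions.

    Assumes segmentation logits of shape (batch, classes, H, W); the
    temperature is an illustrative softening knob, not a reported setting.
    """
    p = F.softmax(teacher_logits / temperature, dim=1)          # teacher distribution P
    log_p = F.log_softmax(teacher_logits / temperature, dim=1)
    log_q = F.log_softmax(student_logits / temperature, dim=1)  # student distribution Q
    # sum_i P_i * (log P_i - log Q_i), averaged over pixels and the batch
    return (p * (log_p - log_q)).sum(dim=1).mean()
```

The same quantity can equivalently be obtained with F.kl_div applied to log_q and p.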
3.4.2. Supervised loss (L_sup)

For the supervised loss, we used the cross-entropy (CE) loss. CE loss measures the dissimilarity between the predicted pixel-wise probability distribution and the ground-truth labels. The formula for cross-entropy loss is given by:

\[ \mathrm{CE} = -\frac{1}{N} \sum_{i=1}^{N} y_i \log \hat{p}_i \quad \text{(IV)} \]
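The sketch below, continuing the example above, shows one way the supervised CE term in Equation IV could be combined with the distillation term; the weighting factor and the reuse of the kld_distillation_loss helper are assumptions for illustration, not the weighting scheme defined in this work.

```python
import torch.nn.functional as F

def total_loss(student_logits, teacher_logits, target_mask, alpha=0.5):
    """Illustrative combination of L_sup (Equation IV) with the KD term.

    `target_mask` holds integer class labels per pixel; `alpha` is an
    assumed trade-off weight between the two terms.
    """
    l_sup = F.cross_entropy(student_logits, target_mask)          # pixel-wise CE
    l_kd = kld_distillation_loss(student_logits, teacher_logits)  # sketch above
    return alpha * l_sup + (1.0 - alpha) * l_kd
```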
3.5. Augmentation

In our experimental setup, we separated our approach into two distinct strategies regarding data augmentation. In one set of experiments, we employed weak augmentation solely for the teacher model while implementing strong augmentation exclusively for the student. In another series of experiments, we applied strong augmentation to both the teacher and student models. This difference in augmentation strategies was purposefully designed to explore and assess the impact of differential augmentation levels on the performance and robustness of the resulting models. By varying the augmentation schemes, we aimed to gain insights into how each model responds to different levels of data perturbation and how this influences their learning dynamics and generalization abilities.

By isolating weak augmentation for the teacher and strong augmentation for the student, we sought to emphasize the role of the teacher as a stable source of distilled knowledge, while allowing the student to leverage augmented data for enhanced generalization. Conversely, employing strong augmentation for both models aimed to evaluate the effectiveness of augmenting data at both stages of the distillation process, potentially leading to further improvements in model performance through increased exposure to diverse training instances.

3.5.1. Weak augmentation

Our weak augmentation strategy employed a conservative approach tailored to the teacher model's training. Only random cropping was applied before normalization to the ImageNet mean and standard deviation. Since we utilized pre-trained networks as our base architecture, adhering to these standard normalization procedures helped maintain consistency with the pre-existing feature representations.
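For concreteness, a weak pipeline of this kind can be written with Albumentations roughly as follows; the crop size is a placeholder assumption rather than the value used in our experiments.

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Weak (teacher-side) augmentation: random crop, then ImageNet normalization.
weak_transform = A.Compose([
    A.RandomCrop(height=224, width=224),  # assumed placeholder crop size
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])
```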
3.5.2. Strong augmentation

Our strong augmentation strategy employed a more aggressive set of techniques. Random cropping, shift, scale, rotation, Gaussian noise injection, and pixel dropout were all applied before normalization to the ImageNet mean and standard deviation.
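An analogous sketch of the strong pipeline is given below; the magnitudes and probabilities are placeholder assumptions, not the exact settings used in this study.

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Strong (student-side) augmentation: crop, shift/scale/rotate, Gaussian noise,
# and pixel dropout, followed by ImageNet normalization.
strong_transform = A.Compose([
    A.RandomCrop(height=224, width=224),
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15, p=0.5),
    A.GaussNoise(p=0.3),
    A.PixelDropout(dropout_prob=0.01, p=0.3),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])
```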
3.5.3. Experimental overview

As we have highlighted, our research delves into several crucial factors within the realm of KD, including different distillation paths, loss functions, and augmentation strategies. These experiments are summarized in Table 3, in which we employ distinct notations for each experiment to enhance clarity and comprehension.

Table 3. Summary of experimental factors in the proposed knowledge distillation approach

KD representation     | Model notation | KD loss                     | Teacher augmentation
KD (Logits)           | L_MSE          | Mean squared error          | Strong
                      | L_MSE_WAug     | Mean squared error          | Weak
                      | L_KLD          | Kullback–Leibler divergence | Strong
                      | L_KLD_WAug     | Kullback–Leibler divergence | Weak
KD (Hidden)           | L_MSE          | Mean squared error          | Strong
                      | L_MSE_WAug     | Mean squared error          | Weak
                      | L_KLD          | Kullback–Leibler divergence | Strong
                      | L_KLD_WAug     | Kullback–Leibler divergence | Weak
KD (Hidden-Regressor) | L_MSE          | Mean squared error          | Strong
                      | L_MSE_WAug     | Mean squared error          | Weak
                      | L_KLD          | Kullback–Leibler divergence | Strong
                      | L_KLD_WAug     | Kullback–Leibler divergence | Weak

Abbreviations: KD: Knowledge distillation; KLD: Kullback–Leibler divergence; MSE: Mean squared error.
4. Experiments

In our experiments, we utilized an Nvidia Titan XP GPU running on Ubuntu 20.04.6 LTS as our computational platform. We employed PyTorch as our primary framework for model implementation. For building our models, we relied on Segmentation Models,79 a PyTorch-based library. Additionally, we utilized Albumentations,80 an image augmentation library, for implementing augmentation techniques.
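As a minimal sketch of this setup, teacher and student networks can be instantiated with Segmentation Models PyTorch along the following lines; the encoder choices and output configuration are assumptions for illustration, not the specific architectures evaluated in this work.

```python
import segmentation_models_pytorch as smp

# Hypothetical teacher/student pair built from ImageNet-pretrained encoders.
teacher = smp.Unet(encoder_name="resnet50", encoder_weights="imagenet",
                   in_channels=3, classes=1)
student = smp.Unet(encoder_name="mobilenet_v2", encoder_weights="imagenet",
                   in_channels=3, classes=1)
```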

