
Artificial Intelligence in Health                                 Efficient knowledge distillation for breast US




            [Figure 2 image: fifteen panels labeled A-O, arranged in three rows (A-E, F-J, K-O)]

            Figure 2. Visual comparison of our ablation study. The original test image, the prediction of the teacher model, and the prediction of the unsupervised
            student model are shown in (A-C), respectively. The predicted segmentations of the proposed KD-based models are shown in (D-O). Green contours
            represent the ground truth mask, while the red contours illustrate the corresponding predictions.
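
The DSC values discussed in this section measure the overlap between a predicted mask (red contours in Figure 2) and the ground truth mask (green contours). As a minimal sketch, assuming binary masks stored as nested lists of 0/1 values and a DSC reported as a percentage, the metric could be computed as shown here. The function name `dice_score` and the smoothing term `eps` are illustrative assumptions, not the authors' implementation.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice similarity coefficient (in %) between two binary masks.

    pred and target are equally sized 2D lists of 0/1 values.
    eps is a hypothetical smoothing term (an assumption, not taken from
    the paper) that avoids division by zero when both masks are empty.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")
    # Flatten both masks so that pixels can be compared pairwise.
    p = [int(bool(v)) for row in pred for v in row]
    t = [int(bool(v)) for row in target for v in row]
    # Number of pixels marked as foreground in both masks.
    intersection = sum(a * b for a, b in zip(p, t))
    return 100.0 * (2.0 * intersection + eps) / (sum(p) + sum(t) + eps)


# Toy 3x3 example: the ground truth covers 4 pixels, the prediction 3,
# and all 3 predicted pixels lie inside the ground truth.
ground_truth = [[0, 1, 1],
                [0, 1, 1],
                [0, 0, 0]]
prediction = [[0, 1, 1],
              [0, 1, 0],
              [0, 0, 0]]
score = dice_score(prediction, ground_truth)
```

For this toy pair the score is 2·3/(3+4), or about 85.7. A perfect match gives 100.
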
involves exchanging knowledge between hidden features, the top DSC of 79.00 was achieved by the H_KLD model. Similarly, in KD (Hidden-Regressor), where knowledge passes from hidden features through a regressor model, the highest DSC of 79.00 was reached by the HReg_KLD model. These findings collectively suggest that all proposed KD paths perform comparably, improving student performance by approximately 9%. Such consistent gains underscore the robustness and versatility of the proposed KD paths and demonstrate their effectiveness in knowledge exchange between teacher and student.

5.1.2. Effect of KD loss function

Further analysis of the results presented in Table 4 shows that both the MSE and KLD loss functions are effective for knowledge transfer. Across the KD pathways, including KD (Logits), KD (Hidden), and KD (Hidden-Regressor), the DSC shows a consistent pattern in which both loss functions are similarly effective. In the KD (Logits) pathway, for instance, the DSCs achieved by L_MSE and L_KLD, namely 77.75 and 78.00, respectively, highlight the marginal outperformance of L_KLD. Similarly, in the other KD pathways, KD (Hidden) and KD (Hidden-Regressor), the comparison reveals a similar pattern between MSE and KLD. This slight outperformance of KLD in average DSC scores suggests that KLD can be a preferable choice, indicating that the selection between MSE and KLD loss functions could notably influence the effectiveness of knowledge transfer in KD.

5.1.3. Effect of augmentation

Exploring the impact of weak augmentation on the teacher model offers further insight into the KD process. Using weak augmentation for the teacher did not yield a significant impact on performance: models with and without weak augmentation for the teacher demonstrated comparable performance. Despite this negligible effect, all models incorporating teacher guidance showed improvements compared to students without such supervision. This observation demonstrates the fundamental role of the teacher network in guiding and enhancing the learning process of the student network. While weak augmentation may not directly influence the performance of the teacher model, its presence facilitates the extraction and transfer of valuable knowledge, thereby contributing to the overall improvement in student performance.

5.2. Results with respect to SOTA methods

In this section, we compare our best model with SOTA models, as outlined in Table 5, that used the same dataset employed in our study. It is important to emphasize that none of these SOTA models provides access to either its codebase or its training and testing splits. Consequently, our comparison is based solely on


            Volume 2 Issue 2 (2025)                         81                               doi: 10.36922/aih.3509