
Artificial Intelligence in Health                                 Efficient knowledge distillation for breast US




Table 5. Results of SOTA methods

Article               DSC (%)    No. of parameters (millions)
Liang et al.52        84         20.5
Gao et al.63          85         2.34
Lou et al.69          90         26.63
Lee et al.55          89         7.7
Ours (L_KLD_WAug)     80         0.82

Abbreviation: DSC: Dice similarity score.

the results reported in their respective papers. Table 5 exclusively showcases our best model alongside the top 3 SOTA models that have reported the number of trainable parameters in their corresponding papers.

Our proposed best model demonstrates performance comparable to SOTA models despite having significantly fewer trainable parameters. This highlights the efficiency of our model architecture in achieving competitive results while keeping the parameter count minimal. By leveraging innovative design in distilling knowledge, our model strikes a balance between computational complexity and performance, making it well suited for resource-constrained environments or applications where model size is a critical consideration.

6. Discussion

In this study, we investigated various aspects of KD techniques and their implications for enhancing student performance. Through an extensive analysis of KD pathways, loss functions, and the impact of augmentation, we gained valuable insights into the mechanisms underlying knowledge transfer from teacher to student networks. Our findings revealed that the proposed KD paths consistently achieved performance closely aligned with that of the teacher model, indicating effective knowledge transfer. In addition, the comparative analysis of the MSE and KLD loss functions showed comparable efficacy in facilitating knowledge transfer across the different KD pathways. Furthermore, exploring the impact of different augmentations on the teacher model highlighted the fundamental role of teacher guidance in improving student performance, despite the negligible effect of augmentation on the teacher model itself.

Finally, our comparison with SOTA models showcased the efficiency of the proposed model architecture. Despite having significantly fewer trainable parameters, our best model demonstrated performance comparable to SOTA models, highlighting the effectiveness of our approach in achieving competitive results while minimizing model complexity. By leveraging the rich knowledge encapsulated within the teacher network, students can effectively learn from the expertise encoded in the teacher's parameters, leading to significant performance gains. Such endeavors hold promise for advancing the state of the art in model compression and for facilitating the deployment of efficient deep learning solutions across various domains and applications.

Even though our study provides valuable insights, it would be advantageous to explore student models with differing numbers of trainable parameters to assess the trend of their performance relative to parameter count. This investigation would offer a deeper understanding of the scalability and efficiency of the proposed KD-based framework. Furthermore, expanding our research to encompass additional publicly available US datasets with diverse applications would improve the generalizability and robustness of the proposed framework.

7. Conclusion

This study demonstrated how KD can improve the performance of lightweight student models for US breast tumor segmentation. Through a systematic analysis of KD routes, loss functions, and augmentation effects, the proposed framework achieves competitive performance with substantially fewer parameters, showing promise for resource-constrained applications such as POCUS. The results highlight the importance of choosing the best teacher representations and of using teacher guidance to promote efficient knowledge transfer. The generalizability and resilience of this method would be further validated by future research examining scalability across student model sizes and additional US datasets, opening the door for more effective and broadly applicable deep learning solutions in medical imaging.

Acknowledgement

None.

Funding

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) (NSERC RGPIN-2020-04612).

Conflict of interest

The authors declare that they have no competing interests.

Author contributions

Conceptualization: All authors
Formal analysis: All authors
Investigation: Bahareh Behboodi
Methodology: All authors
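As a supplement, the two distillation losses compared in the discussion (MSE on logits vs. temperature-softened KLD) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the temperature value and logit vectors are assumptions chosen for demonstration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kld_distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions.
    The T^2 factor keeps the loss magnitude comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kld = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kld

def mse_distillation_loss(teacher_logits, student_logits):
    """Mean squared error taken directly on the (unsoftened) logits."""
    n = len(teacher_logits)
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / n

# Illustrative logits for one pixel/class vector (not values from the paper)
teacher = [3.2, 1.1, -0.4]
student = [2.9, 1.3, -0.1]
print(kld_distillation_loss(teacher, student))
print(mse_distillation_loss(teacher, student))
```

Both losses vanish when the student reproduces the teacher's logits exactly; the KLD variant compares normalized distributions, whereas the MSE variant also penalizes a uniform offset in the raw logits.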


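For reference, the DSC metric reported in Table 5 can be sketched for binary segmentation masks as below. This is an illustrative sketch only: the flattened-mask representation, the empty-mask convention, and the example masks are assumptions, not details from the paper.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient for binary masks: 2|A∩B| / (|A| + |B|)."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    size_sum = sum(pred_mask) + sum(true_mask)
    if size_sum == 0:
        return 1.0  # convention: two empty masks count as perfect agreement
    return 2.0 * intersection / size_sum

# Toy flattened masks (0 = background, 1 = tumor)
pred = [0, 1, 1, 1, 0, 0]
true = [0, 1, 1, 0, 0, 0]
print(dice_coefficient(pred, true))  # 2*2 / (3+2) = 0.8
```

A DSC of 1.0 indicates perfect overlap between predicted and reference masks; the scores in Table 5 are this quantity expressed as a percentage.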
            Volume 2 Issue 2 (2025)                         82                               doi: 10.36922/aih.3509