
Artificial Intelligence in Health                                 Efficient knowledge distillation for breast US



•   Exploring the fundamental role of teacher augmentation techniques and loss functions in facilitating knowledge transfer across different distillation pathways
•   Developing a student network that achieves performance comparable to that of the teacher network while having significantly (100 times) fewer trainable parameters.

The rest of the paper is structured as follows: Section 2 provides an in-depth literature review, while Section 3 outlines our proposed methodology. Our results are presented in Section 5, and concluding remarks are presented in Section 6.

2. Related works

In this section, we present an extensive review of previous methodologies for network compression based on KD, alongside an analysis of US segmentation techniques, specifically those employing KD. Note that this section reviews KD studies on both natural and medical image datasets. Additionally, since we use the publicly available dataset introduced by Yap et al.27 (i.e., Dataset_A), we include a review of recent studies that have employed this dataset, regardless of whether they used KD as their main methodology. Our aim is to compare our results with those of other studies that used the same dataset.

2.1. Studies on KD

KD-based techniques have been used in both classification and segmentation tasks.24,28-33 The main idea of these approaches is to distill knowledge from the information-rich output probabilities of the teacher network to the student network. Xu et al.28 focused on matching the distribution of logits, while Zagoruyko and Komodakis29 transferred knowledge from intermediate features. Tung and Mori33 proposed the distillation of similarity-preserving knowledge, such that the student network preserves the pairwise similarities of paired inputs that produce similar activation maps in the teacher network. He et al.31 developed a KD method for semantic segmentation that minimizes the inconsistency between student and teacher knowledge. Another KD-based strategy for semantic segmentation, proposed by Liu et al.,32 performed pairwise structure distillation and holistic distillation schemes.

2.2. Studies on KD in medical images

Recently, researchers have adopted KD-based techniques for various applications in medical imaging,34-40 and specifically in US imaging.41-44 Owen et al.41 explored the efficacy of a student-teacher framework for training lightweight deep learning models, using unlabeled data to achieve fast automated detection of abnormalities in optical coherence tomography B-scans. Vaze et al.42 introduced a methodology for modifying and compressing the original U-Net model45 while incorporating KD to ensure that the performance of the compressed model closely matches that of the original U-Net on 5635 US images. Cao et al.43 proposed a noise filter network (NF-Net) that mitigates the negative impact of noisy labels through the incorporation of two softmax layers for classification and a teacher-student module for distilling the knowledge of clean labels in the classification of breast tumors. Fan et al.46 introduced optimization trajectory distillation, a novel approach using a dual-stream distillation algorithm for unsupervised domain adaptation.

Table 1 reviews the key features of the aforementioned studies. Since the generalizability of the works discussed in Section 2.1 remains untested in the medical image domain, these studies are excluded from Table 1. Among the papers in Table 1, most utilize either the output layer or the intermediate layers for distillation, and none investigates both simultaneously. Transferring knowledge solely from the logits can lead to a performance gap between teacher and student models. Each paper employs unique distillation losses, yet none explores the impact of these losses on the distillation process. By taking the L1-norm of all layers, knowledge transfer is ensured throughout the entire network, promoting more comprehensive learning.

2.3. Studies on Dataset_A

In this paper, as we utilize the publicly available 2D US dataset introduced by Yap et al.,27 we present a review of publications that have employed the same dataset to ensure a fair comparison of our segmentation results with existing works. It is worth noting that we employ Dataset_A as explained in Yap et al.27 and maintain consistency in our

Table 1. Summary of previous works and their reported DSC scores on Dataset_A

Article         Dataset                        Task            Knowledge distillation method
Owen et al.41   Optical coherence tomography   Classification  From model logits using binary cross-entropy
Vaze et al.42   Nerve US                       Segmentation    From all the layers using L1-norm
Cao et al.43    Breast US                      Classification  From model logits using squared error
Fan et al.46    Multiple(a)                    Multi-task(b)   From gradients of one domain to another

(a) Multiple datasets were used; for more details, refer to Fan et al.46
(b) Multiple tasks, including segmentation, classification, etc.
Abbreviation: DSC: Dice similarity coefficient.
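The two distillation pathways contrasted above, transferring knowledge from the output logits versus from intermediate feature maps, can be illustrated with a minimal NumPy sketch. This is not the implementation of any cited work: the function names, the temperature T, and the weight alpha are illustrative assumptions, and the combined loss merely shows how the two terms could be summed.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_kd_loss(student_logits, teacher_logits, T=4.0):
    """Output-layer distillation: KL divergence between the softened
    teacher and student class distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T * T)

def feature_l1_loss(student_feats, teacher_feats):
    """Intermediate-layer distillation: mean L1 distance between
    corresponding feature maps across all layers."""
    return float(np.mean([np.mean(np.abs(np.asarray(s) - np.asarray(t)))
                          for s, t in zip(student_feats, teacher_feats)]))

def combined_kd_loss(student_logits, teacher_logits,
                     student_feats, teacher_feats, T=4.0, alpha=0.5):
    """Weighted sum of the output-layer and all-layer L1 terms
    (illustrative weighting, not a cited formulation)."""
    return (alpha * logit_kd_loss(student_logits, teacher_logits, T)
            + (1 - alpha) * feature_l1_loss(student_feats, teacher_feats))
```

When student and teacher agree exactly, both terms vanish; any mismatch in either the logits or any intermediate layer contributes to the combined loss, which is the sense in which all-layer supervision promotes knowledge transfer throughout the entire network.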
            Volume 2 Issue 2 (2025)                         75                               doi: 10.36922/aih.3509