Artificial Intelligence in Health | Efficient knowledge distillation for breast US

• Exploring the fundamental role of teacher augmentation techniques and loss functions in facilitating knowledge transfer across different distillation pathways
• Developing a student network that achieves performance comparable to the teacher network while having significantly (100 times) fewer trainable parameters.

The rest of the paper is structured as follows: Section 2 provides an in-depth literature review, while Section 3 outlines our proposed methodology. Our results are presented in Section 5, and concluding remarks are given in Section 6.

2. Related works

In this section, we present an extensive review of previous methodologies for network compression based on KD, alongside an analysis of US segmentation techniques, specifically those employing KD. Please note that this section is structured to review KD studies utilizing both natural and medical image datasets. Additionally, since we use a publicly available dataset introduced by Yap et al.27 (i.e., Dataset_A), we also include a review of recent studies that have employed this dataset, regardless of whether they used KD as their main methodology. Our aim is to compare our results with those of other studies that used the same dataset.

2.1. Studies on KD

KD-based techniques have been used in both classification and segmentation tasks.24,28-33 The main idea of these approaches is to distill knowledge from the information-rich output probabilities of the teacher network to the student network. Xu et al.28 focused on matching the distribution of logits, while Zagoruyko and Komodakis29 transferred knowledge from intermediate features. Tung and Mori33 proposed the distillation of similarity-preserving knowledge, such that the student network preserves the pairwise similarities of input pairs that produce similar activation maps in the teacher network. He et al.31 developed a KD method for semantic segmentation that minimizes the inconsistency between student and teacher knowledge. Another KD-based strategy for semantic segmentation, proposed by Liu et al.,32 performed pairwise structure distillation and holistic distillation schemes.

2.2. Studies on KD in medical images

Recently, researchers have adopted KD-based techniques for various applications in medical imaging,34-40 and specifically in US imaging.41-44 Owen et al.41 explored the efficacy of a student-teacher framework for training lightweight deep learning models, using unlabeled data to achieve fast automated detection of abnormalities in optical coherence tomography B-scans. Vaze et al.42 introduced a methodology for modifying and compressing the original U-Net model while incorporating KD to ensure that the performance of the compressed model closely matches that of the original U-Net on 5,635 US images.45 Cao et al.43 proposed a noise filter network (NF-Net) that mitigates the negative impact of noisy labels through the incorporation of two softmax layers for classification and a teacher-student module for distilling the knowledge of clean labels in the classification of breast tumors. Fan et al.46 introduced optimization trajectory distillation, a novel approach using a dual-stream distillation algorithm for unsupervised domain adaptation.

Table 1 reviews the key features of the aforementioned studies. Since the generalizability of the works discussed in Section 2.1 remains untested in the medical image domain, those studies are excluded from Table 1. Most papers in Table 1 utilize either the output layer or the intermediate layers for distillation, and none investigates both simultaneously. Transferring knowledge solely from the logits can lead to a performance gap between teacher and student models. Each paper employs its own distillation losses, yet none explores the impact of these losses on the distillation process. By taking the L1-norm over all layers, knowledge transfer is ensured throughout the entire network, promoting more comprehensive learning.

2.3. Studies on Dataset_A

In this paper, as we utilized a publicly available 2D US dataset introduced by Yap et al.,27 we present a review of publications that have employed the same dataset to ensure a fair comparison of our segmentation results with existing works. It is worth noting that we employ Dataset_A as explained in Yap et al.27 and maintain consistency in our

Table 1. Summary of previous works and their reported DSC scores on Dataset_A

Article | Dataset | Task | Knowledge distillation method
Owen et al.41 | Optical coherence tomography | Classification | From model logits using binary cross-entropy
Vaze et al.42 | Nerve US | Segmentation | From all the layers using L1-norm
Cao et al.43 | Breast US | Classification | From model logits using squared error
Fan et al.46 | Multiple a | Multi-task b | From gradients of one domain to another

a Multiple datasets were used; for more details, see reference 46.
b Multiple tasks, including segmentation, classification, etc.
Abbreviation: DSC: Dice similarity coefficient.

Volume 2 Issue 2 (2025) 75 doi: 10.36922/aih.3509
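To make the logit-matching idea reviewed in Section 2.1 concrete, the following is a minimal NumPy sketch of temperature-softened logit distillation in the style popularized by Hinton et al., on which the logit-based methods above build. Function names and the temperature value are illustrative assumptions, not code from any of the cited papers.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) between temperature-softened class probabilities.

    The T**2 factor keeps gradient magnitudes comparable across temperatures
    (a common convention, not specific to any cited paper).
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(np.mean(kl) * T ** 2)
```

When the student's logits equal the teacher's, the loss is zero; any mismatch in the softened distributions yields a positive penalty that the student minimizes during training.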

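The layer-wise L1-norm strategy noted in Section 2.2 (transferring knowledge from every layer, as in the Vaze et al. row of Table 1) can be sketched as follows. This is an illustrative simplification, not the authors' implementation: it assumes each student feature map already matches its teacher counterpart in shape, whereas in practice a projection layer is often needed.

```python
import numpy as np

def l1_feature_distillation_loss(student_feats, teacher_feats):
    """Mean absolute difference between matched feature maps, averaged
    over every teacher/student layer pair, so knowledge is transferred
    throughout the entire network rather than from the logits alone.
    """
    assert len(student_feats) == len(teacher_feats)
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        total += float(np.mean(np.abs(s - t)))  # per-layer L1 discrepancy
    return total / len(student_feats)
```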

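Similarly, the similarity-preserving distillation of Tung and Mori compares batch-wise pairwise-similarity matrices rather than raw activations, so the teacher and student feature widths need not match. The sketch below is a simplified illustration under that reading, not the paper's code.

```python
import numpy as np

def similarity_preserving_loss(student_acts, teacher_acts):
    """Penalize differences between row-normalized pairwise-similarity
    matrices of student and teacher activations.

    Each input is a (batch, features) array; only the (batch, batch)
    similarity structure is compared, so feature widths may differ.
    """
    def sim(a):
        g = a @ a.T                                        # pairwise similarities
        norm = np.linalg.norm(g, axis=1, keepdims=True) + 1e-12
        return g / norm                                    # row-normalize
    b = student_acts.shape[0]
    diff = sim(student_acts) - sim(teacher_acts)
    return float(np.sum(diff ** 2) / b ** 2)
```

Because the similarity matrix is row-normalized, uniformly rescaling the student's activations leaves the loss unchanged; only the relative similarity structure across the batch matters.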