Artificial Intelligence in Health
Efficient knowledge distillation for breast US
clinical parameters related to volume and shape.1 US image segmentation has been utilized in a variety of applications, including the creation of image atlases, determination of the size or shape of the target ROI, targeted therapies, and image-guided procedures. Segmentation masks manually delineated by an expert clinician are considered the gold standard in medical applications. Despite its importance, manual delineation is a time-consuming and labor-intensive task that is frequently subject to inter- and intra-observer variability due to differences in clinicians’ experience, attention, and visual fatigue, as well as insufficient clinician training.2,3 Therefore, computerized semi- and fully automatic segmentation techniques based on convolutional neural networks (CNNs) have attracted great interest for expediting the delineation procedure while improving the reproducibility of the delineations.4-7

CNN-based algorithms have driven revolutionary developments in CAD systems. However, despite the current growth of CNN-based segmentation algorithms, these techniques are rather complex, and their ability to generate satisfactory results on specific medical imaging problems is often limited.1 Greater complexity in network design and configuration does not necessarily lead to better performance.8 Furthermore, networks with large numbers of parameters are both memory- and computationally demanding, which often hinders modern CNN-based segmentation approaches. Specifically, although an increase in model size is usually correlated with an improvement in representation power, it comes at a price: longer training time and greater memory usage. With the current expansion of point-of-care US (POCUS) imaging equipment, it is critical to build networks that are computationally and memory efficient. Compared to other imaging modalities, POCUS has the primary advantage of allowing investigations at the bedside, which is especially appealing for acutely ill patients who usually cannot be transported for such testing.9,10 One common use case of CAD systems equipped with CNN-based techniques is breast cancer detection in POCUS imaging.11-13 To achieve computational and memory efficiency, researchers have developed novel strategies for compressing large models so that the same or similar generalization performance can be achieved by training smaller networks.14

Model compression techniques, such as parameter pruning, quantization, and knowledge distillation (KD), aim to minimize computational and memory costs by encoding a large model into a more efficient format with minimal impact on performance.15,16 Parameter pruning involves training a large model and then removing unnecessary weights and parameters to obtain a considerably smaller yet effective model. Pruning also has a regularizing effect on the model, resulting in improved generalization.17-19 Because pruning usually eliminates “unimportant” weights from an already deployed model, it is rarely useful for improving efficiency at training or inference time; however, it can reduce model storage. Similarly, quantization-based compression techniques reduce model storage by lowering the number of bits required to store the weights.20-22 Pruning and quantization-based techniques provide suitable compression rates without sacrificing accuracy. They are, however, better suited to applications that demand consistent model performance, whereas KD-based techniques, also known as student-teacher networks, are better suited to applications that involve small datasets or require large efficiency improvements.15 In KD-based approaches, the model is accelerated directly, without special hardware or implementations. Specifically, the student network (i.e., the small network) is trained under the supervision of the teacher network (i.e., the large network).16 The main idea of KD is to transfer information from a complex teacher network into a small student network by training the student to mimic the distribution of the teacher network’s representation. Previous experimental results have demonstrated that the student network can match or even surpass the teacher’s performance while being computationally efficient.16,23-25

Although previous methods tend to capture rich information from various levels of the teacher’s representation, they place little emphasis on identifying the most effective representation level. Moreover, many existing techniques propose complex strategies that pose implementation challenges. To address this gap in current KD techniques, we focus on selecting the optimal teacher representation from among different levels, an aspect that has been overlooked in existing approaches. Specifically, we study the impact of transferring knowledge from the teacher’s output layer as well as from its intermediate layers. Existing techniques also tend to introduce complexities in selecting the appropriate teacher level for knowledge transfer.26 In contrast, we conduct an extensive analysis of KD pathways, loss functions, and the impact of augmentation, providing insights into the mechanisms underlying knowledge transfer from teacher to student networks. The proposed method simplifies the KD process by pinpointing the most beneficial teacher representation level, thus offering a more straightforward and practical solution for model compression and performance enhancement. The main contributions are summarized below:

• Highlighting the potential of leveraging teacher networks to facilitate significant performance gains in student models, indicating effective knowledge transfer
Volume 2 Issue 2 (2025) 74 doi: 10.36922/aih.3509

