
Artificial Intelligence in Health                                 Efficient knowledge distillation for breast US



clinical parameters related to volume and shape.¹ US image segmentation has been utilized in a variety of applications, including the creation of image atlases, determination of the size or shape of the target ROI, target therapies, and image-guided procedures. These segmentation masks are manually delineated by an expert clinician and are considered the gold standard in medical applications. Despite the importance of manual delineation, it is a time-consuming and labor-intensive task that is frequently subject to inter- and intra-observer variability due to differences in clinicians’ experience, attention, and visual fatigue, as well as insufficient training of clinicians.²,³ Therefore, computerized semi- and fully-automatic segmentation techniques based on convolutional neural networks (CNNs) have attracted great interest in expediting the delineation procedure while improving the reproducibility of the delineations.⁴⁻⁷

CNN-based algorithms have made revolutionary developments in CAD systems. However, despite the current growth of CNN-based algorithms for segmentation purposes, these techniques are rather complex and their ability to generate satisfactory results on specific medical imaging problems is often limited.¹ Complexity in network design and configuration does not necessarily lead to better performance.⁸ Furthermore, networks with large numbers of parameters that are both memory- and computationally-demanding are often a hindrance to modern CNN-based segmentation approaches. To be more specific, although an increase in size is usually correlated with an improvement in representation power, it comes at a price: longer training time and more memory usage. With the present expansion of point-of-care US (POCUS) imaging equipment, it is critical to build networks that are computationally and memory-efficient. Compared to other imaging modalities, POCUS has the primary advantage of allowing investigations at the bedside, which is especially appealing for acutely ill patients who cannot normally be transported for such testing.⁹,¹⁰ One common use case of CAD-based systems equipped with CNN-based techniques is in POCUS imaging for breast cancer detection.¹¹⁻¹³ To achieve computational and memory efficiency, researchers have developed novel strategies for compressing large models so that the same or similar generalization performance can be achieved by training smaller networks.¹⁴

In model compression techniques such as parameter pruning, quantization, and knowledge distillation (KD), to name a few, the goal is to minimize the associated computational and memory costs. In these techniques, the large model is encoded to a more efficient format with minimal performance impact.¹⁵,¹⁶ Parameter pruning involves training a large model and then removing unnecessary weights and parameters to get a considerably smaller yet effective model. This method also aids in the addition of regularization to the model, resulting in improved generalization.¹⁷⁻¹⁹ Pruning usually eliminates “unimportant” weights from a deployed model. This means that pruning is rarely useful for model efficiency in training and inference time; however, it can help with model storage. Similarly, quantization-based compression techniques help in model storage by reducing the number of bits required to save the weights.²⁰⁻²² Pruning and quantization-based techniques provide a suitable compression rate without sacrificing accuracy. They are, nevertheless, better suited to applications that demand consistent model performance. As a result, KD-based techniques, also known as student-teacher networks, are better suited to applications involving small datasets or requiring large efficiency improvements.¹⁵ In these approaches, the model is directly accelerated without special hardware or implementations. To be more specific, the student network (i.e., small network) is trained under the supervision of the teacher network (i.e., large network).¹⁶ The main idea of the KD-based approach is to transfer information from a complex teacher network into a small student network by simulating the distribution of the teacher network’s representation. Previous experimental results have demonstrated that the student network can match or even beat the teacher’s performance while being computationally efficient.¹⁶,²³⁻²⁵

While previous methods tend to capture rich information from various levels of teacher representation, they lack emphasis on identifying the most effective representation level. Moreover, many existing techniques propose complex strategies that pose implementation challenges. To this end, we address a gap in current KD techniques by focusing on the selection of optimal teacher representations from different levels, which has been overlooked in existing approaches. To be more specific, we study the impact of transferring knowledge from the teacher’s output layer as well as from the intermediate layers of the teacher. Moreover, many existing techniques introduce complexities in selecting the appropriate teacher level for knowledge transfer.²⁶ In contrast, we conduct an extensive analysis of KD pathways, loss functions, and the impact of augmentation, providing valuable insights into the mechanisms underlying knowledge transfer from teacher to student networks. The proposed method simplifies the KD process by pinpointing the most beneficial teacher representation level, thus offering a more straightforward and practical solution for model compression and performance enhancement. The main contributions are summarized below:

•   Highlighting the potential of leveraging teacher networks to facilitate significant performance gains in student models, indicating effective knowledge transfer


Volume 2 Issue 2 (2025)    doi: 10.36922/aih.3509