
Artificial Intelligence in Health                                 RefSAM3D for medical image segmentation



complex tasks. The segment-everything-everywhere model enhances VFM capabilities by introducing a universal prompting scheme that enables semantic-aware open-set segmentation, expanding their use in real-world scenarios. SegGPT, in turn, standardizes segmentation data and employs in-context learning for both images and videos, allowing it to handle diverse segmentation tasks without requiring additional task-specific training. Complementing these advances, DINOv2 [25] scales up Vision Transformer (ViT) pre-training by increasing data and model size, producing more general and transferable visual features that simplify fine-tuning across a wide range of tasks, further broadening VFM applicability. The SAM [4] is one of the most notable VFMs for general-purpose image segmentation. Pre-trained on 11 million images and 1 billion masks, SAM enables interactive, prompt-driven zero-shot segmentation across a wide variety of visual tasks. Its impressive versatility has made it a key model for applications such as image segmentation, inpainting, and tracking. However, it still faces limitations in specific domains such as medical imaging, camouflage detection, and shadow segmentation [26].

2.2. Adaptation of the SAM in medical imaging

The adaptation of SAM for medical imaging has evolved rapidly, driven by its impressive zero-shot performance in natural image segmentation. Initial evaluation studies [27-30] examined SAM’s applicability to medical image segmentation, but its performance often fell short due to the domain gap between natural and medical images. For instance, He et al. [28] noted a performance gap of up to 70% in Dice scores compared to domain-specific models. This highlighted the need for task-specific fine-tuning. Following this, research attention shifted from evaluation to the adaptation of SAM for medical images [12,13,15,17]. Several studies have experimented with fine-tuning SAM by modifying its prompt design to handle the specific characteristics of medical data. SAM-Med2D [31], for example, leveraged more comprehensive prompts, including points, bounding boxes, and masks, to optimize SAM for 2D medical image segmentation, whereas the medical SAM adapter [12] incorporated point prompts and adapters to inject medical domain knowledge into SAM’s architecture. Although these approaches enhanced SAM’s performance, the creation of prompts for each 2D slice of 3D medical data proved to be labor-intensive. Efforts to adapt SAM for 3D medical image segmentation have focused on overcoming this limitation. MedLSAM [32] and SAM3D [33] applied SAM to 3D datasets, with approaches such as SAMed [10] and Med-Tuning [9] employing techniques such as Low-Rank Adaptation to fine-tune SAM for 3D tasks. However, most of these methods have not fully addressed the critical need to account for 3D volumetric or temporal information, which is vital for medical image segmentation. Innovations such as 3DSAM-Adapter [13] and modality-agnostic SAM (MA-SAM) [34] have incorporated 3D convolutional adapters to transform SAM’s 2D architecture into one capable of recognizing 3D structures. Similarly, SAMMed3D [11] introduced a framework to generate 3D prompts from 2D points, helping SAM process volumetric data more effectively. The success of these 3D adaptations highlights the importance of leveraging spatial information for more accurate segmentation. Recent trends indicate a shift toward prompt-free or semiautomatic systems, like AutoSAM Adapter [15], which aim to maintain SAM’s zero-shot capabilities while minimizing manual prompt generation.

2.3. PETL

With the widespread adoption of foundational models, PETL has garnered significant attention. PETL methods can be categorized into three main groups. One approach is addition-based methods, which involve integrating lightweight adapters or prompts into the original model. These adapters or prompts allow the fine-tuning of only a small number of additional parameters, enabling the model to adapt to specific tasks while preserving the majority of its pre-trained weights. This approach minimizes the computational overhead associated with training large models, as only the newly introduced components require optimization [9,35]. Another strategy focuses on specification-based methods, which prioritize the identification and tuning of a small proportion of influential parameters from the original model. This method often employs techniques such as sensitivity analysis to determine which parameters have the most significant impact on the model’s performance for a given task. By selectively updating these parameters, specification-based methods aim to achieve efficient adaptation while reducing training burden and maintaining high performance levels [10,13]. In addition, reparameterization-based methods leverage low-rank representations to minimize the number of trainable parameters during the fine-tuning process. Techniques such as Low-Rank Adaptation and factorized tuning allow models to maintain their expressive power while significantly reducing the number of parameters that need to be adjusted. This approach not only enhances efficiency but also enables strong performance across various PETL tasks, as it effectively captures the essential features required for adaptation [7]. Recently, PETL techniques have been successfully utilized to adapt VFMs for a wide range of downstream tasks, including image classification, object detection, and, notably, medical image segmentation. Researchers have explored ways to fine-tune vision models
            Volume 2 Issue 4 (2025)                        116                          doi: 10.36922/AIH025080010