Artificial Intelligence in Health RefSAM3D for medical image segmentation
complex tasks. The segment-everything-everywhere model enhances VFM capabilities by introducing a universal prompting scheme that enables semantic-aware open-set segmentation, expanding their use in real-world scenarios. SegGPT, in turn, standardizes segmentation data and employs in-context learning for both images and videos, allowing it to handle diverse segmentation tasks without requiring additional task-specific training. Complementing these advances, DINOv2 [25] scales up Vision Transformer (ViT) pre-training by increasing data and model size, producing more general and transferable visual features that simplify fine-tuning across a wide range of tasks, further broadening VFM applicability. The SAM [4] is one of the most notable VFMs for general-purpose image segmentation. Pre-trained on 11 million images and 1 billion masks, SAM enables interactive, prompt-driven zero-shot segmentation across a wide variety of visual tasks. Its impressive versatility has made it a key model for applications such as image segmentation, inpainting, and tracking. However, it still faces limitations in specific domains such as medical imaging, camouflage detection, and shadow segmentation [26].

2.2. Adaptation of the SAM in medical imaging
The adaptation of SAM for medical imaging has evolved rapidly, driven by its impressive zero-shot performance in natural image segmentation. Initial evaluation studies [27-30] examined SAM’s applicability to medical image segmentation, but its performance often fell short due to the domain gap between natural and medical images. For instance, He et al. [28] noted a performance gap of up to 70% in Dice scores compared to domain-specific models. This highlighted the need for task-specific fine-tuning. Following this, research attention shifted from evaluation to the adaptation of SAM for medical images [12,13,15,17]. Several studies have experimented with fine-tuning SAM by modifying its prompt design to handle the specific characteristics of medical data. SAM-Med2D [31], for example, leveraged more comprehensive prompts, including points, bounding boxes, and masks, to optimize SAM for 2D medical image segmentation, whereas the medical SAM adapter [12] incorporated point prompts and adapters to inject medical domain knowledge into SAM’s architecture. Although these approaches enhanced SAM’s performance, the creation of prompts for each 2D slice of 3D medical data proved to be labor-intensive. Efforts to adapt SAM for 3D medical image segmentation have focused on overcoming this limitation. MedLSAM [32] and SAM3D [33] applied SAM to 3D datasets, with approaches such as SAMed [10] and Med-Tuning [9] employing techniques such as Low-Rank Adaptation to fine-tune SAM for 3D tasks. However, most of these methods have not fully addressed the critical need to account for 3D volumetric or temporal information, which is vital for medical image segmentation. Innovations such as 3DSAM-Adapter [13] and modality-agnostic SAM (MA-SAM) [34] have incorporated 3D convolutional adapters to transform SAM’s 2D architecture into one capable of recognizing 3D structures. Similarly, SAMMed3D [11] introduced a framework to generate 3D prompts from 2D points, helping SAM process volumetric data more effectively. The success of these 3D adaptations highlights the importance of leveraging spatial information for more accurate segmentation. Recent trends indicate a shift toward prompt-free or semiautomatic systems, like AutoSAM Adapter [15], which aim to maintain SAM’s zero-shot capabilities while minimizing manual prompt generation.

2.3. PETL
With the widespread adoption of foundational models, PETL has garnered significant attention. PETL methods can be categorized into three main groups. One approach is addition-based methods, which involve integrating lightweight adapters or prompts into the original model. These adapters or prompts allow the fine-tuning of only a small number of additional parameters, enabling the model to adapt to specific tasks while preserving the majority of its pre-trained weights. This approach minimizes the computational overhead associated with training large models, as only the newly introduced components require optimization [9,35]. Another strategy focuses on specification-based methods, which prioritize the identification and tuning of a small proportion of influential parameters from the original model. This method often employs techniques such as sensitivity analysis to determine which parameters have the most significant impact on the model’s performance for a given task. By selectively updating these parameters, specification-based methods aim to achieve efficient adaptation while reducing training burden and maintaining high performance levels [10,13]. In addition, reparameterization-based methods leverage low-rank representations to minimize the number of trainable parameters during the fine-tuning process. Techniques such as Low-Rank Adaptation and factorized tuning allow models to maintain their expressive power while significantly reducing the number of parameters that need to be adjusted. This approach not only enhances efficiency but also enables strong performance across various PETL tasks, as it effectively captures the essential features required for adaptation [7]. Recently, PETL techniques have been successfully utilized to adapt VFMs for a wide range of downstream tasks, including image classification, object detection, and, notably, medical image segmentation. Researchers have explored ways to fine-tune vision models
Volume 2 Issue 4 (2025) 116 doi: 10.36922/AIH025080010

