Artificial Intelligence in Health
ORIGINAL RESEARCH ARTICLE
RefSAM3D: Adapting the Segment Anything
Model with cross-modal references for
three-dimensional medical image segmentation
Xiang Gao and Kai Lu*
Department of Anesthesiology, Nanjing Drum Tower Hospital, Nanjing University, Nanjing, Jiangsu,
China
Abstract
The Segment Anything Model (SAM), originally built on a two-dimensional vision
transformer, excels at capturing global patterns in two-dimensional natural images
but faces challenges when applied to three-dimensional (3D) medical imaging
modalities such as computed tomography and magnetic resonance imaging. These
modalities require capturing spatial information in volumetric space for tasks such
as organ segmentation and tumor quantification. To address this challenge, we
introduce RefSAM3D, which adapts SAM to 3D medical imaging by incorporating
a 3D image adapter and cross-modal reference prompt generation. Our approach
modifies the visual encoder to handle 3D inputs and enhances the mask decoder
for direct 3D mask generation. We also integrate textual prompts to improve
segmentation accuracy and consistency in complex anatomical scenarios. By
employing a hierarchical attention mechanism, our model effectively captures and
integrates information across different scales. Extensive evaluations on multiple
medical imaging datasets demonstrate that RefSAM3D outperforms state-of-the-art
methods. Our work thus advances the application of SAM in accurately segmenting
complex anatomical structures in medical imaging.
Keywords: Three-dimensional medical imaging; Cross-modal reference prompt; Volumetric segmentation; Vision transformer
*Corresponding author: Kai Lu (961340955@qq.com)
Citation: Gao X, Lu K. RefSAM3D: Adapting the Segment Anything Model with cross-modal references for three-dimensional medical image segmentation. Artif Intell Health. 2025;2(4):114-128. doi: 10.36922/AIH025080010
Received: February 17, 2025
Revised: May 1, 2025
Accepted: June 23, 2025
Published online: August 14, 2025
Copyright: © 2025 Author(s). This is an Open-Access article distributed under the terms of the Creative Commons Attribution License, permitting distribution and reproduction in any medium, provided the original work is properly cited.
Publisher’s Note: AccScience Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. Introduction
Medical image segmentation is a fundamental task in medical imaging, primarily aimed
at identifying and extracting specific anatomical structures, such as organs, lesions, and
tissues, from medical images. This process is crucial for numerous clinical applications,
including computer-aided diagnosis, treatment planning, and disease progression
monitoring. Accurate image segmentation provides precise volumetric and shape
information about target structures, which is essential for further clinical applications
such as disease diagnosis, quantitative analysis, and surgical planning.¹⁻³
Recent breakthroughs in foundational models for image segmentation⁴,⁵
have yielded transformative results, leveraging extensive datasets to capture general
representations that exhibit exceptional generalizability and performance. However,
despite these strides, significant challenges arise when applying these models, particularly
Volume 2 Issue 4 (2025) 114 doi: 10.36922/AIH025080010

