Artificial Intelligence in Health | RefSAM3D for medical image segmentation
abdominal CT scans with annotations for both the pancreas and pancreatic tumors. This dataset is part of the MSD pancreas segmentation challenge. Each CT volume has a resolution of 512 × 512 pixels, with the number of slices per scan ranging from 37 to 751. The authors filtered the dataset to retain only the axial view images containing more than 5% pancreatic content. Consistent with previous studies, we merged the pancreas and pancreatic tumor masks into a single entity for segmentation.

The liver tumor segmentation benchmark (LiTS)⁴⁴ dataset is a publicly available benchmark focused on liver and liver tumor segmentation. It was created to evaluate and compare the performance of automated liver and liver tumor segmentation algorithms. The LiTS dataset comprises 201 abdominal CT scans, of which 194 contain liver lesions. The dataset is divided into 131 training cases and 70 testing cases. The resolution and quality of the CT images vary, with axial resolutions ranging from 0.56 mm to 1.0 mm and z-direction resolutions ranging from 0.45 mm to 6.0 mm.

The MSD colon dataset⁴⁵ is a publicly available benchmark focused on primary colon cancer segmentation from CT images. The dataset consists of 190 abdominal CT scans in total, divided into 126 training cases and 64 testing cases. Each case is annotated with segmentation masks identifying the primary colon cancer regions.

For cardiac segmentation, we utilized the multi-modality whole heart segmentation (MM-WHS) Challenge 2017 dataset, which contains 20 CT and 20 MRI scans with pixel-level ground-truth annotations.⁴⁶ These scans were collected in a real clinical setting and include five anatomical labels: left ventricle blood cavity, right ventricle blood cavity, left atrium blood cavity, right atrium blood cavity, and ascending aorta. In our experiments, only the CT scans were used; these contain between 177 and 363 slices, each with a resolution of 512 × 512 pixels and voxel spacing ranging from 0.3 to 0.6 mm.

The Beyond the Cranial Vault (BTCV) challenge dataset comprises 30 CT volumes, each manually labeled with 13 different abdominal organs.⁴⁷ The number of slices per scan ranges between 85 and 198, with a slice thickness varying between 2.5 mm and 5.0 mm. All scans have an axial resolution of 512 × 512 pixels, whereas the in-plane resolution varies from 0.54 × 0.54 mm² to 0.98 × 0.98 mm². We followed the data split proposed by Tang et al.,⁴⁸ utilizing 24 cases for training and 6 cases for testing.

For evaluating the model's generalization ability, we also used the multi-modality abdominal multi-organ segmentation challenge (AMOS22) dataset.⁴⁹ This dataset includes abdominal CT and MRI scans from different patients, with each scan annotated for 15 organs. In line with the approach in MA-SAM, we limited our evaluation to the 12 organs common to both the AMOS22 and BTCV datasets. For generalization testing, we utilized 300 CT scans and 60 MRI scans from the AMOS22 training and validation sets.

4.1.2. Implementation details
We implemented our method and benchmarked it against baseline models using PyTorch (version 2.7.1) and the medical open network for AI (MONAI) framework, utilizing SAM-B, which employs ViT-B as the image encoder backbone, for all experiments. Training was conducted on an NVIDIA A40 GPU (United States) with a batch size of 1, using the AdamW optimizer with a linear learning rate scheduler for a total of 200 epochs. The initial learning rate was set to 1e-4, with a momentum of 0.9 and a weight decay of 1e-5. Data preprocessing involved resampling all volumes to an isotropic spacing of 1 mm. For data augmentation, we applied various transformations, including random rotation, flipping, erasing, shearing, scaling, translation, posterization, contrast adjustment, brightness modification, and sharpness enhancement. During training, we also sampled foreground and background patches at a 1:1 ratio. For single-organ cancer segmentation, we assessed our method's performance through comparisons with state-of-the-art volumetric segmentation and fine-tuning techniques, using the Dice coefficient and normalized surface dice (NSD) as evaluation metrics, similar to SAM-Med3D [11]. For multi-organ segmentation, we employed the Dice coefficient and Hausdorff distance (HD) as evaluation metrics. For each dataset, we designed specific text prompts to guide the segmentation process, as shown in Table 1. These prompts were carefully crafted to provide clear anatomical context while maintaining consistency across different organs and pathologies.

4.2. Comparison with state-of-the-art methods
Our method was extensively evaluated against a wide range of state-of-the-art 3D medical image segmentation techniques on both CT and MRI datasets. These techniques include the convolutional neural network-based no new U-Net (nnU-Net),⁵⁰ an automated configuration framework evolved from the U-Net architecture,⁵¹ and the Swin U-Net transformers (Swin-UNETR),⁵² which employs a hierarchical encoder structure for 3D segmentation tasks. Furthermore, we also considered nnFormer,⁵³ a model that integrates both local and global volumetric self-attention mechanisms, and UNETR++,⁵⁴ which enhances segmentation accuracy and efficiency through
Volume 2 Issue 4 (2025) 121 doi: 10.36922/AIH025080010
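To make the evaluation protocol of Section 4.1.2 concrete, the two volumetric metrics used for multi-organ segmentation can be sketched in plain NumPy. This is a minimal illustration with function names of our own choosing; production pipelines would typically use optimized implementations such as SciPy's `directed_hausdorff` or MONAI's metric classes.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks of any dimensionality."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)

def hausdorff_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Hausdorff distance between the foreground voxel sets.

    Brute-force O(N*M) pairwise distances: fine for small masks,
    while real pipelines use distance transforms or spatial indexes.
    """
    p = np.argwhere(pred).astype(float)  # foreground coordinates of prediction
    g = np.argwhere(gt).astype(float)    # foreground coordinates of ground truth
    d = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

For two overlapping 2 × 2 squares offset diagonally by one voxel, the Dice coefficient is 0.25 and the symmetric Hausdorff distance is √2.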

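The optimizer settings reported in Section 4.1.2 (AdamW, initial learning rate 1e-4, weight decay 1e-5, linear learning rate schedule over 200 epochs) imply a decay rule along the following lines. This is a sketch under the assumption that the schedule decays linearly to zero, which the text does not state explicitly; the equivalent PyTorch wiring in the docstring is likewise our assumption, not code quoted from the paper.

```python
BASE_LR = 1e-4       # initial learning rate (Section 4.1.2)
WEIGHT_DECAY = 1e-5  # AdamW weight decay
EPOCHS = 200         # total training epochs

def linear_lr(epoch: int, epochs: int = EPOCHS, base_lr: float = BASE_LR) -> float:
    """Learning rate decayed linearly from base_lr at epoch 0 to 0 at `epochs`.

    Assumed PyTorch equivalent:
        opt = torch.optim.AdamW(model.parameters(), lr=BASE_LR,
                                betas=(0.9, 0.999), weight_decay=WEIGHT_DECAY)
        sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda e: 1 - e / EPOCHS)
    """
    return base_lr * (1.0 - epoch / epochs)
```

The "momentum of 0.9" reported in the text presumably corresponds to AdamW's first-moment coefficient β₁ = 0.9, since AdamW has no classical momentum term.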

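The 1:1 foreground/background patch sampling mentioned in Section 4.1.2 can be illustrated as follows. This is a minimal NumPy sketch with hypothetical function and argument names; frameworks such as MONAI expose the same idea through crop transforms with equal positive/negative sampling weights.

```python
import numpy as np

def sample_patch_centers(label_vol, n_patches, rng=None):
    """Draw patch centre coordinates with a 1:1 foreground/background ratio.

    label_vol: integer label volume where 0 marks background.
    Returns an array of shape (n_patches, label_vol.ndim).
    Sampling is with replacement, which is fine for illustration.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    fg = np.argwhere(label_vol > 0)   # candidate foreground centres
    bg = np.argwhere(label_vol == 0)  # candidate background centres
    n_fg = n_patches // 2             # half the batch comes from foreground
    centers = np.concatenate([
        fg[rng.integers(0, len(fg), size=n_fg)],
        bg[rng.integers(0, len(bg), size=n_patches - n_fg)],
    ])
    rng.shuffle(centers)              # mix the two classes within the batch
    return centers
```

In a real training loop, each centre would then be expanded into a fixed-size patch cropped from the image and label volumes.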