Artificial Intelligence in Health RefSAM3D for medical image segmentation
Table 3. Comparison of abdominal multi-organ segmentation results

Metric    Method         Spleen  R.Kd   L.Kd   GB     Eso.   Liver  Stomach  Aorta  IVC    Veins  Pancreas  AG     Average
Dice (%)  nnU-Net        97.0    95.3   95.3   63.5   77.5   97.4   89.1     90.1   88.5   79.0   87.1      75.2   86.3
Dice (%)  Swin-UNETR     95.6    94.2   94.3   63.6   75.5   96.6   79.2     89.9   83.7   75.0   82.2      67.3   83.1
Dice (%)  UNETR++        94.2    92.1   95.4   65.0   75.9   96.9   88.3     85.5   84.9   76.1   81.8      71.3   83.95
Dice (%)  nnFormer       93.5    94.9   95.0   64.1   79.5   96.8   90.1     89.7   85.9   77.8   85.6      73.9   85.6
Dice (%)  3D UX-Net      94.6    94.2   94.3   59.3   72.2   96.4   73.4     87.2   84.9   72.2   80.9      67.1   81.4
Dice (%)  3DSAM-adapter  94.3    96.1   94.1   62.9   79.9   96.1   83.8     88.4   85.3   75.6   83.1      69.4   84.1
Dice (%)  MA-SAM         96.7    95.1   95.4   68.2   82.1   96.9   92.8     91.1   87.5   79.8   86.6      73.9   87.2
Dice (%)  Ref-SAM3D      97.1    94.9   96.1   70.3   85.2   97.3   94.1     92.3   88.8   80.4   87.5      75.1   88.3
HD (%)    nnU-Net        1.07    1.19   1.19   7.49   8.56   1.14   4.84     14.11  2.87   5.67   2.31      2.23   4.39
HD (%)    Swin-UNETR     1.21    1.41   1.37   2.25   5.82   1.70   13.75    5.92   4.46   7.58   3.53      3.40   4.37
HD (%)    UNETR++        5.99    1.23   1.33   5.99   10.37  33.12  5.23     8.23   2.14   10.34  3.12      2.13   7.44
HD (%)    nnFormer       78.03   1.41   1.43   3.00   4.92   1.38   4.24     7.53   4.02   6.53   2.96      2.76   9.95
HD (%)    3D UX-Net      3.17    1.59   1.26   4.53   13.92  1.75   19.72    12.53  3.47   9.99   3.70      4.11   6.68
HD (%)    3DSAM-adapter  3.38    1.23   1.21   2.23   5.43   1.15   4.00     6.47   7.88   5.18   4.71      3.94   3.90
HD (%)    MA-SAM         1.00    1.19   1.07   1.59   3.77   1.36   3.87     5.29   3.12   3.25   3.93      2.57   2.67
HD (%)    Ref-SAM3D      1.30    1.32   1.00   1.21   3.18   1.23   3.77     4.12   2.30   3.12   3.08      2.44   2.34
Abbreviations: 3D: Three-dimensional; AG: Adrenal gland; Eso.: Esophagus; GB: Gallbladder; HD: Hausdorff distance; IVC: Inferior vena cava; L.Kd: Left kidney; nn: No new; R.Kd: Right kidney; SAM: Segment Anything Model; UNETR: U-Net Transformers; UX-Net: UNet-eXpanded Network.
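The two metrics reported in Table 3 can be illustrated with a minimal sketch. This is not the authors' evaluation code: it assumes binary NumPy masks, computes Dice as 2|P ∩ G| / (|P| + |G|) expressed as a percentage, and computes the classical symmetric Hausdorff distance over all foreground voxels by brute force. The exact HD variant behind Table 3 (for example, boundary voxels only or a 95th-percentile version) and the voxel spacing are not stated in this section, so the function names `dice_percent`, `hausdorff`, and `organ_average`, the `spacing` parameter, and the empty-mask conventions are hypothetical choices made for illustration. The Average column appears to be the unweighted mean of the 12 organ columns; for example, the Ref-SAM3D Dice row averages to 88.26, reported as 88.3.

```python
import numpy as np


def dice_percent(pred, gt):
    """Dice similarity coefficient 2|P ∩ G| / (|P| + |G|), as a percentage."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    total = int(pred.sum()) + int(gt.sum())
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 100.0
    overlap = int(np.logical_and(pred, gt).sum())
    return 200.0 * overlap / total


def hausdorff(pred, gt, spacing=None):
    """Symmetric Hausdorff distance between the foreground voxels of two masks.

    Brute force over all voxel pairs, so only suitable for small arrays.
    `spacing` (hypothetical parameter) scales voxel indices to physical units.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if spacing is None:
        spacing = (1.0,) * pred.ndim
    scale = np.asarray(spacing, dtype=float)
    p = np.argwhere(pred) * scale  # foreground coordinates of the prediction
    g = np.argwhere(gt) * scale    # foreground coordinates of the ground truth
    if len(p) == 0 or len(g) == 0:
        # Assumed convention: HD is undefined for an empty mask; return infinity.
        return float("inf")
    # dist[i, j] = Euclidean distance from predicted voxel i to ground-truth voxel j
    dist = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    forward = dist.min(axis=1).max()   # worst prediction-to-ground-truth gap
    backward = dist.min(axis=0).max()  # worst ground-truth-to-prediction gap
    return float(max(forward, backward))


def organ_average(scores):
    """Unweighted mean over per-organ scores (cf. the Average column)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = np.zeros((4, 4, 4), dtype=bool)
    gt = np.zeros((4, 4, 4), dtype=bool)
    pred[1:3, 1:3, 1:3] = True  # 8 voxels
    gt[1:3, 1:3, 1:4] = True    # 12 voxels that contain the prediction
    print(dice_percent(pred, gt))  # 80.0
    print(hausdorff(pred, gt))     # 1.0
    ref_sam3d_dice = [97.1, 94.9, 96.1, 70.3, 85.2, 97.3,
                      94.1, 92.3, 88.8, 80.4, 87.5, 75.1]
    print(round(organ_average(ref_sam3d_dice), 1))  # 88.3
```

Evaluation pipelines for full CT volumes typically restrict the Hausdorff computation to boundary voxels and use distance transforms rather than a full pairwise distance matrix, because the matrix above grows with the product of the two foreground sizes.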
Figure 4. Qualitative visualization of segmentation results generated by our Ref-SAM3D method and other state-of-the-art methods on the Beyond the Cranial Vault dataset. Rkid and Lkid refer to the right and left kidneys, respectively. Sto, rad, and lad stand for stomach, right adrenal gland, and left adrenal gland, respectively.
Abbreviations: 3D: Three-dimensional; IVC: Inferior vena cava; nn: No new; NSD: Normalized surface Dice; SAM: Segment Anything Model;
UNETR: U-Net Transformers; UX-Net: UNet-eXpanded Network.
performance, achieving a mean Dice coefficient of 85.7% on CT images, indicating robust generalization across different CT acquisition protocols and patient cohorts. Notably, in the challenging cross-modality scenario of MRI segmentation, our model retained a Dice score of 63.2% (±3.1%), far above the baseline methods nnU-Net (12.1%) and Swin-UNETR (15.3%). Furthermore, with a five-shot fine-tuning strategy on the AMOS22 MRI data, Ref-SAM3D reached a Dice score of 84.1% (Figure 6), exceeding the fine-tuned versions of nnU-Net (72.4%) and Swin-UNETR (75.3%) by 11.7 and 8.8 percentage points, respectively. This demonstrates the model's adaptability and learning efficiency with minimal additional training data. These results underscore
Volume 2 Issue 4 (2025) 124 doi: 10.36922/AIH025080010

