Artificial Intelligence in Health                                 RefSAM3D for medical image segmentation



the introduction of an efficient pairing attention module. In addition, we compared our approach with 3D UNet-eXpanded Network (UX-Net),⁵⁵ a method designed to create a simple, efficient, and lightweight network that combines the capabilities of hierarchical transformers with the advantages of ConvNet modules. We also evaluated SAM-B, the base model of SAM, which was trained on natural images and applied directly to medical images without adaptation. Finally, our method was benchmarked against the latest SAM adaptation techniques, including 3DSAM-adapter,¹³ a promptable 3D medical image segmentation model, and MA-SAM,³⁴ a framework that utilizes parameter-efficient fine-tuning strategies and 3D adapters.

The results presented in Table 2 demonstrate that our proposed Ref-SAM3D method consistently outperformed other approaches across a wide range of tasks, achieving the highest scores in nearly all scenarios and particularly excelling in challenging tumor types. In kidney tumor segmentation, despite challenges such as low contrast with surrounding tissues, blurred boundaries, and high morphological heterogeneity, Ref-SAM3D achieved a Dice score of 95.53% and an NSD of 99.45%, surpassing other methods. For pancreatic tumors, which constitute less than 0.5% of CT images and exhibit diverse shapes, Ref-SAM3D achieved a Dice score of 82.42%, a 2.12 percentage-point improvement over existing state-of-the-art techniques. In liver tumor segmentation, Ref-SAM3D attained a Dice score of 80.10%, effectively handling variations in grayscale and irregular shapes. Despite the extensive distribution and complex anatomical structure of colorectal cancer lesions,

Table 1. Datasets used in our experiments and their corresponding prompt content descriptions

Task                                               Dataset name       Prompt content
Kidney tumor segmentation                          KiTS21 Challenge   CT images, kidneys, tumors, and cysts segmentation, spacing (0.5, 0.44, 0.44) mm to (5.0, 1.04, 1.04) mm, dimensions (29, 512, 512) to (1,059, 512, 796)
Pancreas tumor segmentation                        MSD pancreas       CT images, pancreas tumor segmentation, resolution 512×512, slices 37–751
Liver tumor segmentation                           LiTS dataset       CT images, liver tumor segmentation, axial resolution 0.56–1.0 mm, z-direction resolution 0.45–6.0 mm
Colon cancer segmentation                          MSD colon dataset  CT images, colon cancer segmentation, and abdominal scans
MRI cardiac segmentation                           MM-WHS Challenge   MRI images, cardiac structure segmentation (LVC, RVC, LAC, RAC, AA), resolution 512×512, voxel spacing 0.3–0.6 mm
Abdominal multi-organ segmentation                 BTCV Challenge     CT images, abdominal organ segmentation (13 organs), slice thickness 2.5–5.0 mm, in-plane resolution 0.54×0.54 mm² to 0.98×0.98 mm²
Multi-modality abdominal multi-organ segmentation  AMOS22 dataset     CT and MRI images, abdominal organ segmentation (15 organs), varying modalities and resolutions

Abbreviations: AA: Ascending aorta; AMOS: Abdominal Multi-Organ Segmentation; BTCV: Beyond the Cranial Vault; CT: Computed tomography; LAC: Left atrium blood cavity; LiTS: Liver Tumor Segmentation Benchmark; LVC: Left ventricle blood cavity; MM-WHS: Multi-Modality Whole Heart Segmentation; MRI: Magnetic resonance imaging; MSD: Medical Segmentation Decathlon; RAC: Right atrium blood cavity; RVC: Right ventricle blood cavity.

Table 2. Comparison with classical medical image segmentation methods on four tumor segmentation datasets

Methods                                   Kidney tumor    Pancreas tumor  Liver tumor     Colon cancer
                                          Dice    NSD     Dice    NSD     Dice    NSD     Dice    NSD
nnU-Net                                   73.07   77.47   41.65   62.54   60.10   75.41   43.91   52.52
Swin-UNETR                                65.54   72.04   40.57   60.05   50.26   64.32   35.21   42.94
UNETR++                                   56.49   60.04   37.25   53.59   37.13   51.99   25.36   30.68
nnFormer                                  45.14   42.28   36.53   53.97   45.54   60.67   24.28   32.19
3D UX-Net                                 57.59   58.55   34.83   52.56   45.54   60.67   28.50   32.73
SAM-B (10 points/slice)                   40.07   34.96   30.55   32.91   8.56    5.97    39.14   42.70
3DSAM-adapter (10 points/volume)          74.91   84.35   57.47   79.62   56.61   69.52   49.99   65.67
MA-SAM (1 relaxed 3D bounding box/slice)  93.38   98.91   80.30   97.19   75.23   92.31   65.45   81.40
Ref-SAM3D                                 95.53   99.45   82.42   98.41   80.10   93.23   70.14   88.90

Note: All data presented as percentages (%).
Abbreviations: 3D: Three-dimensional; nn: No new; NSD: Normalized surface Dice; SAM: Segment Anything Model; UNETR: U-Net Transformers; UX-Net: UNet-eXpanded Network.
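
As a reading aid for Table 2, the sketch below shows how a Dice score is commonly computed for a pair of binary masks and how the Dice margin of Ref-SAM3D over the strongest baseline on each dataset can be rechecked from the tabulated values. It is an illustrative sketch, not the authors' evaluation code: the function names `dice_score` and `margins_over_best_baseline`, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made here. The Dice values are copied from Table 2.

```python
# Illustrative sketch only; names and conventions below are assumptions,
# not taken from the paper's code.

DATASETS = ("kidney", "pancreas", "liver", "colon")

# Dice scores (%) copied from Table 2, ordered as in DATASETS.
DICE = {
    "nnU-Net": (73.07, 41.65, 60.10, 43.91),
    "Swin-UNETR": (65.54, 40.57, 50.26, 35.21),
    "UNETR++": (56.49, 37.25, 37.13, 25.36),
    "nnFormer": (45.14, 36.53, 45.54, 24.28),
    "3D UX-Net": (57.59, 34.83, 45.54, 28.50),
    "SAM-B": (40.07, 30.55, 8.56, 39.14),
    "3DSAM-adapter": (74.91, 57.47, 56.61, 49.99),
    "MA-SAM": (93.38, 80.30, 75.23, 65.45),
    "Ref-SAM3D": (95.53, 82.42, 80.10, 70.14),
}


def dice_score(pred, target):
    """Dice = 2 * |P & T| / (|P| + |T|) for two flat binary masks."""
    pred = [bool(v) for v in pred]
    target = [bool(v) for v in target]
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


def margins_over_best_baseline(scores, ours="Ref-SAM3D"):
    """Per dataset: (best baseline name, margin in percentage points)."""
    result = {}
    for i, dataset in enumerate(DATASETS):
        baselines = [(name, vals[i]) for name, vals in scores.items() if name != ours]
        best_name, best_value = max(baselines, key=lambda item: item[1])
        result[dataset] = (best_name, round(scores[ours][i] - best_value, 2))
    return result


if __name__ == "__main__":
    # Toy masks: one of two predicted voxels overlaps the single target voxel.
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
    for dataset, (name, margin) in margins_over_best_baseline(DICE).items():
        print(f"{dataset}: +{margin} points over {name}")
```

Under these assumptions, the pancreas margin over MA-SAM comes out at 2.12 percentage points, which matches the figure quoted in the text; the differences are absolute (percentage points), not relative changes.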


            Volume 2 Issue 4 (2025)                        122                          doi: 10.36922/AIH025080010