
Artificial Intelligence in Health                                 RefSAM3D for medical image segmentation



abdominal CT scans with annotations for both the pancreas and pancreatic tumors. This dataset is part of the MSD pancreas segmentation challenge. Each CT volume has a resolution of 512 × 512 pixels, with the number of slices per scan ranging from 37 to 751. The authors filtered the dataset to retain only the axial view images containing more than 5% pancreatic content. Consistent with previous studies, we merged the pancreas and pancreatic tumor masks into a single entity for segmentation.

The liver tumor segmentation benchmark (LiTS) dataset [44] is a publicly available benchmark focused on liver and liver tumor segmentation. It was created to evaluate and compare the performance of automated liver and liver tumor segmentation algorithms. The LiTS dataset comprises 201 abdominal CT scans, of which 194 contain liver lesions. The dataset is divided into 131 training cases and 70 testing cases. The resolution and quality of the CT images vary, with axial resolutions ranging from 0.56 mm to 1.0 mm and z-direction resolutions ranging from 0.45 mm to 6.0 mm.

The MSD colon dataset [45] is a publicly available benchmark focused on primary colon cancer segmentation from CT images. The dataset consists of 190 abdominal CT scans in total, divided into 126 training cases and 64 testing cases. Each case is annotated with segmentation masks identifying the primary colon cancer regions.

For cardiac segmentation, we utilized the multi-modality whole heart segmentation (MM-WHS) Challenge 2017 dataset [46], which contains 20 CT and 20 MRI scans with pixel-level ground-truth annotations. These scans were collected in a real clinical setting and include five anatomical labels: left ventricle blood cavity, right ventricle blood cavity, left atrium blood cavity, right atrium blood cavity, and ascending aorta. In our experiments, only the CT scans were used, which contain between 177 and 363 slices, each with a resolution of 512 × 512 pixels and voxel spacing ranging from 0.3 mm to 0.6 mm.

The Beyond the Cranial Vault (BTCV) challenge dataset [47] comprises 30 CT volumes, each manually labeled with 13 different abdominal organs. The number of slices per scan ranges between 85 and 198, with a slice thickness varying between 2.5 mm and 5.0 mm. All scans have an axial resolution of 512 × 512 pixels, whereas the in-plane resolution varies from 0.54 × 0.54 mm² to 0.98 × 0.98 mm². We followed the data split proposed by Tang et al. [48], utilizing 24 cases for training and 6 cases for testing.

For evaluating the model's generalization ability, we also used the multi-modality abdominal multi-organ segmentation challenge (AMOS22) dataset [49]. This dataset includes abdominal CT and MRI scans from different patients, with each scan annotated for 15 organs. In line with the approach in MA-SAM, we limited our evaluation to the 12 organs common to both the AMOS22 and BTCV datasets. For generalization testing, we utilized 300 CT scans and 60 MRI scans from the AMOS22 training and validation sets.

4.1.2. Implementation details
We implemented our method and benchmarked it against baseline models using PyTorch (version 2.7.1) and the medical open network for AI (MONAI) framework, specifically utilizing SAM-B for all experiments, which employs ViT-B as the image encoder backbone. Training was conducted on an NVIDIA A40 GPU (United States) with a batch size of 1, using the AdamW optimizer with a linear learning rate scheduler for a total of 200 epochs. The initial learning rate was set to 1e-4, with a momentum of 0.9 and a weight decay of 1e-5. Data preprocessing involved resampling each volume to an isotropic spacing of 1 mm. For data augmentation, we applied various transformations, including random rotation, flipping, erasing, shearing, scaling, translation, posterization, contrast adjustments, brightness modifications, and sharpness enhancements. During training, we also sampled foreground and background patches at a 1:1 ratio. For single-organ cancer segmentation, we assessed our method's performance through comparisons with state-of-the-art volumetric segmentation and fine-tuning techniques, using the Dice coefficient and normalized surface dice (NSD) as evaluation metrics, similar to SAM-Med3D [11]. For multi-organ segmentation, we employed the Dice coefficient and Hausdorff distance (HD) as evaluation metrics. For each dataset, we designed specific text prompts to guide the segmentation process, as shown in Table 1. These prompts were carefully crafted to provide clear anatomical context while maintaining consistency across different organs and pathologies.

4.2. Comparison with state-of-the-art methods
Our method was extensively evaluated against a wide range of state-of-the-art 3D medical image segmentation techniques on both CT and MRI datasets. These techniques include the convolutional neural network-based no new (nn)U-Net [50], an automated configuration framework evolved from the U-Net architecture [51], and the Swin U-Net transformers (Swin-UNETR) [52], which employs a hierarchical encoder structure for 3D segmentation tasks. Furthermore, we also considered nnFormer [53], a model that integrates both local and global volumetric self-attention mechanisms, and UNETR++ [54], which enhances segmentation accuracy and efficiency through
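The pancreas-plus-tumor label merging described for the MSD pancreas data can be sketched in plain Python. The label IDs (1 = pancreas, 2 = tumor) and the flattened-list representation are illustrative assumptions, not taken from the paper:

```python
def merge_labels(mask, labels_to_merge=(1, 2)):
    """Collapse the given label IDs (e.g., pancreas and tumor) into a
    single foreground class; all other voxels become background (0)."""
    return [1 if v in labels_to_merge else 0 for v in mask]

# Toy flattened mask: background, pancreas, tumor, background
print(merge_labels([0, 1, 2, 0]))  # [0, 1, 1, 0]
```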
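The training schedule above (initial learning rate 1e-4, linear scheduler, 200 epochs) can be written out as a small sketch. The exact endpoint behaviour is an assumption, since the paper does not state whether the rate decays to zero or to a floor:

```python
def linear_lr(epoch, total_epochs=200, base_lr=1e-4):
    """Linearly decay the learning rate from base_lr toward zero over
    training; a sketch of the linear schedule described in the text."""
    return base_lr * (1.0 - epoch / total_epochs)

print(linear_lr(0))    # 0.0001 at the start of training
print(linear_lr(100))  # 5e-05 at the midpoint
```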
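The 1:1 foreground/background patch sampling used during training can be sketched as alternating draws from the two voxel pools. The helper name and the coordinate-list representation of a labeled volume are illustrative, not the authors' implementation:

```python
import random

def sample_patch_centers(voxels, n_patches, seed=0):
    """Pick patch centers alternating foreground/background (1:1 ratio).

    `voxels` is a simplified stand-in for a labeled volume: a list of
    (z, y, x, label) tuples, where label > 0 marks foreground.
    """
    rng = random.Random(seed)
    fg = [(z, y, x) for z, y, x, lab in voxels if lab > 0]
    bg = [(z, y, x) for z, y, x, lab in voxels if lab == 0]
    centers = []
    for i in range(n_patches):
        # Even draws come from foreground, odd draws from background;
        # fall back to the combined pool if one side is empty.
        pool = (fg if i % 2 == 0 else bg) or fg + bg
        centers.append(rng.choice(pool))
    return centers
```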
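The evaluation metrics named above can be illustrated with a minimal plain-Python sketch (not the evaluation code used in the paper); NSD is omitted because it additionally requires a surface-tolerance parameter:

```python
import math

def dice_coefficient(pred, gt):
    """Dice = 2|A∩B| / (|A| + |B|) over two flat binary masks."""
    inter = sum(p and g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2.0 * inter / total if total else 1.0

def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two point sets (voxel coords)."""
    def directed(src, dst):
        return max(min(math.dist(s, d) for d in dst) for s in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))

# Toy 1D masks overlapping in two voxels out of three each
pred = [0, 1, 1, 1, 0]
gt   = [0, 0, 1, 1, 1]
print(round(dice_coefficient(pred, gt), 3))  # 2*2/(3+3) -> 0.667
```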
            Volume 2 Issue 4 (2025)                        121                          doi: 10.36922/AIH025080010