
Artificial Intelligence in Health                                 RefSAM3D for medical image segmentation



Table 5. Ablation experiments of each stage under the hierarchical cross-attention

Stages            Dice (%)    Hausdorff distance (%)
All stages        88.3        2.34
Stages 1 and 4    78.5        2.76
Stages 2 and 4    82.1        2.62
Stages 3 and 4    85.4        2.48
Stage 4 only      73.78       2.89

4.4.3. Effects of hierarchical cross-attention mechanism

The hierarchical fusion mechanism in Ref-SAM3D is pivotal for integrating information across the encoder layers, enabling the model to capture the detailed, multi-level semantic features essential for precise segmentation. Ablation studies, summarized in Table 4, demonstrate the significance of this mechanism. Removing the hierarchical fusion led to a sharp decline in segmentation accuracy, with the Dice coefficient dropping from 88.3% to 74.1% and the HD increasing from 2.34% to 6.33%. This underscores the mechanism's role in effectively combining features across layers for better performance.

Moreover, Table 5 provides a systematic evaluation of each block level's contribution to the model. The results reveal that utilizing all layers (Stages 1–4) achieved the best performance, with a Dice score of 88.3% and an HD of 2.34%. In contrast, excluding specific layers led to varied performance declines, with the shallow layers contributing significantly to contextual information and the deeper layers enhancing fine-grained details. For example, when only the deeper layers (Stages 3 and 4) were used, the Dice score dropped to 78.5% and the HD increased to 2.76%. In contrast, including only the shallow layers (Stages 1 and 2) yielded a Dice score of 73.78% and an HD of 2.89%.

These findings underscore the necessity of a comprehensive fusion approach. Each layer's unique contributions, from the broad contextual cues in the shallow layers to the detailed semantic information in the deeper layers, work synergistically to enhance the model's ability to capture complex anatomical structures, ultimately improving overall segmentation accuracy and robustness.

5. Conclusion

We present Ref-SAM3D, a 3D-adapted SAM framework that synergizes cross-modal prompting and hierarchical attention to address medical segmentation challenges in volumetric imaging. Our model establishes a bidirectional interaction between visual data and semantic text descriptions, enabling intelligent segmentation through joint reasoning over volumetric imaging and clinical context. Three key innovations drive our methodology: (i) a cross-modal reference prompt generator that fuses text and image embeddings into a unified feature space through adaptive alignment, significantly enhancing spatial-semantic correlation; (ii) a multi-scale hierarchical attention mechanism that dynamically prioritizes critical anatomical features across dimensional scales while suppressing irrelevant noise, significantly improving segmentation robustness in intricate 3D topologies; and (iii) a volumetric architecture adaptation that transforms SAM's native 2D processing into true 3D computation through depth-aware convolutions and recursive mask refinement, effectively bridging the dimensional gap in medical imaging analysis. Extensive validation demonstrates state-of-the-art performance on complex segmentation tasks. While our approach is highly effective, future work will focus on improving computational efficiency to enable real-time clinical applications and on exploring semi-supervised learning techniques to address the challenge of limited labeled data. Overall, our method holds significant promise as a generalizable and robust segmentation framework, offering both fully automatic and promptable segmentation capabilities for a wide range of 3D medical imaging applications.

Acknowledgments

None.

Funding

None.

Conflict of interest

The authors declare they have no competing interests.

Author contributions

Conceptualization: Xiang Gao
Data curation: Xiang Gao
Investigation: Xiang Gao
Methodology: Xiang Gao
Visualization: Xiang Gao
Writing–original draft: Xiang Gao
Writing–review & editing: Kai Lu

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data

Data will be made available upon request to the corresponding author.


            Volume 2 Issue 4 (2025)                        126                          doi: 10.36922/AIH025080010