
Artificial Intelligence in Health                                       ViT for neurodegeneration diagnosis



    the ternary classification of NDDs based solely on ¹⁸F-FDG PET scans.
•   Combining the model’s attention maps with the AAL3 brain atlas for improved model explainability. Apart from the predicted label, our model provides a heatmap overlaid on the original input scan, highlighting the brain regions most influential to the model’s prediction. Furthermore, the model reports the names of the key areas with the assistance of the AAL3 brain atlas.
•   Performing a comprehensive brain-region importance analysis by combining the model’s attention maps and the AAL3 atlas to find the most influential areas in the model’s predictions. This study aims to enhance the model’s explainability and suggest key areas for distinguishing various brain conditions.

2. Related works

Deep learning algorithms have shown outstanding results and potential in solving intricate tasks, motivating researchers to employ them for various medical image analysis tasks, including NDD classification.

Before the emergence of transformer-based vision models, such as ViTs,⁶ most researchers focused on employing CNNs for NDD classification.⁴,⁵ Etminani et al.⁴ proposed a comprehensive data pre-processing pipeline and a 3D CNN model based on VGG16¹⁴ for NDD classification using ¹⁸F-FDG PET scans. The authors demonstrated that 3D CNN algorithms could obtain competitive results compared to human readers, outperforming experienced nuclear medicine physicians both individually and in consensus.⁴ Furthermore, Etminani et al.⁴ focused on explainability and dedicated a part of their research to interpreting the suggested model using an occlusion experiment.¹⁵ Ding et al.⁵ developed a CNN based on Inception-v3¹⁶ to classify NDDs from brain ¹⁸F-FDG PET scans. The authors also compared their model’s performance to that of radiology readers on a subset of the ADNI and on an independent test dataset; the model achieved superior results in both cases. Furthermore, Ding et al.⁵ employed the saliency map approach¹⁷ for model interpretation and analysis. Lozupone et al.¹⁸ utilized 2D CNNs and a new explainable AI strategy to develop an interpretable model for classifying NDDs; however, the authors aimed for a two-class classification in their research and used 3D MRI brain scans for designing the model.

The advent of ViTs⁶ and their cutting-edge performance in various computer vision tasks convinced researchers to investigate their use in the medical domain and in NDD diagnosis. Khatri and Kwon¹⁹ focused on designing an explainable ViT utilizing self-supervised learning and ¹⁸F-FDG PET scans for binary classification of two MCI sub-categories, namely convertible MCI (MCI-c) and stable MCI (MCI-s), to predict MCI progression to AD. The authors also studied attention regions for model explainability. Shin et al.²⁰ proposed applying ViTs to ¹⁸F-florbetaben scans for binary and ternary classification of NDDs. Although this type of PET scan, which demonstrates beta-amyloid (β-amyloid) plaques in the brain, has proved beneficial in identifying NDDs, it is often used in research settings.²¹ Therefore, ¹⁸F-FDG has remained the most commonly used brain PET imaging technique.¹¹ Xing et al.²² developed a multi-modal ViT by combining two types of PET brain scans (¹⁸F-FDG and ¹⁸F-AV45) for the binary classification of NDDs. Specifically, the proposed model includes two ViTs, each specialized in extracting features of a specific PET type. Then, the extracted features are concatenated and fed into a classifier for the final prediction.²² Similarly, Odusami et al.²³ suggested an approach for binary classification of NDDs by fusing MRI and PET brain scans.

Most studies have concentrated on applying ViTs to MRI data. Unlike PET scans, which expose metabolic activities and functions, MRI reveals the brain’s structure. Therefore, MRI is usually beneficial in diagnosing NDDs at later stages, when the disease causes abnormalities in the brain’s physical structure. Lyu et al.²⁴ developed a ViT based solely on an MRI dataset for a binary classification task; however, the authors added convolutional layers to their model to obtain better results. Sarraf et al.²⁵ proposed OVITAD, an optimized ViT architecture trained on a combination of functional MRI and structural MRI (sMRI) to classify NDDs. Furthermore, the authors used attention maps to achieve better model interpretation. Hoang et al.²⁶ focused their study on predicting MCI cases that could potentially progress into AD; therefore, the authors trained their ViT on sMRI data for a binary classification.²⁶ Aghdam et al.²⁷ applied a pre-trained pyramid ViT²⁸ to sMRI data to classify CN and AD cases. Kushol et al.²⁹ designed Addformer, which utilizes a new fusion transformer block that combines sMRI data in spatial and frequency domains to improve binary classification accuracy. They also visualized the model’s attention maps, similar to most ViT-based studies, to gain model explainability. Shah et al.³⁰ introduced the multi-modal Bi-vision Transformer (BiViT), a ViT that includes two modules of mutual latent fusion and parallel coupled encoding strategy to enhance feature learning. The authors also utilized MRI data and demonstrated tokens for a better model understanding.

While we aimed for NDD classification in this work, there are some key differences compared to the literature, as follows:
•   Achieving competitive performance in ternary NDD
            Volume 2 Issue 4 (2025)                         35                          doi: 10.36922/AIH025140026