Artificial Intelligence in Health ViT for neurodegeneration diagnosis
the ternary classification of NDDs solely based on ¹⁸F-FDG PET scans.
• Combining the model’s attention maps and the AAL3 brain atlas for improved model explainability. Apart from the predicted label, our model provides a heatmap overlaid on the original input scan, highlighting the brain regions most influential to the model’s prediction. Furthermore, the model reports the names of the key areas with the assistance of the AAL3 brain atlas.
• Performing a comprehensive brain-region importance analysis by combining the model’s attention maps and the AAL3 atlas to find the most influential areas in the model’s predictions. This study aims to enhance the model’s explainability and suggest key areas for distinguishing various brain conditions.

2. Related works

Deep learning algorithms have shown outstanding results and potential in solving intricate tasks, motivating researchers to employ them for various medical image analysis tasks, including NDD classification.

Before the emergence of transformer-based vision models, such as ViTs,⁶ most researchers had focused on employing CNNs for NDD classification.⁴,⁵ Etminani et al.⁴ proposed a comprehensive data pre-processing pipeline and a 3D CNN model based on VGG16¹⁴ for NDD classification using ¹⁸F-FDG PET scans. The authors demonstrated that 3D CNN algorithms could obtain competitive results compared to human readers, outperforming experienced nuclear medicine physicians both individually and in consensus.⁴ Furthermore, Etminani et al.⁴ focused on explainability and dedicated a part of their research to interpreting the suggested model using an occlusion experiment.¹⁵ Ding et al.⁵ developed a CNN based on Inception-v3¹⁶ to classify NDDs through brain ¹⁸F-FDG PET scans. The authors also compared their model’s performance to that of radiology readers using a subset of the ADNI and an independent test dataset, and the model achieved superior results in both cases. Furthermore, Ding et al.⁵ employed the saliency map approach¹⁷ for model interpretation and analysis. Lozupone et al.¹⁸ utilized 2D CNNs and a new explainable AI strategy to develop an interpretable model for classifying NDDs; however, the authors aimed for a two-class classification in their research and used 3D MRI brain scans for designing the model.

The advent of ViTs⁶ and their cutting-edge performance in various computer vision tasks convinced researchers to investigate utilizing them in the medical domain and NDD diagnosis. Khatri and Kwon¹⁹ focused on designing an explainable ViT utilizing self-supervised learning and ¹⁸F-FDG PET scans for binary classification of two MCI sub-categories, namely convertible MCI (MCI-c) and stable MCI (MCI-s), to predict MCI progression to AD. The authors also studied attention regions for model explainability. Shin et al.²⁰ proposed applying ViTs on ¹⁸F-florbetaben scans for binary and ternary classification of NDDs. Although this type of PET scan, which reveals beta-amyloid (β-amyloid) plaques in the brain, has proved beneficial in identifying NDDs, it is used mainly in research settings.²¹ Therefore, ¹⁸F-FDG has remained the most commonly used brain PET imaging technique.¹¹ Xing et al.²² developed a multi-modal ViT by combining two types of PET brain scans (¹⁸F-FDG and ¹⁸F-AV45) for the binary classification of NDDs. Specifically, the proposed model includes two ViTs, each specialized in extracting features of a specific PET type. Then, the extracted features are concatenated and fed into a classifier for the final prediction.²² Similarly, Odusami et al.²³ suggested an approach for binary classification of NDDs by fusing MRI and PET brain scans.

Most studies have concentrated on applying ViTs to MRI data. Unlike PET scans, which expose metabolic activities and functions, MRI reveals the brain’s structure. Therefore, MRI is usually beneficial in diagnosing NDDs at later stages, when the disease causes abnormalities in the brain’s physical structure. Lyu et al.²⁴ developed a ViT solely based on an MRI dataset for a binary classification task; however, the authors added convolutional layers to their model to obtain better results. Sarraf et al.²⁵ proposed OVITAD, an optimized ViT architecture trained on a combination of functional MRI and structural MRI (sMRI) to classify NDDs. Furthermore, the authors used attention maps to achieve better model interpretation. Hoang et al.²⁶ focused their study on predicting MCI cases that could potentially progress into AD; therefore, the authors trained their ViT on sMRI data for binary classification.²⁶ Aghdam et al.²⁷ applied a pre-trained pyramid ViT²⁸ to sMRI data to classify CN and AD cases. Kushol et al.²⁹ designed Addformer, which utilizes a new fusion transformer block that combines sMRI data in spatial and frequency domains to improve binary classification accuracy. They also visualized the model’s attention maps, similar to most ViT-based studies, to gain model explainability. Shah et al.³⁰ introduced the multi-modal Bi-vision Transformer (BiViT), a ViT that includes two modules of mutual latent fusion and parallel coupled encoding strategy to enhance feature learning. The authors also utilized MRI data and demonstrated tokens for a better model understanding.

While this work also targets NDD classification, it differs from the literature in several key ways, as follows:
• Achieving competitive performance in ternary NDD
Volume 2 Issue 4 (2025) 35 doi: 10.36922/AIH025140026

