
Artificial Intelligence in Health                                       ViT for Glioma Classification in MRI



Figure 5. Performance of model fine-tuning using 8 × 8 patches. (A) Variation of model loss versus epoch. (B) Variation of model accuracy versus epoch. (C) Classification performance of the model presented using the confusion matrix.
Abbreviations: HGG: High-grade glioma; LGG: Low-grade glioma.
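For readers who wish to reproduce summary figures from a confusion matrix such as the one in panel C, the overall accuracy is the sum of the diagonal entries divided by the total number of samples. The following is a minimal sketch only: the function names, the class labels, and the counts are hypothetical and are not taken from the figure or from the authors' code. Rows are assumed to hold true labels and columns predicted labels.

```python
def overall_accuracy(cm):
    """Return the fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_recall(cm):
    """Return the recall of each true class (rows = true labels, columns = predictions)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


# Hypothetical counts for illustration only; these are NOT the values in Figure 5C.
# The class names are likewise an assumption about the three-class setup.
labels = ["HGG", "LGG", "no tumor"]
cm = [
    [40, 8, 2],   # true HGG
    [10, 15, 5],  # true LGG
    [3, 2, 45],   # true no tumor
]

if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(cm):.3f}")
    for name, recall in zip(labels, per_class_recall(cm)):
        print(f"recall for {name}: {recall:.3f}")
```

Reporting per-class recall alongside overall accuracy is useful when class sizes differ, because a single overall figure can hide weak performance on the smaller class.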

with 4 × 4 patch resolution. This CNN model has four convolutional layers with only 8 million trainable parameters and was trained using the same dataset as the ViT model to classify brain tumors.

Figure 7A shows the training and validation accuracy for the CNN model, while Figure 7B shows that of the ViT model for brain tumor classification. Both models were trained using the same dataset under optimized settings. Although the CNN model showed nearly perfect accuracy during training, it underperformed during validation, indicating overfitting. By contrast, the ViT model exhibited better performance across the training and validation settings. As shown in Figure 7B, though, the ViT model exhibited a considerable level of instability. Stabilizing it would require further training on a larger dataset. However, limited dataset size is one of the critical constraints in medical image analysis; therefore, stabilizing the ViT model with small datasets is challenging.

4.3. Model performance under two-class problem

Furthermore, the accuracy of the ViT model with 4 × 4 patch size was analyzed for the task of classifying MRI images as with tumor or without tumor. According to the confusion matrices shown in Figure 8, the overall classification accuracy for the three-class problem was 63.2% (Figure 8A), whereas that for the two-class problem was 81.8% (Figure 8B); i.e., the trained and fine-tuned ViT model could detect the presence or absence of tumors with higher accuracy than it could classify the different grades of tumors. The main reason behind this observation is the limited number of images belonging to each class. For the three-class problem, the dataset showed a class imbalance, whereas it was balanced for the two-class problem. This observation indicated that the dataset used was suitable for tumor identification with two classes: with tumor and without tumor.

5. Discussion

CNN-based approaches are a popular choice for brain tumor classification using MRI images. They are highly effective in processing and analyzing medical data owing to their ability to automate feature extraction, capture hierarchical features, perform end-to-end learning, and yield high-accuracy output. However, transformers are emerging as leading contenders for this task, mainly because of their global context modeling features. In particular, their capacity to capture long-range dependencies and their ability to focus on relevant parts of the input images are noteworthy. CNN-based architectures perform poorly, particularly with datasets that show large variation in terms of texture,


            Volume 2 Issue 1 (2025)                         75                               doi: 10.36922/aih.4155