The confusion matrix provides a detailed breakdown of the model’s predictions by categorizing them into true positives, true negatives, false positives, and false negatives. This breakdown allows for a nuanced understanding of the model’s strengths and weaknesses, particularly in high-stakes applications such as the classification of epileptic and non-epileptic EEG signals. In this study, confusion matrices were calculated for three models – DNN, CNN, and LightGBM – on both the validation and test datasets. The validation set represents data the model has encountered during training (20% of the dataset), while the test set consists of unseen data (10% of the dataset). Comparing performance on these two datasets provides insight into each model’s generalization ability.
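As a minimal sketch of how such matrices can be produced, the snippet below fits a LightGBM classifier with default settings on hypothetical feature and label arrays; the data, variable names, and 70/20/10 train/validation/test split are illustrative assumptions, not the authors’ exact pipeline.

```python
# Minimal sketch: confusion matrices on validation (20%) and test (10%) splits.
# X and y are hypothetical stand-ins for the EEG features and labels.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))        # placeholder EEG feature vectors
y = rng.integers(0, 2, size=1000)      # 0 = non-epileptic, 1 = epileptic

# 70% train, 20% validation, 10% test (matching the split described above).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=1 / 3, random_state=0)

model = LGBMClassifier(random_state=0).fit(X_train, y_train)

for name, (Xs, ys) in {"validation": (X_val, y_val),
                       "test": (X_test, y_test)}.items():
    tn, fp, fn, tp = confusion_matrix(ys, model.predict(Xs)).ravel()
    print(f"{name}: TP={tp} TN={tn} FP={fp} FN={fn}")
```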
The purpose of calculating confusion matrices on both the validation and test sets is to assess how well the model performs on data it has seen during training versus completely unseen data. A small decrease in accuracy (the sum of true positives and true negatives divided by the total number of predictions, i.e., the proportion of correctly classified instances) from the validation set to the test set is expected and indicates that the model has generalized well. However, a significant drop in performance suggests overfitting, where the model has memorized patterns specific to the training data rather than learning generalizable features. For instance, the DNN model (Figure 2) achieved a validation accuracy of 81.6%, while its test accuracy dropped slightly to 80.3%, indicating good generalization. Similarly, the CNN model (Figure 3) showed validation and test accuracies of 83.2% and 82.7%, respectively, demonstrating robust performance. In contrast, the LightGBM model (Figure 4) exhibited a more pronounced drop, with a validation accuracy of 87.2% and a test accuracy of 81.5%, suggesting potential overfitting.
From a clinical perspective, the confusion matrices provide critical insights into the reliability of these models for diagnosing epilepsy. False positives (non-epileptic signals misclassified as epileptic) can lead to unnecessary anxiety and potentially harmful interventions, while false negatives (epileptic signals misclassified as non-epileptic) may result in missed diagnoses and delayed treatment. For example, the DNN model had 46 false positives and 56 false negatives on the validation set, compared to 15 false positives and 35 false negatives on the test set. The CNN model showed slightly better performance, with fewer false positives and false negatives on both datasets. The LightGBM model, while achieving higher validation accuracy, had a higher number of false positives and false negatives on the test set, raising concerns about its reliability in clinical settings.
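To make these trade-offs concrete, the short sketch below converts confusion-matrix counts into sensitivity, specificity, and the corresponding error rates; the FP and FN values echo the DNN validation figures quoted above, while the TP and TN counts are hypothetical placeholders.

```python
# Sketch: clinically relevant rates from confusion-matrix counts.
def clinical_rates(tp: int, tn: int, fp: int, fn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),  # epileptic signals correctly flagged
        "specificity": tn / (tn + fp),  # non-epileptic signals correctly cleared
        "fnr": fn / (fn + tp),          # missed diagnoses: delayed treatment
        "fpr": fp / (fp + tn),          # false alarms: unnecessary intervention
    }

# FP = 46 and FN = 56 match the DNN validation set; TP and TN are placeholders.
print(clinical_rates(tp=400, tn=450, fp=46, fn=56))
```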
When comparing the models, the CNN appears to strike the best balance between validation and test performance, with minimal overfitting and fewer clinically significant errors. The DNN model also performs well but shows slightly more false negatives, which could be problematic in a clinical context. The LightGBM model, despite its high validation accuracy, demonstrates a larger performance gap between the validation and test sets, indicating overfitting and reduced generalizability. These findings underscore the importance of evaluating ML models not only on their overall accuracy but also on their ability to minimize false positives and false negatives, particularly in applications where diagnostic accuracy has direct implications for patient care.

3.3. ROC curves
The ROC curve is a graphical representation of a model’s diagnostic ability, plotting the TPR (sensitivity) against the false positive rate (1 − specificity).
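As a minimal illustration of how such a curve is computed, the snippet below applies scikit-learn’s roc_curve to synthetic labels and scores; the data and model scores are stand-ins, not the paper’s actual predictions.

```python
# Sketch: ROC curve from predicted probabilities (illustrative data only).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)  # hypothetical binary labels
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 500), 0, 1)  # stand-in scores

fpr, tpr, _ = roc_curve(y_true, y_score)  # TPR vs. FPR at every threshold
print(f"AUC = {roc_auc_score(y_true, y_score):.3f}")

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle="--")  # chance diagonal
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.show()
```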

Figure 2. Confusion matrix of DNN performance (%). (A) Performance of the model on the validation data (20%), which the model has previously encountered. (B) Performance of the model on the test data (10%), which the model has not previously encountered.
Abbreviation: DNN: Dense Neural Network.


