
Advanced Neurology                                                          ML for EEG signal recognition



            separability and probabilistic outputs. The Adam optimizer (learning rate = 0.001) balanced convergence speed and stability.

            2.4. Model evaluation and visualization

            Model performance was rigorously evaluated using the validation and test datasets. For the neural network models, training history was visualized through plots of accuracy and loss against epochs. These plots provided insights into the models' learning processes, highlighting convergence behavior and potential overfitting. Confusion matrices were constructed to depict the classification performance, illustrating true positive, false positive, true negative, and false negative counts for both epileptic and non-epileptic cases. Additional metrics, such as the area under the curve (AUC), Cohen's kappa, and the Matthews correlation coefficient (MCC), were calculated to provide a comprehensive assessment of classification effectiveness.

              Visualization played a pivotal role in interpreting model outputs. Training curves were developed for each model, comparing performance on the training set (70% of the data) versus the validation set (20% of the data). Visualizing these curves indicates whether a model is learning effectively or whether adjustments are needed.

              Heatmaps were created from the confusion matrices to identify patterns of misclassification. For each model, two confusion matrices were derived: a validation matrix using the 20% validation split and a test matrix using the 10% test split, which represented previously unseen data. The validation matrix helps in understanding how well the model performs during training and whether overfitting is occurring. The test matrix provides insight into the model's performance on unseen data. These visual tools facilitated the comparison of model performance and guided refinements to improve predictive accuracy.

            Table 1. A comparison of model performance metrics on the EEG data

            Model      Accuracy   AUC      Recall   Precision   F1       Kappa    MCC
            CNN        0.8194     0.8716   0.7315   0.8778      0.7980   0.6371   0.6458
            DNN        0.7698     0.8190   0.7407   0.7767      0.7583   0.5387   0.5393
            LightGBM   0.8590     0.9143   0.8156   0.8679      0.8397   0.7141   0.7168
            RF         0.8572     0.9117   0.8055   0.8727      0.8364   0.7101   0.7134
            ET         0.8544     0.9111   0.7995   0.8733      0.8331   0.7044   0.7086
            GBC        0.8442     0.9071   0.8055   0.8480      0.8245   0.6846   0.6876
            ADA        0.8074     0.8570   0.7670   0.8047      0.7826   0.6100   0.6140
            LR         0.8046     0.8260   0.7691   0.7979      0.7816   0.6050   0.6073
            KNN        0.7853     0.8435   0.7307   0.7797      0.7571   0.5677   0.5711
            DT         0.7271     0.7254   0.7227   0.7452      0.7338   0.4520   0.4556
            Ridge      0.7115     0.7218   0.6558   0.6942      0.6730   0.4156   0.4173
            LDA        0.6913     0.6957   0.6314   0.6691      0.6499   0.3743   0.3754
            NB         0.5816     0.5786   0.7992   0.8573      0.1732   0.0391   0.1859
            Dummy      0.5447     0.5700   0.0000   0.0000      0.0000   0.0000   0.0000
            SVM        0.5107     0.5700   0.5285   0.0000      0.0000   0.0000   0.0000
            QDA        0.4646     0.6525   0.9980   0.4599      0.6295   0.0154   0.0263

            Abbreviations: ADA: Ada boost classifier; CNN: Convolutional neural network; DNN: Dense neural network; DT: Decision tree classifier; EEG: Electroencephalogram; ET: Extra trees classifier; GBC: Gradient boosting classifier; KNN: K-nearest neighbors classifier; LDA: Linear discriminant analysis; LightGBM: Light gradient boosting machine; LR: Logistic regression; MCC: Matthews correlation coefficient; NB: Naive Bayes; QDA: Quadratic discriminant analysis; RF: Random forest classifier; Ridge: Ridge classifier; Dummy: Dummy classifier; SVM: Support vector machine (linear kernel).

            3. Results

            The performance of all models was evaluated using the validation (20% of the dataset) and test (10% of the dataset) datasets to assess their predictive performance. The validation set was used during training to monitor model performance, while the test set contained entirely unseen data. Table 1 presents the performance metrics for all models, including accuracy, recall, precision, F1 score, Cohen's kappa, MCC, and AUC. In addition, the DNN and CNN were also evaluated on the test dataset using these metrics.

              On the validation dataset (Table 1), the DNN achieved an accuracy of 76.98% with an F1 score of 75.83%, indicating robust performance on the validation data. On the unseen test dataset, the DNN maintained comparable performance, achieving an accuracy of 77.48% and an F1 score of 70.59%. The AUC values for the validation and test datasets were 0.82 and 0.84, respectively, suggesting consistent classification capability across datasets.

              The CNN achieved an accuracy of 81.94% and an F1 score of 79.80% on the validation set (Table 1), demonstrating superior performance compared to the DNN. On the unseen test dataset, the CNN maintained comparable performance, achieving an accuracy of 81.98% and an F1 score of 75.90%. The AUC values for the validation and test datasets were 0.87 and 0.86, respectively, indicating strong classification performance across both datasets.

              Multiple models from the PyCaret library were trained and analyzed with cross-validation. Of these models, LightGBM showed the greatest overall performance. Table 1 highlights the performance metrics for LightGBM, which achieved an accuracy of 85.90% and an F1 score of 83.97% on the test dataset. The model also attained a high AUC value of 0.91, reflecting its excellent ability to distinguish between classes. Precision and recall values were 86.79% and 81.56%, respectively, confirming the model's balanced performance across both positive and negative classifications. These


            Volume 4 Issue 2 (2025)                        115                               doi: 10.36922/an.7941
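The metric suite reported in Table 1 (accuracy, AUC, recall, precision, F1, Cohen's kappa, and MCC) can be reproduced with scikit-learn. The sketch below is illustrative only: the labels and scores are hypothetical stand-ins for epileptic (1) and non-epileptic (0) EEG segments, not the study's data, and a 0.5 decision threshold is assumed.

```python
# Minimal sketch of the Table 1 metric suite using scikit-learn.
# Labels and scores below are hypothetical, not the study's EEG data.
from sklearn.metrics import (accuracy_score, roc_auc_score, recall_score,
                             precision_score, f1_score, cohen_kappa_score,
                             matthews_corrcoef)

# Hypothetical binary labels (1 = epileptic, 0 = non-epileptic)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3, 0.95, 0.05]  # model scores
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

metrics = {
    "Accuracy":  accuracy_score(y_true, y_pred),
    "AUC":       roc_auc_score(y_true, y_prob),  # AUC uses scores, not labels
    "Recall":    recall_score(y_true, y_pred),
    "Precision": precision_score(y_true, y_pred),
    "F1":        f1_score(y_true, y_pred),
    "Kappa":     cohen_kappa_score(y_true, y_pred),
    "MCC":       matthews_corrcoef(y_true, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that AUC is computed from the continuous scores rather than the thresholded labels, which is why it can remain high even when threshold-dependent metrics such as F1 drop, a pattern visible in several Table 1 rows.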
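The confusion-matrix heatmaps described in Section 2.4 can be sketched with scikit-learn and matplotlib. Again the labels are hypothetical placeholders for epileptic (1) and non-epileptic (0) segments, and the output filename is an arbitrary choice.

```python
# Minimal sketch of a confusion-matrix heatmap, as described in Section 2.4.
# Labels below are hypothetical, not the study's EEG data.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = epileptic, 0 = non-epileptic
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted
disp = ConfusionMatrixDisplay(cm, display_labels=["non-epileptic", "epileptic"])
disp.plot(cmap="Blues")
disp.ax_.set_title("Validation confusion matrix (hypothetical)")
plt.savefig("confusion_matrix.png")  # arbitrary output path
```

One such heatmap would be produced per model for the validation split and another for the test split, so that misclassification patterns on seen and unseen data can be compared side by side.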