provides a detailed breakdown of the model's predictions by categorizing them into true positives, true negatives, false positives, and false negatives. This breakdown allows for a nuanced understanding of the model's strengths and weaknesses, particularly in high-stakes applications such as the classification of epileptic and non-epileptic EEG signals. In this study, confusion matrices were calculated for three models – DNN, CNN, and LightGBM – on both validation and test datasets. The validation set represents data the model has encountered during training (20% of the dataset), while the test set consists of unseen data (10% of the dataset). Comparing the performance on these two datasets provides insights into the model's generalization ability.

The purpose of calculating confusion matrices on both validation and test sets is to assess how well the model performs on data it has seen during training versus completely unseen data. A small decrease in accuracy (calculated as the sum of true positives and true negatives divided by the total number of predictions, which represents the proportion of correctly classified instances out of all instances) from the validation set to the test set is expected and indicates that the model has generalized well. However, a significant drop in performance suggests overfitting, where the model has memorized patterns specific to the training data rather than learning generalizable features. For instance, in the DNN model (Figure 2), the validation accuracy was 81.6%, while the test accuracy dropped slightly to 80.3%, indicating good generalization. Similarly, the CNN model (Figure 3) showed validation and test accuracies of 83.2% and 82.7%, respectively, demonstrating robust performance. In contrast, the LightGBM model (Figure 4) exhibited a more pronounced drop, with validation accuracy at 87.2% and test accuracy at 81.5%, suggesting potential overfitting.

From a clinical perspective, the confusion matrices provide critical insights into the reliability of these models for diagnosing epilepsy. False positives (non-epileptic signals misclassified as epileptic) can lead to unnecessary anxiety and potentially harmful interventions, while false negatives (epileptic signals misclassified as non-epileptic) may result in missed diagnoses and delayed treatment. For example, the DNN model had 46 false positives and 56 false negatives on the validation set, compared to 15 false positives and 35 false negatives on the test set. The CNN model showed slightly better performance, with fewer false positives and false negatives on both datasets. The LightGBM model, while achieving higher validation accuracy, had a higher number of false positives and false negatives on the test set, raising concerns about its reliability in clinical settings.

When comparing the models, the CNN appears to strike the best balance between validation and test performance, with minimal overfitting and fewer clinically significant errors. The DNN model also performs well but shows slightly higher false negatives, which could be problematic in a clinical context. The LightGBM model, despite its high validation accuracy, demonstrates a larger performance gap between validation and test sets, indicating overfitting and reduced generalizability. These findings underscore the importance of evaluating ML models not only on their overall accuracy but also on their ability to minimize false positives and false negatives, particularly in applications where diagnostic accuracy has direct implications for patient care.

3.3. ROC curves

The ROC curve is a graphical representation of a model's diagnostic ability, plotting the TPR (sensitivity) against the
Figure 2. Confusion matrix of DNN performance (%). (A) The performance of the model on the validation data (20%), which the model has previously encountered. (B) The performance of the model on the test data (10%), which the model has not previously encountered.
Abbreviation: DNN: Dense Neural Network.
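As a concrete illustration of the accuracy definition used above, the short Python sketch below derives accuracy, sensitivity, and specificity from the four cells of a binary confusion matrix and compares validation and test values. The cell counts are illustrative placeholders, not the values reported for the DNN, CNN, or LightGBM models.

# Illustrative sketch: deriving summary metrics from a binary confusion matrix.
# The cell counts below are placeholders, not the study's reported values.

def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return accuracy, sensitivity (TPR), and specificity from cell counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # proportion of correctly classified instances
        "sensitivity": tp / (tp + fn),   # epileptic signals correctly identified
        "specificity": tn / (tn + fp),   # non-epileptic signals correctly identified
    }

# Hypothetical confusion-matrix counts for one model on each data split.
val = confusion_metrics(tp=420, tn=400, fp=45, fn=55)
test = confusion_metrics(tp=210, tn=200, fp=20, fn=30)

# A small validation-to-test accuracy drop suggests good generalization;
# a large drop points to overfitting.
print(f"Validation accuracy: {val['accuracy']:.3f}")
print(f"Test accuracy:       {test['accuracy']:.3f}")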
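For Section 3.3, a minimal sketch of how an ROC curve is typically computed for a binary EEG classifier is given below; it uses scikit-learn's roc_curve and auc on placeholder labels and scores, not the study's data or trained models.

# Minimal ROC sketch on placeholder data (not the study's EEG signals).
import numpy as np
from sklearn.metrics import roc_curve, auc

# Ground-truth labels (1 = epileptic, 0 = non-epileptic) and the classifier's
# predicted probabilities for the positive class.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.65, 0.30, 0.70, 0.55])

# roc_curve sweeps the decision threshold, returning the false positive rate
# and the true positive rate (sensitivity) at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(f"AUC: {auc(fpr, tpr):.3f}")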

