Additionally, the notable recall scores (88%) confirm the model's ability to correctly identify relevant cases in the dataset. The consistent F1-scores of 88% across both RF models further validate their balance between precision and recall, indicating their resilience to class imbalances and capacity to maintain predictive integrity. In contrast, the simple and tuned XGBoost models, while showing competitive performance, exhibited slightly lower accuracy scores of 86% and 85%, respectively. This indicates a slight reduction in overall predictive capability compared to the RF models. Nevertheless, the XGBoost models maintained comparable precision and recall scores of approximately 87% and 86%, respectively, demonstrating a consistent ability to minimize false positives and accurately detect relevant cases. Despite this slight decrement in accuracy, the F1-scores of 87% (simple XGBoost) and 85% (tuned XGBoost) indicate a well-maintained balance between precision and recall, affirming their ability to sustain predictive performance across multiple evaluation metrics.
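For reference, the F1-score is the harmonic mean of precision and recall, F1 = 2 × (precision × recall) / (precision + recall); with a precision of roughly 87% and a recall of roughly 86%, for example, the F1-score works out to about 86.5%, which is why the three metrics track one another so closely for these models.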
Both the simple RF and simple XGBoost models, utilizing selected variables, exhibited comparable accuracies of 86% and 85%, respectively, suggesting similar predictive performance. However, upon tuning, the RF model demonstrated a notable improvement, achieving an impressive accuracy of 90% and outperforming the tuned XGBoost model, which attained a respectable score of 89%. This enhancement underscores the effectiveness of fine-tuning in optimizing the RF algorithm's predictive capabilities, potentially making it a preferred choice in scenarios where maximizing prediction accuracy is crucial.
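Hyperparameter tuning of this kind is commonly carried out with a cross-validated search over candidate settings. The sketch below assumes scikit-learn's RandomForestClassifier and GridSearchCV, a synthetic stand-in for the preprocessed training split, and an illustrative parameter grid; none of these reflect the exact search space or settings used in this study.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Synthetic stand-in for the preprocessed training split (not the AIBL data).
    X_train, y_train = make_classification(n_samples=300, n_features=10, n_informative=6,
                                           n_classes=3, random_state=0)

    # Illustrative search space only; the grid actually explored in this study may differ.
    param_grid = {
        "n_estimators": [200, 500],
        "max_depth": [None, 10, 20],
        "min_samples_leaf": [1, 2, 5],
    }

    search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                          scoring="accuracy", cv=5)
    search.fit(X_train, y_train)

    tuned_rf = search.best_estimator_   # best configuration, refit on the full training split
    print(search.best_params_, round(search.best_score_, 3))

Refitting the best configuration on the full training split before evaluating on held-out data keeps the comparison with the simple (default-parameter) model fair.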
Additionally, evaluating precision, recall, and F1-score metrics provided a more comprehensive understanding of model performance beyond overall accuracy. Both the simple and tuned RF models consistently achieved higher precision, recall, and F1 scores compared to their XGBoost counterparts. Specifically, the tuned RF model yielded the highest scores across all three metrics, indicating superior ability to minimize false positives while effectively capturing relevant instances from the dataset. While the XGBoost models also demonstrated good precision, recall, and F1 scores, they slightly underperformed relative to the RF models, suggesting a moderate reduction in their effectiveness at minimizing misclassifications and accurately detecting relevant cases.
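Since the task involves three AD-related classes, the per-class precision, recall, and F1 values must be averaged to yield the single figures quoted above. The toy example below uses scikit-learn's metrics with macro averaging; the class encoding (e.g., CN/MCI/AD as 0/1/2) and the averaging scheme are assumptions rather than details taken from the study.

    from sklearn.metrics import classification_report, precision_recall_fscore_support

    # Toy labels for a three-class problem (e.g., CN / MCI / AD encoded as 0 / 1 / 2).
    y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]

    # Macro averaging treats the three classes equally; "weighted" would instead
    # account for class imbalance (the study's exact choice is assumed, not quoted).
    precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
    print(round(precision, 2), round(recall, 2), round(f1, 2))

    # Per-class breakdown alongside the averaged scores.
    print(classification_report(y_true, y_pred))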
The predictive simple RF model from a previous study achieved an impressive accuracy of 96.05% for a binary classification task using all features of the AIBL non-imaging dataset.3 In comparison, our best model, the tuned RF model using selected features, achieved a slightly lower accuracy of 90%. When comparing equivalent models from both studies, our simple RF model using all features yielded an accuracy of 88%. However, it is important to note that the prior study addressed a binary classification problem, whereas our study considered all three AD-related classes. This difference in classification scope likely accounts for the observed decrease in accuracy. The added complexity of distinguishing among three classes inherently increases the challenge and may reduce model performance relative to a binary setting.
Therefore, while our model's accuracy may appear slightly lower, its ability to classify across multiple classes provides valuable insight into the severity of AD. Furthermore, in our study, the train–test split was performed prior to preprocessing, supporting the model's ability to generalize to unseen data. In contrast, the previous study preprocessed the entire dataset before splitting (with the exception of SMOTE), which may have contributed to its enhanced performance. Nonetheless, both studies consistently identified CDGLOBAL (CDR), MMSCORE (MMSE score), LIMMTOTAL (logical memory immediate recall), and LDELTOTAL (logical memory delayed recall) as the most informative predictors.
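The point about leakage, splitting before any preprocessing and applying SMOTE only to the training portion, can be sketched as follows. The snippet assumes scikit-learn and imbalanced-learn, uses synthetic stand-in data whose column names merely echo the predictors listed above, and substitutes a standard scaler for the study's actual preprocessing steps, so it illustrates the workflow rather than reproducing it.

    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from imblearn.over_sampling import SMOTE

    # Synthetic stand-in for the non-imaging feature table; the column names are
    # illustrative only and carry no real AIBL values.
    X_arr, y = make_classification(n_samples=300, n_features=5, n_informative=4,
                                   n_redundant=1, n_classes=3, random_state=0)
    X = pd.DataFrame(X_arr, columns=["CDGLOBAL", "MMSCORE", "LIMMTOTAL", "LDELTOTAL", "AGE"])

    # 1. Split first, so no information from the test set reaches the preprocessing step.
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                        test_size=0.2, random_state=42)

    # 2. Fit preprocessing on the training split only, then apply it to both splits
    #    (a standard scaler stands in for the study's actual preprocessing).
    scaler = StandardScaler().fit(X_train)
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

    # 3. Oversample with SMOTE on the training split only; the test split keeps its
    #    natural class balance.
    X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train_s, y_train)

    rf = RandomForestClassifier(random_state=42).fit(X_train_bal, y_train_bal)

    # Feature importances indicate which predictors drive the fitted model; in the study,
    # CDGLOBAL, MMSCORE, LIMMTOTAL, and LDELTOTAL ranked as the most informative.
    print(pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False))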
The comparison across classifiers based on medical history, neuropsychological assessment, and blood analysis with ApoE genotype variables offered valuable insights for medical diagnostics and predictive modeling. Initially, the classifier utilizing neuropsychological assessment variables emerged as the top performer, displaying impressive accuracy, precision, recall, and F1-score metrics, all exceeding 90%. This underscores the robust predictive capability of neuropsychological assessment data, highlighting its potential as a crucial diagnostic tool for AD. However, the classifier relying on medical history variables exhibited substantially lower performance metrics, with accuracy, precision, recall, and F1 scores hovering around 52%, indicating its limited predictive accuracy when used in isolation. Despite falling short of the neuropsychological classifier, the classifier based on blood analysis and ApoE genotype variables demonstrated a notable improvement over the medical history classifier. With precision at 85% and recall at 68%, resulting in an F1-score of 66%, this classifier shows promise in enhancing predictive accuracy and diagnostic capabilities by incorporating blood analysis and genetic data.
Both the existing and the present study identified the classifier based on neuropsychological assessment variables as the most effective, consistently demonstrating exceptional performance metrics. Palmqvist26 underscored the significance of the MMSE score in predicting the transition from MCI to AD. Similarly, Bloch and Friedrich27 concluded that cognitive test results, including MMSE and CDR values, were the most informative features for effectively classifying AD. These findings highlight the

