Additionally, the notable recall scores (88%) confirm the model's ability to correctly identify relevant cases in the dataset. The consistent F1-scores of 88% across both RF models further validate their balance between precision and recall, indicating their resilience to class imbalances and capacity to maintain predictive integrity. In contrast, the simple and tuned XGBoost models, while showing competitive performance, exhibited slightly lower accuracy scores of 86% and 85%, respectively. This indicates a slight reduction in overall predictive capability compared to the RF models. Nevertheless, the XGBoost models maintained comparable precision and recall scores of approximately 87% and 86%, respectively, demonstrating a consistent ability to minimize false positives and accurately detect relevant cases. Despite this slight decrement in accuracy, the F1-scores of 87% (simple XGBoost) and 85% (tuned XGBoost) indicate a well-maintained balance between precision and recall, affirming their reliability to sustain predictive accuracy across multiple evaluation metrics.
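To make the reported metrics concrete, the short sketch below computes accuracy alongside macro-averaged precision, recall, and F1-score for a three-class problem with scikit-learn; the placeholder labels, the class encoding, and the macro-averaging choice are illustrative assumptions rather than the study's exact evaluation code.

    # Illustrative sketch (not from the study): accuracy plus macro-averaged
    # precision, recall, and F1 for a three-class classification problem.
    from sklearn.metrics import (accuracy_score, f1_score,
                                 precision_score, recall_score)

    # Placeholder labels; the encoding 0 = CN, 1 = MCI, 2 = AD is assumed.
    y_test = [0, 1, 2, 2, 1, 0, 2, 1]
    y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

    print("Accuracy :", accuracy_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred, average="macro"))
    print("Recall   :", recall_score(y_test, y_pred, average="macro"))
    print("F1-score :", f1_score(y_test, y_pred, average="macro"))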
Both the simple RF and simple XGBoost models, utilizing selected variables, exhibited comparable accuracies of 86% and 85%, respectively, suggesting similar predictive performance. However, upon tuning, the RF model demonstrated a notable improvement, achieving an impressive accuracy of 90% and outperforming the tuned XGBoost model, which attained a respectable score of 89%. This enhancement underscores the effectiveness of fine-tuning in optimizing the RF algorithm's predictive capabilities, potentially making it a preferred choice in scenarios where maximizing prediction accuracy is crucial.
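The effect of tuning described above can be reproduced, under assumptions, with a standard grid search; the synthetic data, parameter grid, five-fold cross-validation, and accuracy scoring in the sketch below are illustrative choices, not the study's actual search space.

    # Illustrative sketch (not the study's exact configuration): tuning a
    # random forest with a grid search over common hyperparameters.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Synthetic stand-in for a three-class, tabular AD-style dataset.
    X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                               n_classes=3, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Assumed search space; five-fold CV and accuracy scoring are also assumptions.
    param_grid = {
        "n_estimators": [100, 300],
        "max_depth": [None, 10, 20],
        "min_samples_leaf": [1, 2, 5],
    }
    search = GridSearchCV(RandomForestClassifier(random_state=42),
                          param_grid, scoring="accuracy", cv=5, n_jobs=-1)
    search.fit(X_train, y_train)

    print("Best parameters   :", search.best_params_)
    print("Held-out accuracy :", search.best_estimator_.score(X_test, y_test))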
Additionally, evaluating precision, recall, and F1-score metrics provided a more comprehensive understanding of model performance beyond overall accuracy. Both the simple and tuned RF models consistently achieved higher precision, recall, and F1 scores compared to their XGBoost counterparts. Specifically, the tuned RF model yielded the highest scores across all three metrics, indicating superior ability to minimize false positives while effectively capturing relevant instances from the dataset. While the XGBoost models also demonstrated good precision, recall, and F1 scores, they slightly underperformed relative to the RF models, suggesting a moderate reduction in their effectiveness at minimizing misclassifications and accurately detecting relevant cases.
The predictive simple RF model from a previous study achieved an impressive accuracy of 96.05% for a binary classification task using all features of the AIBL non-imaging dataset.3 In comparison, our best model, the tuned RF model using selected features, achieved a slightly lower accuracy of 90%. When comparing equivalent models from both studies, our simple RF model using all features yielded an accuracy of 88%. However, it is important to note that the prior study addressed a binary classification problem, whereas our study considered all three AD-related classes. This difference in classification scope likely accounts for the observed decrease in accuracy. The added complexity of distinguishing among three classes inherently increases the challenge and may reduce model performance relative to a binary setting.

Therefore, while our model's accuracy may appear slightly lower, its ability to classify across multiple classes provides valuable insight into the severity of AD. Furthermore, in our study, the train-test split was performed prior to preprocessing, supporting the model's ability to generalize to unseen data. In contrast, the previous study preprocessed the entire dataset, except for SMOTE, which may have contributed to their enhanced performance. Nonetheless, both studies consistently identified CDGLOBAL (CDR), MMSCORE (MMSE score), LIMMTOTAL (logical memory immediate recall), and LDELTOTAL (logical memory delayed recall) as the most informative predictors.
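To illustrate why splitting before preprocessing matters, the sketch below fits the imputer and scaler on the training split only and applies SMOTE (via the imbalanced-learn package) solely to the training data, so no test-set information leaks into preprocessing; the specific steps and the synthetic data are assumptions for demonstration, not the study's exact pipeline.

    # Illustrative sketch (assumed workflow): split first, then derive all
    # preprocessing statistics from the training data and oversample only
    # the training split, leaving the test set untouched.
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in for the raw (unpreprocessed) feature matrix and labels.
    X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                               n_classes=3, random_state=42)

    # 1. Split before any preprocessing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # 2. Fit the imputer and scaler on the training split only.
    imputer = SimpleImputer(strategy="median").fit(X_train)
    scaler = StandardScaler().fit(imputer.transform(X_train))
    X_train_p = scaler.transform(imputer.transform(X_train))
    X_test_p = scaler.transform(imputer.transform(X_test))  # reuses train statistics

    # 3. Apply SMOTE to the training data only.
    X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train_p, y_train)

    model = RandomForestClassifier(random_state=42).fit(X_train_bal, y_train_bal)
    print("Held-out accuracy:", model.score(X_test_p, y_test))

Fitting the preprocessing on the combined data before splitting would let test-set statistics influence training and can inflate the reported performance, which is the concern raised above.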
The comparison across classifiers based on medical history, neuropsychological assessment, and blood analysis with ApoE genotype variables offered valuable insights for medical diagnostics and predictive modeling. The classifier utilizing neuropsychological assessment variables emerged as the top performer, displaying impressive accuracy, precision, recall, and F1-score metrics, all exceeding 90%. This underscores the robust predictive capability of neuropsychological assessment data, highlighting its potential as a crucial diagnostic tool for AD. However, the classifier relying on medical history variables exhibited substantially lower performance metrics, with accuracy, precision, recall, and F1 scores hovering around 52%, indicating its limited predictive accuracy when used in isolation. Despite its relatively lower accuracy, the classifier based on blood analysis and ApoE genotype variables demonstrated a notable improvement over the medical history classifier: with precision at 85% and recall at 68%, resulting in an F1-score of 66%, it shows promise in enhancing predictive accuracy and diagnostic capabilities by incorporating blood analysis and genetic data.
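A comparison of this kind can be set up by training one classifier per variable domain on a shared split, as in the hypothetical sketch below; apart from the four predictors named earlier, the column names, label column, and groupings are placeholders rather than the study's actual variable lists.

    # Illustrative sketch with hypothetical column names (except the four
    # predictors mentioned in the text): one classifier per feature domain,
    # compared by macro-F1 on a common held-out split.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    FEATURE_GROUPS = {
        "medical history": ["MH_HYPERTENSION", "MH_DIABETES"],            # hypothetical
        "neuropsychological": ["CDGLOBAL", "MMSCORE", "LIMMTOTAL", "LDELTOTAL"],
        "blood + ApoE": ["APOE_GENOTYPE", "BLOOD_TEST_1", "BLOOD_TEST_2"], # hypothetical
    }

    def compare_feature_groups(df: pd.DataFrame, label_col: str = "diagnosis") -> None:
        """Fit and score one classifier per variable domain on the same split.

        `df` is assumed to be a preprocessed participant-level table that
        contains the listed columns and a diagnosis label column.
        """
        train, test = train_test_split(df, test_size=0.2,
                                       stratify=df[label_col], random_state=42)
        for name, cols in FEATURE_GROUPS.items():
            clf = RandomForestClassifier(random_state=42)
            clf.fit(train[cols], train[label_col])
            score = f1_score(test[label_col], clf.predict(test[cols]), average="macro")
            print(f"{name}: macro-F1 = {score:.2f}")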
Both the existing and the present study identified the classifier based on neuropsychological assessment variables as the most effective, consistently demonstrating exceptional performance metrics. Palmqvist26 underscored the significance of the MMSE score in predicting the transition from MCI to AD. Similarly, Bloch and Friedrich27 concluded that cognitive test results, including MMSE and CDR values, were the most informative features for effectively classifying AD. These findings highlight the
