
Artificial Intelligence in Health                        Complex early diagnosis of MS through machine learning



(CDMS), a diagnosis of MS based on clinical criteria, such as the McDonald criteria. 12-14

Predicting this progression is difficult due to the disease’s heterogeneous nature and differences in lesion appearance and progression on magnetic resonance imaging (MRI) scans. MS manifests through a wide range of symptoms, including visual, sensory, and motor dysfunctions, as well as cognitive impairments, with the severity of these symptoms varying significantly between individuals. 15-17 The disease can present in different forms – such as relapsing-remitting, primary progressive, secondary progressive, and progressive relapsing – each with distinct symptoms, presentations, and progression patterns, making prediction even more complex. 18 Moreover, the progression of MS is unpredictable; some patients experience a slow decline over many years, while others deteriorate rapidly. The characteristics of MS lesions, including their number, size, and location, also differ greatly between patients, contributing to the variability in symptoms and disease progression. The progression and severity of MS vary significantly among individuals, but in the absence of timely diagnosis and treatment, patients face severe disability. Early intervention in MS is crucial, as it can significantly delay the progression of disability and improve long-term outcomes for patients. Timely treatment not only reduces the frequency of relapses but also helps in preserving neurological function, leading to a better overall quality of life for individuals with MS. 19

Several studies have used statistical analysis to examine the progression of CIS to CDMS. 20-24 Some studies specifically utilized MRI data to predict a diagnosis of CDMS evolving from CIS. 25-30 A diminished sense of vibration and proprioception and spinal cord MRI lesions were found among CIS patients who later developed MS. 31 The role of viral infections in CIS and their potential trigger mechanism in MS remains controversial; a direct relationship has been found between the history of infectious mononucleosis due to the Epstein–Barr virus and an increased risk of developing CDMS. 32 Notably, machine learning (ML) has emerged as a crucial tool in this predictive process, analyzing clinical data to identify patterns and risk factors associated with the progression to CDMS. 33-41 However, most research relies on private datasets that are not accessible to external researchers, making it difficult to reproduce results and establish benchmarks for developing ML algorithms for CDMS prediction. The existing literature still lacks a specific set of features that can accurately predict the progression of CIS to CDMS. Furthermore, there is currently no single, generally accepted method for predicting the progression of CIS to CDMS. However, it has been shown that the use of ML models or generative artificial intelligence (AI) platforms helped to speed up and facilitate the diagnosis of CDMS compared to the real-life clinical timeline. 42

Recently, Rasouli et al. 43 utilized extreme gradient boosting (XGBoost) on a public dataset of 273 patients to predict CDMS, achieving an impressive area under the curve (AUC) of 0.918. They identified key predictive features using the SHapley Additive exPlanations (SHAP) library. 44 Building on this work, our study explores the same dataset with six advanced ML models: Categorical Boosting (CatBoost), 45 XGBoost, 46 light gradient boosting machine (LGBM), 47 random forest (RF), 48 support vector machine (SVM), 49 and logistic regression (LR). 50 Each model undergoes training with five-fold cross-validation. 51 To determine feature importance for predictions, we apply SHAP across the five folds of these ML models.

Our study makes notable contributions to the field:
(1) Enhanced model performance: Our CatBoost model achieves an AUC of 0.9312, demonstrating superior predictive accuracy. The XGBoost model also performs well with an AUC of 0.9202, which is slightly higher than that reported in a recent study. 43
(2) Key feature identification: We identify the most influential features contributing to CDMS prediction. We also observe that MRI-based features are universally critical, while symptom-related and schooling features vary in importance, suggesting unique model strengths and socioeconomic implications.
(3) Comprehensive feature analysis: We conduct an in-depth analysis of feature interactions and the impact of initial symptoms on the progression to CDMS, revealing new patterns and relationships that enhance our understanding of the disease.

2. Methodology

Figure 1 summarizes our workflow, which involves several critical steps to ensure robust model evaluation and insightful feature analysis. After preprocessing, we split the data into five folds for cross-validation. This ensures that each fold is used as a test set once, with the remaining folds used for training. We then proceeded to train six ML models: CatBoost, XGBoost, LGBM, RF, SVM, and LR. We employed Optuna, a hyperparameter optimization framework, to identify the optimal hyperparameters for each model. For each of the five folds, we trained all six models on the other four folds and tested them on the held-out fold, resulting in 30 trained models, with six models per fold. During this stage, we obtained predictions for each test fold and concatenated the predictions of all folds for each model. Then, we calculated multiple evaluation metrics to compare model
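
The cross-validation loop described in this section can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' implementation: the two toy models (`train_centroid`, `train_first_feature`) are hypothetical stand-ins for the six tuned models, the data are synthetic rather than the 273-patient dataset, the shuffled (non-stratified) split and the seed are assumptions, and Optuna tuning is omitted. What it does show is the mechanism named in the text: each fold serves once as the test set, every model is refit per fold, and the out-of-fold predictions are pooled per model before a single AUC is computed.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def train_centroid(X, y):
    """Hypothetical stand-in model: scores a row by how much closer it is
    to the class-1 centroid than to the class-0 centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    c0, c1 = centroid(0), centroid(1)
    return lambda x: dist2(x, c0) - dist2(x, c1)


def train_first_feature(X, y):
    """Hypothetical stand-in model: uses the first feature as the score,
    oriented so that higher values point toward class 1."""
    m1 = [x[0] for x, t in zip(X, y) if t == 1]
    m0 = [x[0] for x, t in zip(X, y) if t == 0]
    sign = 1.0 if sum(m1) / len(m1) >= sum(m0) / len(m0) else -1.0
    return lambda x: sign * x[0]


def cross_validate(X, y, trainers, k=5, seed=0):
    """Fit every model on k-1 folds, score the held-out fold, and pool the
    out-of-fold scores per model before computing one AUC per model."""
    folds = k_fold_indices(len(X), k, seed)
    oof = {name: [0.0] * len(X) for name in trainers}
    n_fits = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for name, trainer in trainers.items():
            model = trainer(X_tr, y_tr)
            n_fits += 1
            for i in test_idx:
                # Writing by original index keeps scores aligned with y,
                # which is equivalent to concatenating the fold predictions.
                oof[name][i] = model(X[i])
    return {name: auc(y, s) for name, s in oof.items()}, n_fits


def make_toy_data(n=273, seed=0):
    """Synthetic two-feature data; only the first feature carries signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(1.5 * t, 1.0), rng.gauss(0.0, 1.0)] for t in y]
    return X, y


X, y = make_toy_data()
trainers = {"centroid": train_centroid, "first_feature": train_first_feature}
aucs, n_fits = cross_validate(X, y, trainers)
print(f"fitted models: {n_fits}")  # 5 folds x 2 stand-in models = 10 fits
for name, value in aucs.items():
    print(f"{name}: pooled out-of-fold AUC = {value:.3f}")
```

With six models in `trainers`, the same loop would yield the 30 fitted models mentioned in the text. In practice, library routines such as scikit-learn's `StratifiedKFold` and `roc_auc_score` would likely replace the hand-written helpers here.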


            Volume 1 Issue 4 (2024)                        108                               doi: 10.36922/aih.4255