Artificial Intelligence in Health Complex early diagnosis of MS through machine learning
(CDMS), a diagnosis of MS based on clinical criteria, such as the McDonald criteria. 12-14

Predicting this progression is difficult due to the disease's heterogeneous nature and differences in lesion appearance and progression on magnetic resonance imaging (MRI) scans. MS manifests through a wide range of symptoms, including visual, sensory, and motor dysfunctions, as well as cognitive impairments, with the severity of these symptoms varying significantly between individuals. 15-17 The disease can present in different forms (relapsing-remitting, primary progressive, secondary progressive, and progressive relapsing), each with distinct symptoms, presentations, and progression patterns, making prediction even more complex. 18 Moreover, the progression of MS is unpredictable; some patients experience a slow decline over many years, while others deteriorate rapidly. The characteristics of MS lesions, including their number, size, and location, also differ greatly between patients, contributing to the variability in symptoms and disease progression. The progression and severity of MS vary significantly among individuals, but in the absence of timely diagnosis and treatment, patients face severe disability. Early intervention in MS is crucial, as it can significantly delay the progression of disability and improve long-term outcomes for patients. Timely treatment not only reduces the frequency of relapses but also helps preserve neurological function, leading to a better overall quality of life for individuals with MS. 19

Several studies have used statistical analysis to examine the progression of CIS to CDMS. 20-24 Some studies specifically utilized MRI data to predict a diagnosis of CDMS evolving from CIS. 25-30 A diminished sense of vibration and proprioception, as well as spinal cord MRI lesions, were found among CIS patients who later developed MS. 31 The role of viral infections in CIS and their potential trigger mechanism in MS remains controversial; a direct relationship has been found between a history of infectious mononucleosis due to the Epstein–Barr virus and an increased risk of developing CDMS. 32 Notably, machine learning (ML) has emerged as a crucial tool in this predictive process, analyzing clinical data to identify patterns and risk factors associated with the progression to CDMS. 33-41 However, most research relies on private datasets that are not accessible to external researchers, making it difficult to reproduce results and establish benchmarks for developing ML algorithms for CDMS prediction. The existing literature still lacks a specific set of features that can accurately predict the progression of CIS to CDMS. Furthermore, there is currently no single, generally accepted method for predicting the progression of CIS to CDMS. However, it has been shown that the use of ML models or generative artificial intelligence (AI) platforms helped to speed up and facilitate the diagnosis of CDMS compared to the real-life clinical timeline. 42

Recently, Rasouli et al. 43 utilized extreme gradient boosting (XGBoost) on a public dataset of 273 patients to predict CDMS, achieving an area under the curve (AUC) of 0.918. They identified key predictive features using the SHapley Additive exPlanations (SHAP) library. 44 Building on this work, our study explores the same dataset with six advanced ML models: Categorical Boosting (CatBoost), 45 XGBoost, 46 light gradient boosting machine (LGBM), 47 random forest (RF), 48 support vector machine (SVM), 49 and logistic regression (LR). 50 Each model undergoes training with five-fold cross-validation. 51 To determine feature importance for predictions, we apply SHAP across the five folds of these ML models.

Our study makes notable contributions to the field:
(1) Enhanced model performance: Our CatBoost model achieves an AUC of 0.9312, demonstrating superior predictive accuracy. The XGBoost model also performs well with an AUC of 0.9202, which is slightly higher than that reported in a recent study. 43
(2) Key feature identification: We identify the most influential features contributing to CDMS prediction. We also observe that MRI-based features are universally critical, while symptom-related and schooling features vary in importance, suggesting unique model strengths and socioeconomic implications.
(3) Comprehensive feature analysis: We conduct an in-depth analysis of feature interactions and the impact of initial symptoms on the progression to CDMS, revealing new patterns and relationships that enhance our understanding of the disease.

2. Methodology

Figure 1 summarizes our workflow, which involves several critical steps to ensure robust model evaluation and insightful feature analysis. After preprocessing, we split the data into five folds for cross-validation. This ensured that each fold was used as a test set once, with the remaining folds used for training. We then proceeded to train six ML models: CatBoost, XGBoost, LGBM, RF, SVM, and LR. We employed Optuna, a hyperparameter optimization framework, to identify the optimal hyperparameters for each model. For each of the five folds, we trained all six models on four folds and tested them on the remaining fold. We repeated this process for each fold, resulting in 30 trained models, with six models per fold. During this stage, we obtained predictions for each test fold and concatenated the predictions of all folds for each model. Then, we calculated multiple evaluation metrics to compare model
Volume 1 Issue 4 (2024) 108 doi: 10.36922/aih.4255
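
The cross-validation procedure described in the Methodology (each fold held out once, test-fold predictions from all folds concatenated per model, then metrics computed on the pooled predictions) can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library, synthetic data, and a hypothetical stand-in classifier (`CentroidScorer`) in place of the six models named in the text. All function and class names are illustrative assumptions, as is the plain shuffled (unstratified) split, since the text does not state how the folds were formed.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class CentroidScorer:
    """Hypothetical stand-in model: scores a sample by how much closer it is
    to the positive-class mean than to the negative-class mean."""

    def fit(self, X, y):
        n_features = len(X[0])
        self.pos = [mean(x[j] for x, t in zip(X, y) if t == 1) for j in range(n_features)]
        self.neg = [mean(x[j] for x, t in zip(X, y) if t == 0) for j in range(n_features)]
        return self

    def score(self, x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, self.pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, self.neg))
        return d_neg - d_pos  # higher means closer to the positive centroid


def out_of_fold_scores(X, y, make_model, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and pool the scores so
    that every sample receives exactly one out-of-fold prediction."""
    scores = [None] * len(X)
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            scores[i] = model.score(X[i])
    return scores


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic two-feature data (an assumption for illustration only):
# the first feature is shifted for the positive class, the second is noise.
rng = random.Random(42)
y = [i % 2 for i in range(100)]
X = [[rng.gauss(2.0 * t, 1.0), rng.gauss(0.0, 1.0)] for t in y]

oof = out_of_fold_scores(X, y, CentroidScorer, k=5)
auc = roc_auc(y, oof)
print(f"Pooled out-of-fold AUC: {auc:.3f}")
```

Computing the metric once on the pooled out-of-fold scores, rather than averaging five per-fold values, mirrors the concatenation step in the text. Repeating `out_of_fold_scores` with a different `make_model` for each candidate gives one pooled score vector per model, which is what allows a like-for-like comparison on identical folds.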

