Artificial Intelligence in Health Complex early diagnosis of MS through machine learning
the upper limbs (arms). A positive ULSSEP indicates an abnormality in the sensory pathways, while a negative result means normal function. The interaction between Symptom_Motor and ULSSEP reveals key neurological insights. When Symptom_Motor is 1 (motor symptoms present), ULSSEP values show a distinct separation, especially for positive responses, indicating abnormal sensory activity. Conversely, with Symptom_Motor at 0 (no motor symptoms), ULSSEP values are more clustered, showing fewer abnormalities. The presence of motor symptoms enhances the differentiation in ULSSEP results, highlighting a strong link between motor and sensory pathways.

4. Discussion

Predicting the progression of CIS to MS remains an extremely pressing issue. Used together with clinical and radiological data, ML models in clinical practice can help facilitate the early diagnosis of MS. Timely administration of therapy for this disease can prevent disability, maintain the ability to work, and improve patients' quality of life. In the future, these models can be integrated into diagnostic workflows to flag high-risk patients based on their clinical data and medical imaging results. They can also continuously analyze patient data to optimize treatment plans in real time, providing more responsive patient management. We believe that increasing the sample size and lengthening the duration of observation, coupled with the use of deep learning, are key to further enhancing the predictive model. Adding more features such as MRI, serum, genetic biomarkers, and environmental factors can also provide unique insights into different aspects of CDMS progression. In addition, conducting longitudinal studies is essential to understand how CIS develops over time and to distinguish between short-term variations and long-term trends of the disease process. This would enable the development of treatment methods tailored to different stages of CDMS.

While this study’s findings are promising, there are several limitations to be acknowledged. One key limitation is the small dataset, which carries a high risk of model overfitting, even when cross-validation is applied. This is particularly problematic when the dataset comes from a single location and is not representative of a diverse population. The retrospective nature of the data, which means the data were collected for purposes other than the specific research question at hand, also poses limitations. There may be inconsistencies in how data are recorded, and this can introduce noise to the models. Moreover, the analysis of features only reveals the magnitude of their importance, since the SHAP-based ranking is based on mean absolute values. As a result, it does not provide insights into whether the top influencing features increase or decrease the likelihood of CDMS.

5. Conclusion

The results of this study demonstrate improvements in early diagnosis accuracy and the potential of ML models in clinical integration. Specifically, our tree-based models achieve AUC scores above 0.9, with F1 scores higher than 82%, highlighting their effectiveness in predicting CDMS from CIS. We also identify key features that significantly contribute to predicting the progression of CIS to CDMS, including Periventricular_MRI, Infratentorial_MRI, Oligoclonal_Bands, Schooling, and Symptom_Motor. These features provide valuable insights into the factors most closely associated with MS progression.

Acknowledgments

None.

Funding

This work was supported by a grant from the Russian Science Foundation (RSF 23-15-00377).

Conflict of interest

The authors declare that they have no competing interests.

Author contributions

Conceptualization: Bair N. Tuchinov
Formal analysis: Minh Sao Khue Luu
Investigation: Minh Sao Khue Luu
Methodology: Denis S. Korobko, Nadezhda A. Malkova
Project administration: Andrey A. Tulupov
Writing – original draft: Minh Sao Khue Luu, Anna I. Prokaeva
Writing – review & editing: Bair N. Tuchinov

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data

The data used in this study are accessible at https://data.mendeley.com/datasets/8wk5hjx7x2/1. The code used to implement the models and analyses in this study is available at https://github.com/luumsk/CIStoCDMS.git. This GitHub repository includes detailed documentation of the libraries utilized, with all code necessary to reproduce the results, including data preparation, model
Volume 1 Issue 4 (2024) 119 doi: 10.36922/aih.4255

