
Artificial Intelligence in Health                        Complex early diagnosis of MS through machine learning



the upper limbs (arms). A positive ULSSEP indicates an abnormality in the sensory pathways, while a negative result means normal function. The interaction between Symptom_Motor and ULSSEP reveals key neurological insights. When Symptom_Motor is 1 (motor symptoms present), ULSSEP values show a distinct separation, especially for positive responses, indicating abnormal sensory activity. Conversely, when Symptom_Motor is 0 (no motor symptoms), ULSSEP values are more clustered, showing fewer abnormalities. The presence of motor symptoms enhances the differentiation in ULSSEP results, highlighting a strong link between motor and sensory pathways.

4. Discussion

Predicting the progression of CIS to MS remains an extremely pressing issue. Used together with clinical and radiological data, ML models in clinical practice can help facilitate the early diagnosis of MS. Timely administration of therapy for this disease can help prevent disability, maintain the ability to work, and improve patients' quality of life. In the future, these models can be integrated into diagnostic workflows to flag high-risk patients based on their clinical data and medical imaging results. They can also continuously analyze patient data to optimize treatment plans in real time, providing more responsive patient management. We believe that increasing the sample size and lengthening the duration of observation, coupled with the use of deep learning, are key to further enhancing the predictive model. Adding more features, such as MRI, serum, and genetic biomarkers and environmental factors, can also provide unique insights into different aspects of CDMS progression. In addition, conducting longitudinal studies is essential to understand how CIS develops over time and to distinguish between short-term variations and long-term trends of the disease process. This would enable the development of treatment methods tailored to different stages of CDMS.

While this study’s findings are promising, several limitations should be acknowledged. One key limitation is the small dataset, which carries a high risk of model overfitting even when cross-validation is applied. This is particularly problematic when the dataset comes from a single location and is not representative of a diverse population. The retrospective nature of the data, meaning that the data were collected for purposes other than the specific research question at hand, also poses limitations. There may be inconsistencies in how the data were recorded, and this can introduce noise into the models. Moreover, the analysis of features only reveals the magnitude of their importance, since the importance ranking is based on mean absolute SHAP values. As a result, it does not provide insight into whether the top influencing features increase or decrease the likelihood of CDMS.

5. Conclusion

The results of this study demonstrate improvements in early diagnosis accuracy and the potential of ML models for clinical integration. Specifically, our tree-based models achieve AUC scores above 0.9, with F1 scores higher than 82%, highlighting their effectiveness in predicting CDMS from CIS. We also identify key features that contribute significantly to predicting the progression of CIS to CDMS, including Periventricular_MRI, Infratentorial_MRI, Oligoclonal_Bands, Schooling, and Symptom_Motor. These features provide valuable insights into the factors most closely associated with MS progression.

Acknowledgments

None.

Funding

This work was supported by a grant from the Russian Science Foundation (RSF 23-15-00377).

Conflict of interest

The authors declare that they have no competing interests.

Authors' contributions

Conceptualization: Bair N. Tuchinov
Formal analysis: Minh Sao Khue Luu
Investigation: Minh Sao Khue Luu
Methodology: Denis S. Korobko, Nadezhda A. Malkova
Project administration: Andrey A. Tulupov
Writing – original draft: Minh Sao Khue Luu, Anna I. Prokaeva
Writing – review & editing: Bair N. Tuchinov

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data

The data used in this study are accessible at https://data.mendeley.com/datasets/8wk5hjx7x2/1. The code used to implement the models and analyses in this study is available at https://github.com/luumsk/CIStoCDMS.git. This GitHub repository includes detailed documentation of the libraries utilized, with all code necessary to reproduce the results, including data preparation, model


Volume 1 Issue 4 (2024), doi: 10.36922/aih.4255