
Artificial Intelligence in Health                          Benchmarking ML imputation in mental health surveys



RMSE (standard deviation = 0.056) in the BSMR scenario than in the SMR scenario (standard deviation = 0.0043). For the BSMR scenario, KNN and MIDAS performed the best, with an average RMSE of 0.96, outperforming the other methods, especially when the missing rate was low (Figure 3, left panel). The variability of the RMSE was also relatively low for both methods, with a standard deviation of 0.0066 for KNN and 1e-6 for MIDAS. MICE performed worse than the other imputation methods in both the SMR and BSMR scenarios. In the BSMR scenario in particular, its RMSE was significantly higher at 2.64, with a relatively large standard deviation of 0.098.

For every simulation scenario, the difference in overall RMSE between KNN and MIDAS was marginal. Both models produced very similar results throughout the experiment, and in each simulation scenario except BSMR they typically performed slightly worse than MissForest.

3.4. Performance of imputation on mental and behavioral summary scores

For every simulation scenario, the means and standard deviations of RMSE values for the SCQ, RBS-R, and DCDQ scores were computed across the 10 trials, as displayed in Figure 4. The relative performance of the four models was generally consistent across the three summary scores.

In the MCAR scenario, MissForest consistently outperformed KNN and MIDAS when imputing all three summary scores. The MICE model exhibited a steep increase in error as the missing rate increased. It performed the best until the missing rate reached 50%, after which it was surpassed by the remaining models. MICE is ideal for lower rates of random missingness but performs sharply worse as the rate gets larger. In fact, the MICE model produced the largest RMSE among the four methods at a 90% missing rate. For missing rates of 50% and above, MissForest is the ideal model since it had the lowest errors among the four methods.

The MissForest model performed the best in the SMR scenario. However, each method, especially MICE and MissForest, exhibited error rates that rose sharply when the missing values became blocked by survey type in the BSMR scenario. In the BSMR scenario, KNN and MIDAS exhibited the lowest error rates, with MissForest performing slightly worse. MICE performed considerably worse than the remaining models in the BSMR scenario.

3.5. Computational time

When comparing the computational times of the four models, the BSMR simulation scenario was used since this environment most closely resembles the missingness patterns in the real data when participants skip an entire survey in SPARK.

As shown in Figure 5, MIDAS and KNN not only had similar overall error rates but also exhibited comparable imputation times of around 10–13 min. MissForest had a median imputation time of slightly under 30 min. On the other hand, MICE had a median imputation time of around 285 min, which was significantly longer than those of the remaining models. The difference in computational time between implementations in R and Python is negligible. [26]

4. Discussion

The establishment of biobank databases has enabled the collection of self-reported mental and behavioral surveys at scale. [1-3] SPARK has gathered social and behavioral survey data from about 100,000 individuals [1] and there is ongoing collection of more survey data on existing participants. UK Biobank has measurements on lifetime depressive disorder, cognitive function, attention, and impulsivity from about 150,000 participants. [2,27,28] All of Us also has strategic plans to collect mental and behavioral surveys at scale. [3] However, the data quality and statistical power are compromised by missing data. Recent advances in machine learning methods have inspired novel missing data imputation approaches with increased accuracy and computational efficiency. [13-16] Previous studies either have not reviewed these newly developed imputation methods or have not focused on assessing imputation accuracy in mental and behavioral surveys that exhibit blockwise missing structures. [18-22]

Our study provided insights on the missingness pattern in SPARK, a large-scale cohort with autism, and assessed the imputation accuracy and computational time of four popular missing data imputation methods: MICE, KNN, MissForest, and MIDAS. This was done by simulating three missingness scenarios in mental and behavioral surveys, including SCQ, RBS-R, and DCDQ. We observed that 50–70% of participants with autism did not complete the SCQ, RBS-R, and DCDQ surveys and that the dataset exhibited blockwise missing structures. The missing rates also varied by sex, age, and race. Overall, KNN and MIDAS showed relatively stable performance with increasing missing rate in the MCAR scenario and slightly higher imputation error when blockwise missingness was introduced in the MNAR scenarios. The error rate increased more significantly in MICE and MissForest in both MCAR and MNAR scenarios, with a particularly notable surge in error rate for MICE when blockwise missing structures were introduced. When imputing SCQ, RBS-R, and DCDQ summary scores in the MCAR scenario, MICE had the lowest error rate when the missing rate was low, while MissForest had the lowest


            Volume 2 Issue 1 (2025)                         88                               doi: 10.36922/aih.4406