Artificial Intelligence in Health Benchmarking ML imputation in mental health surveys
RMSE (standard deviation = 0.056) in the BSMR scenario than in the SMR scenario (standard deviation = 0.0043).

For the BSMR scenario, KNN and MIDAS performed the best, with an average RMSE of 0.96, outperforming the other methods especially when the missing rate was low (Figure 3, left panel). The variability of the RMSE was also relatively low for both methods, with a standard deviation of 0.0066 for KNN and 1e-6 for MIDAS. MICE performed worse than the other imputation methods in both SMR and BSMR scenarios. Especially in the BSMR scenario, the RMSE value was significantly higher at 2.64 with a relatively large standard deviation of 0.098.

For every simulation scenario, the difference in imputation performance on overall RMSE between KNN and MIDAS was marginal. Both models produced very similar results throughout the experiment and, for each simulation scenario besides BSMR, they typically performed slightly worse than MissForest.

3.4. Performance of imputation on mental and behavioral summary scores

For every simulation scenario, the mean and standard deviations of RMSE values for the SCQ, RBS-R, and DCDQ scores were computed across the 10 trials as displayed in Figure 4. The relative performance of the four models was generally consistent across the three summary scores.

In the MCAR scenario, MissForest consistently outperformed KNN and MIDAS when imputing all three summary scores. The MICE model exhibited a steep incline in error as the missing rate was incremented. It performed the best until the missing rate was increased to 50%, after which it was surpassed by the remaining models. MICE is ideal for lower rates of random missingness but begins to perform exponentially worse as the rate gets larger. In fact, the MICE model produced the largest RMSE among the four methods at a 90% missing rate. For missing rates that are 50% and above, MissForest is the ideal model since it had the lowest errors among the four methods.

The MissForest model performed the best in the SMR scenario. However, each method, especially MICE and MissForest, exhibited error rates that rose sharply when the missing values became blocked by survey type in the BSMR scenario. In the BSMR scenario, KNN and MIDAS exhibited the lowest error rates, with MissForest performing slightly worse. MICE performed considerably worse than the remaining models in the BSMR scenario.

3.5. Computational time

When comparing the computational times of the four models, the BSMR simulation scenario was used since this environment most closely resembles the missingness patterns in the real data when participants skip an entire survey in SPARK.

As shown in Figure 5, MIDAS and KNN not only had similar overall error rates but also exhibited comparable imputation times of around 10-13 min. MissForest had a median imputation time of slightly under 30 min. On the other hand, MICE had a median imputation time of around 285 min, which was significantly larger than those of the remaining models. The difference in computational time between implementations in R and Python is negligible.[26]

4. Discussion

The establishment of biobank databases has enabled the collection of self-reported mental and behavioral surveys at scale.[1-3] SPARK has gathered social and behavioral survey data from about 100,000 individuals,[1] and there is ongoing collection of more survey data on existing participants. UK Biobank has measurements on lifetime depressive disorder, cognitive function, attention, and impulsivity from about 150,000 participants.[2,27,28] All of Us also has strategic plans to collect mental and behavioral surveys at scale.[3] However, the data quality and statistical power are compromised by missing data. Recent advances in machine learning methods have inspired novel missing data imputation approaches with increased accuracy and computational efficiency.[13-16] Previous studies either have not reviewed these newly developed imputation methods or have not focused on assessing imputation accuracy in mental and behavioral surveys that exhibit blockwise missing structures.[18-22]

Our study provided insights on the missingness pattern in SPARK, a large-scale cohort with autism, and assessed the imputation accuracy and computational time of four popular missing data imputation methods: MICE, KNN, MissForest, and MIDAS. This was done by simulating three missingness scenarios in mental and behavioral surveys, including SCQ, RBS-R, and DCDQ. We observed that 50-70% of participants with autism did not complete SCQ, RBS-R, and DCDQ surveys and the dataset exhibited blockwise missing structures. The missing rates also varied by sex, age, and race. Overall, KNN and MIDAS showed relatively stable performance with increasing missing rate in the MCAR scenario and slightly higher imputation error when blockwise missingness is introduced in the MNAR scenarios. The error rate increased more significantly in MICE and MissForest in both MCAR and MNAR scenarios, with a particularly notable surge in error rate for MICE when blockwise missing structures were introduced. When imputing SCQ, RBS-R, and DCDQ summary scores in the MCAR scenario, MICE had the lowest error rate when the missing rate was low, while MissForest had the lowest
Volume 2 Issue 1 (2025) 88 doi: 10.36922/aih.4406

