Artificial Intelligence in Health Synthetic data for obesity level prediction
Figure 29. Performance metrics plots of the five most successful classifiers on the tabular variational autoencoder dataset (using height and weight attributes)

meal patterns are associated with higher obesity risk, and our synthetic augmentation appeared to capture these signals effectively for the ML models.

Moreover, these findings align with recent nutrition research. Colonnello et al.¹¹ found that dysfunctional eating behaviors (e.g., night eating) are correlated with lipid and metabolic abnormalities; we note that such behaviors are indirectly represented in our features (e.g., meal frequency, alcohol use).¹¹ El-Sehrawy et al.¹² reported that elevated TyG index values and disordered eating often co-occur in individuals with obesity, suggesting metabolic–diet linkages.¹² In our models, features related to eating patterns (e.g., intake of high-calorie foods, frequency of snacks) contribute to predictions, which is consistent with these clinical findings.

Our comparison highlights practical considerations for applying generative data methods in health. Consistent with Hernandez et al.,⁷ we found that SMOTE-type oversampling and VAE-based generation can effectively balance and expand tabular health data.⁷ The poorer performance of CTGAN (in the no-BMI case) suggests that GAN-based approaches may require more data or tuning to capture complex categorical relationships in this dataset. Importantly, synthetic data offer benefits beyond model accuracy. Arora and Arora⁸ emphasize that fully anonymized synthetic patient data can “replace the use of real patient data in certain contexts.” In our work, all

Volume 2 Issue 4 (2025) 69 doi: 10.36922/AIH025140027

