Artificial Intelligence in Health Synthetic data for obesity level prediction

Forte et al.³² developed a deep learning-based NN model aimed at classifying obesity risks among Portuguese adolescents. The model used the FITescola® dataset, which includes information on physical fitness levels and BMI percentiles. Leveraging the power of deep learning, specifically convolutional NNs, the study aimed to improve the detection of obesity risk patterns in youth. The proposed model achieved a classification accuracy of 96.3%, showcasing the potential of deep NNs to support early intervention strategies in public health contexts.

Yağın et al.³³ proposed a Bayesian-optimized NN for the estimation of obesity levels using a dataset focused on lifestyle factors and eating habits obtained from the UCI ML Repository. The study utilized a feedforward deep NN whose hyperparameters were tuned via Bayesian optimization to maximize predictive accuracy. This optimization improved the network’s ability to identify significant patterns in the data by fine-tuning parameters such as learning rate and hidden layers. The final model achieved an accuracy of 96.5%, outperforming earlier approaches and demonstrating the effectiveness of combining NNs with optimization strategies.

Gözükara Bağ et al.³⁴ introduced a predictive modeling approach that integrates physical activity and nutritional habit data for classifying obesity levels. They utilized a dataset comprising 2,111 records from the UCI ML Repository, which included variables such as gender, BMI, dietary patterns, and physical activity. The study employed ML algorithms, including RF, k-NN, and XGBoost. Feature scaling and selection techniques were applied to enhance model performance. The highest classification accuracy of 98.87% was achieved using the XGBoost algorithm, underscoring its superiority in handling complex lifestyle-related data for obesity classification.

Several works underscore the impact of diet and lifestyle features on obesity classification. For example, studies using the EOL dataset have identified that eating habits (e.g., frequency of high-calorie food intake, number of meals) and lifestyle choices (e.g., mode of transport, frequency of physical activity) significantly influence obesity level predictions. These findings are consistent with nutrition research showing that “prudent” diet patterns (rich in fruits and vegetables) are linked to lower obesity, whereas fast-food-heavy patterns correlate with higher adiposity.¹⁰ Obesity is closely tied to metabolic syndrome markers. The TyG index study and investigations of oxytocin levels illustrate that blood biomarkers and hormonal factors are often elevated in obesity and associated with eating behaviors.¹²,¹³

In addition to SMOTE, various oversampling techniques have been adapted for multiclass problems in medicine. Yang et al.³⁵ reviewed multiclass oversampling for imbalanced health datasets, noting an emerging trend toward hybrid methods combining SMOTE with other strategies.³⁵ While SMOTE-NC (used in our study) is a straightforward approach that interpolates minority-class samples in mixed-type data, more complex generators like GANs can capture non-linear feature dependencies. Synthetic tabular data in health often requires careful evaluation; we leverage standard classification metrics to assess model performance on generated data.⁷

Recent work on GANs and VAEs shows they can simulate realistic clinical datasets. For instance, standalone reports on conditional tabular GANs (CTGANs) or VAE variants demonstrate their success in reproducing distributions of complex clinical features.⁶,⁷ However, empirical comparisons of these methods (VAE versus GAN versus traditional oversampling) in specific applications like obesity remain limited, which motivates our empirical study. In summary, while many studies have achieved high accuracy in obesity prediction using ensemble or deep learning models, they typically rely on the original data (often including BMI-related attributes).

3. Materials and methods

3.1. Dataset definition

This study utilized the dataset titled Estimation of Obesity Levels Based on Eating Habits and Physical Condition.⁵ The data were collected from individuals in Mexico, Peru, and Colombia, encompassing information on dietary habits, physical conditions, and obesity levels. The dataset contains a total of 2,111 instances and 17 attributes. The first 498 instances were collected directly from users, while the remaining samples were synthetically generated by Palechor et al.¹⁴ using SMOTE. All analyses and synthetic data generation in this study were conducted using the 498 user-collected samples. The features included are gender, age, height, weight, family history of obesity, frequent consumption of high-calorie foods, frequency of vegetable consumption, number of main meals, consumption of food between meals, smoking, daily water consumption, calorie tracking, frequency of physical activity, frequency of using technological devices, alcohol consumption, type of transportation used, and obesity level. It is important to note that the dataset contains no missing values. The gender distribution is shown in Figure 1, with 271 males (54.4%) and 227 females (45.6%), indicating a relatively balanced sample.

As illustrated in Figure 2, the data indicate a predominance of affirmative responses, with 300 individuals (60.2%) supporting the proposition and 198 individuals (39.8%) opposing it. The distribution reflects a clear majority in favor of the proposition.

Volume 2 Issue 4 (2025) 53 doi: 10.36922/AIH025140027
