Artificial Intelligence in Health Synthetic data for obesity level prediction
health issues such as hypertension, diabetes, certain types of cancer, and musculoskeletal disorders.² Globalization has transformed obesity into a global public health challenge, demanding attention and coordinated action in the international policy arena.³ Obesity is a major, yet preventable, global health condition with a high and rising prevalence among children and adolescents, leading to serious health complications and substantial healthcare costs.⁴ In light of these circumstances, early diagnosis of obesity becomes critically important. By analyzing individual characteristics, it is possible to predict an individual's risk of developing obesity.

Artificial intelligence (AI) methodologies hold great promise for automating obesity risk estimation by enabling early diagnosis and timely intervention.⁵ For example, predictive models based on dietary and lifestyle features can identify individuals at elevated risk even before clinical obesity manifests. However, real-world datasets (especially survey-based ones) are often small or imbalanced, or contain missing values. Synthetic data generation provides a means to overcome these challenges by creating artificial records that replicate the statistical properties of real data.⁶,⁷ This can improve model training and generalization without compromising patient privacy.⁸

Recent evidence underscores the multifactorial nature of obesity. For instance, early-life nutrition and feeding practices have long-term effects on weight trajectories: exclusive breastfeeding is associated with a lower risk of childhood overweight and obesity.⁹ Studies of adult obesity also emphasize the role of dietary patterns and psychosocial factors. Sobas et al.¹⁰ identified distinct dietary patterns ("prudent" healthy diet versus "fast food & alcohol") among bariatric surgery candidates, with the latter linked to more severe obesity.¹⁰ Colonnello et al.¹¹ found that dysfunctional eating behaviors (e.g., night eating, food cravings) correlate with lipid and metabolic abnormalities in obese patients.¹¹ El-Sehrawy et al.¹² showed that a high triglyceride-glucose (TyG) index (a marker of insulin resistance) is associated with adverse lipid profiles and disordered eating in obesity.¹² Psychological stress is also implicated: Kuckuck et al.¹³ demonstrated that long-term stress (measured by hair cortisone) is associated with hedonic eating tendencies in obese individuals.¹³ Together, these studies highlight that, beyond anthropometric measures, a combination of diet quality, metabolic markers, and behavioral patterns influences obesity outcomes. This study does not attempt to discover new causal factors; rather, it focuses on the methodological contribution of using synthetic data to improve obesity prediction models based on available dietary and behavioral features. Specifically, it examines the extent to which predictive accuracy can be maintained when key features (height and weight) are unavailable.

A review of the literature reveals that synthetic data generation is widely applied, with the synthetic minority oversampling technique (SMOTE) being one of the most commonly used approaches.¹⁴⁻¹⁷ In line with this, the present study analyzes the Estimation of Obesity Levels Based on Eating Habits and Physical Status (EOL) dataset, which suffers from an imbalanced class distribution. To address this imbalance and the limited sample size, various synthetic data generation techniques were employed.¹⁸,¹⁹ Furthermore, an AI system was developed using machine learning (ML) algorithms to estimate obesity levels based on individuals' eating habits and physical status. The performance of ML models trained on data generated using different techniques, namely variational autoencoders (VAE), generative adversarial networks (GAN), and SMOTE for nominal and continuous features (SMOTE-NC), was compared. The next section of this paper presents a summary of related work, including the datasets used, the methodologies applied, and the results reported. The Materials and Methods section describes the dataset characteristics, synthetic data generation approaches, preprocessing procedures, and interrelationships among attributes. It also details the ML algorithms and the evaluation methodology used to assess model effectiveness. The Results and Discussion section presents model outputs through various graphs and tables. Finally, the manuscript concludes with a summary and suggestions for future research.

2. Literature review

Numerous studies in the literature have addressed the problem of obesity detection, with particular emphasis on dataset construction and the development of ML models. Table 1 summarizes the key characteristics of the datasets used in these studies, the ML techniques applied, and the corresponding performance metrics reported.

Palechor et al.¹⁴ developed a dataset for obesity level classification using data collected from individuals in Mexico, Peru, and Colombia. The dataset comprises 17 attributes related to eating habits and physical condition. Of the 2,111 instances, 23% were collected directly from users via a web platform, while the remaining 77% were synthetically generated using SMOTE. Classification of individuals was based on their body mass index (BMI) values. Subjects with a BMI below 18.5 were categorized as underweight, those with values between 18.5 and 24.9 as normal weight, and those between 25 and 29.9 as overweight. BMI values of 30 and above indicated obesity, which was further divided into three classes: 30–34.9 as
Volume 2 Issue 4 (2025) 48 doi: 10.36922/AIH025140027

