
Artificial Intelligence in Health                                   Synthetic data for obesity level prediction



health issues such as hypertension, diabetes, certain types of cancer, and musculoskeletal disorders.² Globalization has transformed obesity into a global public health challenge, demanding attention and coordinated action in the international policy arena.³ Obesity is a major, yet preventable, global health condition, with a high and rising prevalence among children and adolescents, leading to serious health complications and substantial healthcare costs.⁴ In light of these circumstances, early diagnosis of obesity is critically important. By analyzing individual characteristics, it is possible to predict an individual's risk of developing obesity.

Artificial intelligence (AI) methodologies hold great promise for automating obesity risk estimation, enabling early diagnosis and timely intervention.⁵ For example, predictive models based on dietary and lifestyle features can identify individuals at elevated risk even before clinical obesity manifests. However, real-world datasets (especially survey-based ones) are often small or imbalanced, or contain missing values. Synthetic data generation provides a means to overcome these challenges by creating artificial records that replicate the statistical properties of real data.⁶,⁷ This can improve model training and generalization without compromising patient privacy.⁸

Recent evidence underscores the multifactorial nature of obesity. For instance, early-life nutrition and feeding practices have long-term effects on weight trajectories: exclusive breastfeeding is associated with a lower risk of childhood overweight and obesity.⁹ Studies of adult obesity also emphasize the role of dietary patterns and psychosocial factors. Sobas et al.¹⁰ identified distinct dietary patterns ("prudent" healthy diet versus "fast food & alcohol") among bariatric surgery candidates, with the latter linked to more severe obesity.¹⁰ Colonnello et al.¹¹ found that dysfunctional eating behaviors (e.g., night eating, food cravings) correlate with lipid and metabolic abnormalities in obese patients.¹¹ El-Sehrawy et al.¹² showed that a high triglyceride-glucose (TyG) index (a marker of insulin resistance) is associated with adverse lipid profiles and disordered eating in obesity.¹² Psychological stress is also implicated: Kuckuck et al.¹³ demonstrated that long-term stress (measured by hair cortisone) is associated with hedonic eating tendencies in obese individuals.¹³ Together, these studies highlight that, beyond anthropometric measures, a combination of diet quality, metabolic markers, and behavioral patterns influences obesity outcomes. This study does not attempt to discover new causal factors. Rather, it focuses on the methodological contribution of using synthetic data to improve obesity prediction models based on available dietary and behavioral features. Specifically, it examines the extent to which predictive accuracy can be maintained when key features (height and weight) are unavailable.

A review of the literature reveals that synthetic data generation is widely applied, with the synthetic minority oversampling technique (SMOTE) being one of the most commonly used approaches.¹⁴⁻¹⁷ In line with this, the present study analyzes the Estimation of Obesity Levels Based on Eating Habits and Physical Status (EOL) dataset, which suffers from an imbalanced class distribution.

To address the issue of limited sample size, various synthetic data generation techniques were employed.¹⁸,¹⁹ Furthermore, an AI system was developed using machine learning (ML) algorithms to estimate obesity levels based on individuals' eating habits and physical status.

The study compared the performance of ML models trained on data generated using different techniques, namely variational autoencoders (VAE), generative adversarial networks (GAN), and SMOTE for nominal and continuous features (SMOTE-NC). The next section of this paper presents a summary of related work, including the datasets used, methodologies applied, and results reported. The Materials and Methods section describes the dataset characteristics, synthetic data generation approaches, preprocessing procedures, and interrelationships among attributes. It also details the ML algorithms and evaluation methodology used to assess model effectiveness. The Results and Discussion section presents model outputs through various graphs and tables. Finally, the manuscript concludes with a summary and suggestions for future research.

2. Literature review

Numerous studies in the literature have addressed the problem of obesity detection, with particular emphasis on dataset construction and the development of ML models. Table 1 summarizes the key characteristics of the datasets used in these studies, the ML techniques applied, and the corresponding performance metrics reported.

Palechor et al.¹⁴ developed a dataset for obesity level classification using data collected from individuals in Mexico, Peru, and Colombia. The dataset comprises 17 attributes related to eating habits and physical condition. Of the 2,111 instances, 23% were collected directly from users via a web platform, while the remaining 77% were synthetically generated using SMOTE. Classification of individuals was based on their body mass index (BMI) values. Subjects with a BMI below 18.5 were categorized as underweight, those with values between 18.5 and 24.9 as normal weight, and those between 25 and 29.9 as overweight. BMI values of 30 and above indicated obesity, which was further divided into three classes: 30–34.9 as


            Volume 2 Issue 4 (2025)                         48                          doi: 10.36922/AIH025140027