
Artificial Intelligence in Health                                   Synthetic data for obesity level prediction



In a seminal study, Kaur et al.²⁴ investigated the application of ML algorithms for obesity risk prediction and meal planning. Using the UCI obesity dataset, the researchers applied six ML algorithms – GB, Bagging meta-estimator, XGBoost, RF, SVM, and k-NN – to predict adult obesity risk. The models were evaluated under various train-test split ratios (90/10, 80/20, 70/30, etc.), with ensemble methods consistently demonstrating superior performance. Notably, XGBoost achieved an accuracy of up to 97.79% at the 70:30 split, followed closely by GB at ~97.16%. In contrast, simpler models such as k-NN and SVM showed lower accuracy, ranging from 82% to 87%. The study also featured a diet recommendation component generated based on the model’s predictions, demonstrating a practical integration of ML with personalized dietary guidance. This early work established the reliability of ML models – particularly boosting ensembles – in predicting obesity-related outcomes, with the reported accuracy of XGBoost (~97.8%) serving as a benchmark in subsequent literature.

Muliawan et al.²⁵ focused on leveraging only eating habit features for obesity risk prediction, employing an RF classifier. The study utilized an open-access version of the 17-feature obesity dataset obtained from Kaggle, placing emphasis on dietary variables (e.g., frequency of high-calorie food consumption and meal frequency) while deliberately minimizing reliance on physical measurements. The RF model achieved an accuracy of 81.76% in distinguishing between high-risk and low-risk individuals. Although this performance is lower than that of models incorporating both dietary and physical attributes, it underscores the critical role of physical features in achieving optimal predictive accuracy. Nonetheless, the findings demonstrate that food intake patterns alone can yield approximately 82% accuracy, emphasizing the potential of ML algorithms in healthcare-related applications. The authors conclude that RF can serve as an effective screening tool in scenarios where detailed anthropometric data are unavailable.

Choudhuri et al.²⁶ proposed a hybrid ML model for obesity level estimation, utilizing the UCI obesity dataset. While the paper does not report specific performance metrics, the term “hybrid” suggests a combination of classification and optimization techniques. Subsequent studies have cited this work as an early example of integrating multiple classifiers to enhance prediction accuracy. This study is considered foundational in the adoption of ensemble and hybrid approaches within obesity prediction research. It paved the way for later works – such as that of Helforoush and Sayyad¹⁵ – which further developed and refined these strategies.

In a related vein, the study by Cervantes and Palacio,²⁷ published in Informatics in Medicine Unlocked in 2020, is notable as one of the earliest applications of ML algorithms to the “Obesity Levels” dataset. The researchers employed computational intelligence techniques – potentially including neural networks or fuzzy systems – to estimate obesity levels. This pioneering work catalyzed broader interest in the dataset, contributing to the establishment of baseline results and illustrating the feasibility of obesity classification through ML methods.

Ganie et al.²⁸ explored the efficacy of ensemble learning techniques for predicting obesity risk using a publicly available Kaggle dataset focused on lifestyle behaviors. The study applied various ensemble learning methods, including RF, extra trees, XGBoost, and CatBoost, using both bagging and boosting strategies. Among these, XGBoost delivered the highest performance, achieving an accuracy of 98.1% and an F1-score of 96.5%. The findings demonstrate the robustness of ensemble models, particularly boosting techniques, in deriving predictive insights from multi-dimensional lifestyle datasets.

Nagarajan et al.²⁹ performed a comparative analysis of several ML and deep learning models for predicting obesity levels using a real-world dataset with 17 features, including demographic and health-related variables. To improve model performance on imbalanced classes, the authors implemented SMOTE. The algorithms tested included TabNet, XGBoost, GB, MLP, and RF. The GB algorithm achieved the highest accuracy of 99.3%, with XGBoost and TabNet following closely at 99% and 98.4%, respectively, validating the effectiveness of ensemble and deep learning models in healthcare data analysis.

Umoh et al.³⁰ focused on optimizing various ML classifiers to estimate obesity levels from physical activity and dietary data obtained through structured surveys. The dataset underwent thorough preprocessing, including normalization and feature selection. The study evaluated a range of classifiers, including SVM, GB, DT, and others. Among them, GB emerged as the top-performing model, achieving an accuracy of 97.23%. This research highlighted the significance of integrating robust feature selection with classifier tuning for effective obesity level prediction.

Vairachilai et al.³¹ applied the protein intake prediction and response (PIPR) ML model to analyze the impact of dietary behavior on obesity during the COVID-19 pandemic. The dataset included comprehensive lifestyle and nutritional behavior indicators. Multiple ensemble learning algorithms, such as RF and extra trees, were evaluated in the study. The PIPR model stood out with an accuracy of 96.7%, demonstrating its capability to capture nuanced relationships between protein intake and obesity risk and confirming the value of ensemble strategies in obesity prediction tasks.
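To make the SMOTE step mentioned for Nagarajan et al. more concrete, the following is a minimal sketch of SMOTE-style oversampling in plain Python. It is not the authors' implementation: the function name `smote_oversample`, the toy height/weight values, and the choice of `k` are all assumptions made for illustration. It also handles continuous features only, whereas the obesity dataset contains categorical variables that would need separate treatment.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class points by SMOTE-style interpolation.

    minority: list of equal-length numeric feature vectors from one class.
    Returns a list of n_synthetic new feature vectors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies on the segment between base and the chosen neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: (height in m, weight in kg) for an under-represented class
minority_class = [[1.60, 48.0], [1.65, 50.0], [1.70, 52.0], [1.75, 55.0]]
new_points = smote_oversample(minority_class, n_synthetic=6, k=2)
```

Because every synthetic point is an interpolation between two real minority samples, the generated values stay inside the range already covered by that class. In an evaluation pipeline, oversampling of this kind is normally applied to the training split only, so that the test split still reflects the original class distribution.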

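Several of the studies above report accuracy, and Ganie et al. additionally report an F1-score that is lower than the accuracy. The sketch below shows how the two metrics can diverge on imbalanced labels. The label lists are invented for illustration, and macro-averaging is an assumption, since the reviewed studies do not state here which F1 averaging they used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced toy labels: 8 "normal" cases, 2 "obese" cases
y_true = ["normal"] * 8 + ["obese"] * 2
y_pred = ["normal"] * 8 + ["normal", "obese"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # 0.900
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")  # 0.804
```

A single misclassified minority case leaves accuracy at 0.900 but pulls the macro F1 down to about 0.804, which is why F1 is often reported alongside accuracy when class sizes differ.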
            Volume 2 Issue 4 (2025)                         52                          doi: 10.36922/AIH025140027