Table 1. Summary of literature on obesity risk prediction using machine learning (ML) algorithms

| Study | Dataset | ML algorithm | Results |
|---|---|---|---|
| Helforoush and Sayyad [15] | UCI Obesity dataset (2,111 samples; 17 features) | ANN + PSO hybrid; compared with baseline regression | The ANN-PSO model achieved an accuracy of 92%, outperforming standard regression methods. SHAP analysis identified weight and height as the most influential features |
| Ayub et al. [16] | UCI Obesity dataset | Attention Bi-LSTM deep network | The proposed model achieved 96.5% accuracy in obesity classification, surpassing previous approaches. The integration of an attention mechanism enhanced the model's ability to capture feature influence |
| Shakti et al. [17] | UCI Obesity dataset | Multiple comparisons: k-NN, SVM, RF, GBM, MLP | The MLP achieved the highest accuracy of 97.2%, followed by GB with ~96.2%. These results highlight the advantage of incorporating diverse features to improve classification performance |
| Yağmur [18] | UCI Obesity dataset | DT + POA (hybrid model) | The hybrid DT-POA model with fuzzy tuning outperformed the baseline DT, demonstrating improved classification performance for obesity levels |
| Özkurt [19] | UCI Obesity dataset | XGBoost, RF, NB, k-NN, DT (+ SHAP XAI) | XGBoost achieved the highest accuracy of 92%. SHAP analysis identified key predictors, including family history of obesity and vegetable intake |
| Wang [20] | UCI Obesity dataset (height/weight excluded) | Ordinal versus multinomial logit; LogitBoost; SVM, NB, RF, k-NN | The LogitBoost model achieved the highest performance with ~70% accuracy (Kappa = 0.65). Other ML models yielded accuracies ranging from 75% to 79%. The overall lower accuracy was attributed to the exclusion of BMI-related features. Nonetheless, active transportation (e.g., biking) and family history were identified as key predictors |
| Okpe et al. [21] | UCI Obesity dataset | Multilayer perceptron ANN | A tuned ANN achieved 97% accuracy in multi-class obesity prediction, demonstrating that high accuracy can be attained with a relatively simple NN architecture |
| Azad et al. [22] | UCI Obesity dataset | Stacked ensemble (GBM, XGB, etc.) + LIME explanations | The stacking ensemble model achieved ~98% accuracy, outperforming previous models (~97.8%). Model explainability was enhanced through the integration of LIME |
| Solomon et al. [23] | UCI Obesity dataset | Hybrid voting ensemble (XGBoost + GBM + MLP) | The ensemble model achieved an accuracy of 97.16%, surpassing the single XGBoost model (~96.4%). These results set a high benchmark for future studies in obesity prediction |
| Kaur et al. [24] | UCI Obesity dataset | GB, BME, XGBoost, RF, SVM, k-NN | XGBoost achieved 97.79% accuracy with a 70:30 train-test split, followed by GBM with ~97.16%. The results demonstrated the superiority of ensemble methods. In addition, the model provided personalized diet recommendations based on predictive outcomes |
| Muliawan et al. [25] | Kaggle Obesity dataset (2,111 samples; 17 features) | RF | An accuracy of 81.76% was achieved using only eating habit parameters, validating the effectiveness of RF as a screening tool for obesity risk based solely on dietary data |
| Choudhuri et al. [26] | UCI Obesity dataset | Hybrid ML model (combining algorithms) | A hybrid approach was proposed for estimating obesity levels, combining multiple ML techniques. This method improved accuracy compared to individual models and has been cited in subsequent studies for its pioneering contribution |
| Cervantes and Palacio [27] | UCI Obesity dataset (original introduction) | Computational intelligence methods (e.g., ANN, fuzzy) | An early study achieved viable obesity level prediction, laying the groundwork for the application of ML on this dataset and serving as a baseline in later research |
| Ganie et al. [28] | Kaggle Obesity dataset (2,111 samples; 17 features) | Bagged DT, RF, extra trees, XGBoost, GB, CatBoost, voting classifier | The proposed model achieved 98.10% accuracy in obesity classification, outperforming previous approaches. The ensemble of boosting algorithms effectively captured complex patterns in lifestyle data |
| Nagarajan et al. [29] | UCI Obesity dataset | TabNet, XGBoost, RF, MLP, bagging, DT, SVM, k-NN, SGD, AdaBoost, stacking, GB | The proposed model achieved 99.3% accuracy in obesity classification, outperforming previous approaches. The use of SMOTE and deep learning techniques enhanced learning from imbalanced classes |
| Umoh et al. [30] | UCI Obesity dataset | k-NN, SVM, bagging, stacking, voting, LR, DT, AdaBoost | The proposed model achieved 93.97% accuracy in obesity classification. Optimization through feature selection techniques improved the model's understanding of dietary and physical habits |

(Contd...)
