Table 1. Summary of literature on obesity risk prediction using machine learning (ML) algorithms
Study | Dataset | ML algorithm | Results
Helforoush and Sayyad¹⁵ | UCI Obesity dataset (2,111 samples; 17 features) | ANN + PSO hybrid; compared with baseline regression | The ANN-PSO model achieved an accuracy of 92%, outperforming standard regression methods. SHAP analysis identified weight and height as the most influential features
Ayub et al.¹⁶ | UCI Obesity dataset | Attention Bi-LSTM deep network | The proposed model achieved 96.5% accuracy in obesity classification, surpassing previous approaches. The integration of an attention mechanism enhanced the model's ability to capture feature influence
Shakti et al.¹⁷ | UCI Obesity dataset | Multiple comparisons: k-NN, SVM, RF, GBM, MLP | The MLP achieved the highest accuracy of 97.2%, followed by GBM with ~96.2%. These results highlight the advantage of incorporating diverse features to improve classification performance
Yağmur¹⁸ | UCI Obesity dataset | DT + POA (hybrid model) | The hybrid DT-POA model with fuzzy tuning outperformed the baseline DT, demonstrating improved classification performance for obesity levels
Özkurt¹⁹ | UCI Obesity dataset | XGBoost, RF, NB, k-NN, DT (+ SHAP XAI) | XGBoost achieved the highest accuracy of 92%. SHAP analysis identified key predictors, including family history of obesity and vegetable intake
Wang²⁰ | UCI Obesity dataset (height/weight excluded) | Ordinal versus multinomial logit; LogitBoost; SVM, NB, RF, k-NN | The LogitBoost model achieved the highest performance with ~70% accuracy (Kappa = 0.65). Other ML models yielded accuracies ranging from 75% to 79%. The overall lower accuracy was attributed to the exclusion of BMI-related features. Nonetheless, active transportation (e.g., biking) and family history were identified as key predictors
Okpe et al.²¹ | UCI Obesity dataset | Multilayer perceptron ANN | A tuned ANN achieved 97% accuracy in multi-class obesity prediction, demonstrating that high accuracy can be attained with a relatively simple NN architecture
Azad et al.²² | UCI Obesity dataset | Stacked ensemble (GBM, XGB, etc.) + LIME explanations | The stacking ensemble model achieved ~98% accuracy, outperforming previous models (~97.8%). Model explainability was enhanced through the integration of LIME
Solomon et al.²³ | UCI Obesity dataset | Hybrid voting ensemble (XGBoost + GBM + MLP) | The ensemble model achieved an accuracy of 97.16%, surpassing the single XGBoost model (~96.4%). These results set a high benchmark for future studies in obesity prediction
Kaur et al.²⁴ | UCI Obesity dataset | GB, BME, XGBoost, RF, SVM, k-NN | XGBoost achieved 97.79% accuracy with a 70-30 train-test split, followed by GBM with ~97.16%. The results demonstrated the superiority of ensemble methods. In addition, the model provided personalized diet recommendations based on predictive outcomes
Muliawan et al.²⁵ | Kaggle Obesity dataset (2,111 samples; 17 features) | RF | An accuracy of 81.76% was achieved using only eating habit parameters, validating the effectiveness of RF as a screening tool for obesity risk based solely on dietary data
Choudhuri et al.²⁶ | UCI Obesity dataset | Hybrid ML model (combining algorithms) | A hybrid approach was proposed for estimating obesity levels, combining multiple ML techniques. This method improved accuracy compared to individual models and has been cited in subsequent studies for its pioneering contribution
Cervantes and Palacio²⁷ | UCI Obesity dataset (original introduction) | Computational intelligence methods (e.g., ANN, fuzzy) | An early study achieved viable obesity level prediction, laying the groundwork for the application of ML on this dataset and serving as a baseline in later research
Ganie et al.²⁸ | Kaggle Obesity dataset (2,111 samples; 17 features) | Bagged DT, RF, extra tree, XGBoost, GB, CatBoost, voting classifier | The proposed model achieved 98.10% accuracy in obesity classification, outperforming previous approaches. The ensemble of boosting algorithms effectively captured complex patterns in lifestyle data
Nagarajan et al.²⁹ | UCI Obesity dataset | TabNet, XGBoost, RF, MLP, bagging, DT, SVM, k-NN, SGD, AdaBoost, stacking, GB | The proposed model achieved 99.3% accuracy in obesity classification, outperforming previous approaches. The use of SMOTE and deep learning techniques enhanced learning from imbalanced classes
Umoh et al.³⁰ | UCI Obesity dataset | k-NN, SVM, bagging, stacking, voting, LR, DT, AdaBoost | The proposed model achieved 93.97% accuracy in obesity classification. Optimization through feature selection techniques improved the model's understanding of dietary and physical habits
(Contd...)

