Artificial Intelligence in Health Synthetic data for obesity level prediction
In a seminal study, Kaur et al.²⁴ investigated the application of ML algorithms for obesity risk prediction and meal planning. Using the UCI obesity dataset, the researchers applied six ML algorithms – GB, Bagging meta-estimator, XGBoost, RF, SVM, and k-NN – to predict adult obesity risk. The models were evaluated under various train-test split ratios (90/10, 80/20, 70/30, etc.), with ensemble methods consistently demonstrating superior performance. Notably, XGBoost achieved an accuracy of up to 97.79% at the 70:30 split, followed closely by GB at ~97.16%. In contrast, simpler models such as k-NN and SVM showed lower accuracy, ranging from 82% to 87%. The study also featured a diet recommendation component generated based on the model’s predictions, demonstrating a practical integration of ML with personalized dietary guidance. This early work established the reliability of ML models – particularly boosting ensembles – in predicting obesity-related outcomes, with the reported accuracy of XGBoost (~97.8%) serving as a benchmark in subsequent literature.

Muliawan et al.²⁵ focused on leveraging only eating habit features for obesity risk prediction, employing an RF classifier. The study utilized an open-access version of the 17-feature obesity dataset obtained from Kaggle, placing emphasis on dietary variables (e.g., frequency of high-calorie food consumption and meal frequency) while deliberately minimizing reliance on physical measurements. The RF model achieved an accuracy of 81.76% in distinguishing between high-risk and low-risk individuals. Although this performance is lower than that of models incorporating both dietary and physical attributes, it underscores the critical role of physical features in achieving optimal predictive accuracy. Nonetheless, the findings demonstrate that food intake patterns alone can yield approximately 82% accuracy, emphasizing the potential of ML algorithms in healthcare-related applications. The authors conclude that RF can serve as an effective screening tool in scenarios where detailed anthropometric data are unavailable.

Choudhuri et al.²⁶ proposed a hybrid ML model for obesity level estimation, utilizing the UCI obesity dataset. While the paper does not report specific performance metrics, the term “hybrid” suggests a combination of classification and optimization techniques. Subsequent studies have cited this work as an early example of integrating multiple classifiers to enhance prediction accuracy. This study is considered foundational in the adoption of ensemble and hybrid approaches within obesity prediction research. It paved the way for later works – such as that of Helforoush and Sayyad¹⁵ – which further developed and refined these strategies.

In a related vein, the study by Cervantes and Palacio,²⁷ published in Informatics in Medicine Unlocked in 2020, is notable as one of the earliest applications of ML algorithms to the “Obesity Levels” dataset. The researchers employed computational intelligence techniques – potentially including neural networks or fuzzy systems – to estimate obesity levels. This pioneering work catalyzed broader interest in the dataset, contributing to the establishment of baseline results and illustrating the feasibility of obesity classification through ML methods.

Ganie et al.²⁸ explored the efficacy of ensemble learning techniques for predicting obesity risk using a publicly available Kaggle dataset focused on lifestyle behaviors. The study applied various ensemble learning methods, including RF, extra trees, XGBoost, and CatBoost, using both bagging and boosting strategies. Among these, XGBoost delivered the highest performance, achieving an accuracy of 98.1% and an F1-score of 96.5%. The findings demonstrate the robustness of ensemble models, particularly boosting techniques, in deriving predictive insights from multi-dimensional lifestyle datasets.

Nagarajan et al.²⁹ performed a comparative analysis of several ML and deep learning models for predicting obesity levels using a real-world dataset with 17 features, including demographic and health-related variables. To improve model performance on imbalanced classes, the authors implemented SMOTE. The algorithms tested included TabNet, XGBoost, GB, MLP, and RF. The GB algorithm achieved the highest accuracy of 99.3%, with XGBoost and TabNet following closely at 99% and 98.4%, respectively, validating the effectiveness of ensemble and deep learning models in healthcare data analysis.

Umoh et al.³⁰ focused on optimizing various ML classifiers to estimate obesity levels from physical activity and dietary data obtained through structured surveys. The dataset underwent thorough preprocessing, including normalization and feature selection. The study evaluated a range of classifiers, including SVM, GB, DT, and others. Among them, GB emerged as the top-performing model, achieving an accuracy of 97.23%. This research highlighted the significance of integrating robust feature selection with classifier tuning for effective obesity level prediction.

Vairachilai et al.³¹ applied the protein intake prediction and response (PIPR) ML model to analyze the impact of dietary behavior on obesity during the COVID-19 pandemic. The dataset included comprehensive lifestyle and nutritional behavior indicators. Multiple ensemble learning algorithms, such as RF and extra trees, were evaluated in the study. The PIPR model stood out with an accuracy of 96.7%, demonstrating its capability to capture nuanced relationships between protein intake and obesity risk and confirming the value of ensemble strategies in obesity prediction tasks.
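
For readers unfamiliar with SMOTE, which Nagarajan et al. used to address imbalanced classes, the core idea can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `smote_oversample`, its parameters, and the toy data are hypothetical and chosen only for illustration. The sketch assumes purely numeric features and uses only the Python standard library. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors from the minority class.
    k: number of nearest minority neighbours considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate between base and neighbour with a random gap in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class with two numeric features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the new points stay within the region already occupied by the minority class. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred, and oversampling should be applied to the training split only so that synthetic samples do not leak into the test set.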
Volume 2 Issue 4 (2025) 52 doi: 10.36922/AIH025140027

