
Artificial Intelligence in Health                                   Synthetic data for obesity level prediction




Table 1. (Continued)

Study | Dataset | ML algorithm | Results
Vairachilai et al.³¹ | Kaggle COVID-19 Healthy Diet dataset | Protein Food Item Prediction Regression model | The proposed model achieved high predictive accuracy, with MAPE of 29% for meat and milk and 31% for oil crops and vegetable products. The integration of protein-rich food variables allowed refined modeling of feature influence in obesity prediction
Forte et al.³² | FITescola® project dataset | CNN | The proposed model achieved 75% accuracy in obesity classification. The inclusion of physical fitness variables improved feature interpretability and overall model performance
Yağın et al.³³ | Physical Activity and Eating Habits dataset from İnönü University; includes alcohol use, device use, and meal frequency | Trained NN with Bayesian optimization | The proposed model achieved 93.06% accuracy in obesity classification, outperforming prior methods. The integration of Bayesian optimization enhanced the model’s ability to select critical features
Gözükara Bağ et al.³⁴ | Web-based public dataset on physical activity and nutrition (gender, BMI, diet, etc.) | LR, RF, XGBoost with Bayesian optimization | The proposed model achieved 99.33% accuracy using logistic regression, with improved classification accuracy after feature selection. The inclusion of nutritional and activity data further strengthened the model’s predictive capacity
            Abbreviations: ANN: Artificial neural network; BME: Bagging meta-estimator; Bi-LSTM: Bidirectional long short-term memory; BMI: Body mass
            index; CNN: Convolutional neural network; COVID-19: Coronavirus disease 2019; DT: Decision tree; GB: Gradient boosting; GBM: Gradient boosting
            machine; k-NN: k-nearest neighbors; LIME: Local interpretable model-agnostic explanations; LogitBoost: Logistic regression boosting; LR: Logistic
            regression; MAPE: Mean absolute percentage error; MLP: Multi-layer perceptron; NB: Naïve Bayes; NN: Neural network; POA: Pelican optimization
            algorithm; PSO: Particle swarm optimization; RF: Random Forest; SGD: Stochastic Gradient Descent; SHAP: Shapley additive explanations;
            SVM: Support vector machine; UCI: University of California, Irvine; XAI: Explainable artificial intelligence; XGBoost: Extreme gradient boosting.
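Table 1 reports one result as a mean absolute percentage error (MAPE) rather than as classification accuracy. For readers less familiar with the metric, the following is a minimal sketch of the standard definition, MAPE = (100/n) Σ |(actual − predicted)/actual|. The function name `mape` and the sample values are illustrative assumptions and are not taken from any of the cited studies.

```python
def mape(actual, predicted):
    """Return the mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # The relative error is undefined for a zero actual value.
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the calculation.
    actual = [10.0, 20.0, 40.0]
    predicted = [8.0, 25.0, 40.0]
    print(f"MAPE = {mape(actual, predicted):.1f}%")
```

Lower values indicate a closer fit, so a MAPE of 29% means that predictions deviate from the observed values by 29% on average.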


obesity type I, 35–39.9 as obesity type II, and 40 or higher as obesity type III.

In the study by Helforoush and Sayyad¹⁵, titled Hybrid Metaheuristic ANN-PSO, various ML models were applied for obesity risk prediction. The authors proposed a hybrid artificial neural network optimized using particle swarm optimization (ANN-PSO). When evaluated on the University of California, Irvine (UCI) obesity dataset – which contains 2,111 records and 17 features related to dietary habits and physical conditions – the ANN-PSO model achieved an accuracy of ~92%, outperforming traditional regression models. To enhance interpretability, the study employed Shapley additive explanation analysis, which revealed that weight and height were among the most influential features in predicting obesity levels. These findings highlight the potential of metaheuristic optimization methods to improve the performance of neural networks in personalized obesity risk profiling.

Ayub et al.¹⁶ developed an attention-enhanced bidirectional long short-term memory (ABi-LSTM) model to classify individuals into obesity categories using the same dataset. Their deep learning architecture incorporated an attention mechanism to emphasize key features – such as height, weight, and activity level – allowing the model to capture complex patterns within the data. The proposed ABi-LSTM achieved a multiclass classification accuracy of 96.5%, representing a substantial improvement in precision, recall, and F1-score over existing approaches. The authors highlighted this result as a paradigm shift, demonstrating the effectiveness of attention-based deep sequential models in enabling accurate obesity risk prediction.

Shakti et al.¹⁷ evaluated multiple ML frameworks on the UCI obesity dataset, which contains 2,111 instances with 17 attributes related to eating habits and lifestyle factors. The models tested included k-nearest neighbors (k-NN), support vector machine (SVM), random forest (RF), gradient boosting (GB), and a multilayer perceptron (MLP) neural network. Among these, the MLP classifier achieved the highest accuracy at 97.2%, followed closely by GB at ~96.2%. These findings highlight that incorporating diverse features – such as dietary habits and physical activity – alongside robust learning algorithms like neural networks (NNs) can yield high classification performance. The study emphasizes that such levels of accuracy are essential for enabling targeted interventions for individuals at risk of obesity.

Yağmur¹⁸ proposed a hybrid model that combines a decision tree (DT) classifier with the pelican optimization algorithm (POA), a metaheuristic optimization technique, to enhance obesity level classification. Utilizing the 2,111-instance dataset, the model applied fuzzy parameter tuning via POA to optimize the tree’s decision thresholds for multiclass categorization. The hybrid DT-POA approach reportedly outperforms the standard DT model in predicting obesity levels. Although the precise accuracy value is not explicitly stated, the author highlights the


            Volume 2 Issue 4 (2025)                         50                          doi: 10.36922/AIH025140027