
Artificial Intelligence in Health                                    Movement detection with sensors and AI



process across sessions. Therefore, the transformed train set shape and test set shape reported after setup refer to the shapes of these datasets after pre-processing and splitting. The workflow encapsulated by the PyCaret setup supports the end-to-end process of building and deploying classification models. In the study context, the models are tasked with classifying patient movements based on sensor data. The workflow enables the use of sophisticated machine learning algorithms without requiring the user to dive deep into the algorithmic complexities associated with each model. Consequently, researchers and practitioners can focus more on interpreting the results and less on managing the workflow mechanics.

2.4. Evaluation metrics

PyCaret provides a range of metrics, including precision, recall, F1 score, accuracy, AUC, Cohen's Kappa, MCC, and training time (TT). Accuracy, as defined in Equation I, represents the proportion of correctly classified values out of the total number. Precision, as per Equation II, is computed as the ratio of true positive instances to all predicted positive instances; a higher precision score indicates fewer false positive predictions. Recall, defined in Equation III, assesses the ability to identify actual positives and is also known as sensitivity. The F1 score, as per Equation IV, synthesizes precision and recall into a single value between 0 and 1; higher scores indicate better performance in both areas, while lower scores suggest poor precision or recall.

Accuracy = (True Positive + True Negative) / (True Positive + False Positive + True Negative + False Negative)   (I)

Precision = True Positive / (True Positive + False Positive)   (II)

Recall = True Positive / (True Positive + False Negative)   (III)

F1 score = 2 / (1/Precision + 1/Recall) = (2 × Precision × Recall) / (Precision + Recall)   (IV)

The ROC curve plots the true positive rate against the false positive rate across different decision thresholds.16 The AUC serves as an indicator of the model's effectiveness, allowing for comparison of performance across various models.17-19 An AUC equal to one indicates a perfect model, while an AUC exceeding 0.5 indicates that the model's classification capability outperforms random guessing and possesses predictive value. An AUC of 0.5 signifies that the model's classification capacity is equivalent to random guessing, devoid of predictive value. An AUC lower than 0.5 suggests a classification capacity worse than random guessing; however, if the predictions are inverted, the result is superior to random guessing. The collection of all sample points forming a line constitutes the ROC curve.15

Cohen's Kappa, often known simply as "Kappa," is a statistical measure used to assess the agreement between predicted and actual classes while accounting for the level of agreement that would occur by chance. This metric holds particular significance when dealing with imbalanced datasets, as it corrects for chance-based agreement. Kappa values range from –1 to 1, where 1 signifies perfect agreement, 0 denotes chance-based agreement, and values below 0 indicate predictions worse than random. Meanwhile, MCC serves as another metric for assessing the quality of binary and multiclass classifications; it takes into account true positives, true negatives, false positives, and false negatives, making it useful in scenarios involving imbalanced datasets. Similar to Cohen's Kappa, the MCC also ranges from –1 to 1: a value of 1 indicates flawless prediction, a value of 0 represents predictions at random, and anything below 0 suggests predictive performance worse than random guessing. Finally, TT refers to the time taken by a specific machine learning model to train on the dataset, typically measured in seconds, offering insight into the computational cost of training a particular model.

3. Results

A total of 15 machine-learning classification models were tested using PyCaret (Table 2). These models included Light Gradient Boosting Machine (LIGHTGBM), Extra Trees Classifier (ET), Extreme Gradient Boosting, Random Forest Classifier, Gradient Boosting Classifier, Decision Tree Classifier, K Neighbors Classifier, Naive Bayes, Linear Discriminant Analysis, Logistic Regression, Support Vector Machine - Linear Kernel, Ridge Classifier, AdaBoost Classifier, Quadratic Discriminant Analysis, and Dummy Classifier (DUMMY). DUMMY makes predictions that ignore the input features, serving as a simple baseline against which the more complex classifiers can be compared.

The LIGHTGBM model exhibited the highest accuracy, recall, precision, F1 score, Kappa, and MCC. Specifically, the accuracy, recall, precision, F1 score, Kappa, and MCC of the LIGHTGBM model were 0.89, 0.89, 0.90, 0.89, 0.87, and 0.87, respectively, with an AUC of 0.98. The performance metrics of the LIGHTGBM model closely resembled those of the ET model, with the ET model displaying slightly lower accuracy, recall, precision, F1, Kappa, and MCC, but a marginally higher AUC on the ROC curve (Figure 2). In addition, the confusion matrix
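As a concrete illustration of the metric definitions above (Equations I-IV, Cohen's Kappa, and MCC), the following minimal Python sketch computes each metric directly from binary confusion-matrix counts. The counts used here are invented for illustration only and are not taken from the study.

```python
import math

def metrics_from_counts(tp, fp, tn, fn):
    """Compute the evaluation metrics discussed above from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total                         # Equation I
    precision = tp / (tp + fp)                           # Equation II
    recall = tp / (tp + fn)                              # Equation III (sensitivity)
    f1 = 2 * precision * recall / (precision + recall)   # Equation IV
    # Cohen's Kappa: observed agreement corrected for chance agreement
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    # Matthews correlation coefficient, using all four confusion counts
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa, "mcc": mcc}

# Hypothetical counts for a binary movement/no-movement classifier.
m = metrics_from_counts(tp=85, fp=10, tn=80, fn=15)
```

Because Kappa and MCC both correct for chance, they come out lower than raw accuracy on the same counts, which is exactly why they are preferred for imbalanced data.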
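The role of the DUMMY baseline described above can also be sketched briefly. This example uses scikit-learn directly rather than PyCaret (an assumption made for a self-contained illustration; PyCaret builds on scikit-learn estimators), and the toy dataset is invented, not the study's sensor data.

```python
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier

# Tiny, clearly separable toy dataset: feature < 5 -> class 0, else class 1.
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# DUMMY ignores the features and always predicts the most frequent class,
# so on this balanced toy set it can do no better than 0.5 accuracy.
dummy = DummyClassifier(strategy="most_frequent").fit(X, y)

# Any real classifier must beat that floor to demonstrate it learned anything.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

dummy_acc = dummy.score(X, y)   # 0.5
tree_acc = tree.score(X, y)     # 1.0 on this separable toy set
```

A fitted model whose accuracy does not clearly exceed the dummy baseline is effectively guessing, regardless of how sophisticated its algorithm is.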


            Volume 1 Issue 2 (2024)                        137                               doi: 10.36922/aih.2790