process across sessions. Therefore, the transformed train set shape and test set shape reported after setup refer to the shapes of these datasets after preprocessing and splitting. The workflow encapsulated by the PyCaret setup supports the end-to-end process of building and deploying classification models. In the study context, the models are tasked with classifying patient movements based on sensor data. The workflow enables the use of sophisticated machine learning algorithms without requiring the user to dive deep into the algorithmic complexities associated with each model. Consequently, researchers and practitioners can focus more on interpreting the results and less on managing the workflow mechanics.
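For illustration, a minimal sketch of such a PyCaret workflow is given below. This is not the study’s actual code: the DataFrame, the file name, the label column "movement," and the seed are all assumed for the example.

# Minimal sketch of the PyCaret classification workflow described above
# (illustrative only; not the study's actual code).
import pandas as pd
from pycaret.classification import setup, compare_models

# Hypothetical file: sensor features plus one label column per sample.
df = pd.read_csv("sensor_data.csv")

setup(
    data=df,
    target="movement",  # hypothetical label column name
    session_id=42,      # fixed seed so the train/test split is reproducible
)

# Cross-validates the available classifiers and returns the top performer,
# reporting accuracy, AUC, recall, precision, F1, Kappa, MCC, and TT.
best_model = compare_models()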
2.4. Evaluation metrics
PyCaret provides a range of metrics, including precision, recall, F1 score, accuracy, AUC, Cohen’s Kappa, MCC, and TT. Accuracy, as defined in Equation I, represents the proportion of correctly classified values out of the total number. Precision, as per Equation II, is computed as the ratio of true positive instances to all predicted positive instances, where a higher precision score indicates fewer false positive predictions. Recall, defined in Equation III, assesses the ability to identify actual positives and is also known as sensitivity. The F1 score, as per Equation IV, synthesizes precision and recall into a single value between 0 and 1; higher scores indicate better performance in both areas, while lower scores suggest poor precision or recall.
Accuracy = (True Positive + True Negative) / (True Positive + False Positive + True Negative + False Negative)  (I)

Precision = True Positive / (True Positive + False Positive)  (II)

Recall = True Positive / (True Positive + False Negative)  (III)

F1 score = 2 / (1/Precision + 1/Recall) = (2 × Precision × Recall) / (Precision + Recall)  (IV)
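As a concrete check of Equations I–IV, a minimal worked example in Python on hypothetical labels (not data from the study):

# Worked example of Equations I-IV on hypothetical binary labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / (tp + fp + tn + fn)          # Equation I
precision = tp / (tp + fp)                          # Equation II
recall = tp / (tp + fn)                             # Equation III
f1 = 2 * precision * recall / (precision + recall)  # Equation IV

print(accuracy, precision, recall, f1)  # 0.75, 0.75, 0.75, 0.75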
The ROC analysis evaluates the trade-off between the true positive rate and the false positive rate across different decision thresholds.16 The AUC serves as an indicator of the model’s effectiveness, allowing for comparison of performance across various models.17-19 An AUC equal to one indicates a perfect model, while an AUC exceeding 0.5 indicates that the model’s classification capability outperforms random guessing and possesses predictive value. An AUC of 0.5 signifies that the model’s classification capacity is equivalent to random guessing, devoid of predictive value. An AUC lower than 0.5 suggests a classification capacity worse than random guessing; however, if the predictions are reversed, the model becomes superior to random guessing. The collection of all such sample points, forming a line, constitutes the ROC curve.15

Cohen’s Kappa, often known as “Kappa,” is a statistical measure used to assess the agreement between predicted and actual classes while correcting for the level of agreement that would occur by chance. This metric holds particular significance when dealing with imbalanced datasets, as it accounts for chance-based agreement. Kappa values range from –1 to 1, where 1 signifies perfect agreement, 0 denotes chance-level agreement, and values below 0 indicate predictions worse than random. Meanwhile, MCC serves as another metric for assessing the quality of binary and multiclass classifications; it takes into account true positives, true negatives, false positives, and false negatives, making it useful in scenarios involving imbalanced datasets. Similar to Cohen’s Kappa, the MCC also ranges from –1 to 1: a value of 1 indicates flawless prediction capability, a value of zero represents predictions at random, and anything below zero suggests predictive performance worse than random guessing. Finally, TT refers to the duration taken by a specific machine learning model to train on the dataset, typically measured in seconds. This metric offers valuable insight into the computational cost of training a particular model.
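These threshold-free and chance-corrected metrics are readily computed off the shelf; a minimal scikit-learn sketch on hypothetical predictions (not the study’s pipeline):

# AUC, Cohen's Kappa, and MCC on hypothetical predictions
# (illustrative sketch using scikit-learn; not the study's pipeline).
from sklearn.metrics import roc_auc_score, cohen_kappa_score, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # hard class labels
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]   # predicted probabilities

print(roc_auc_score(y_true, y_score))     # area under the ROC curve
print(cohen_kappa_score(y_true, y_pred))  # chance-corrected agreement
print(matthews_corrcoef(y_true, y_pred))  # correlation over the confusion matrix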
3. Results

A total of 15 machine-learning classification models were tested using PyCaret (Table 2). These models included Light Gradient Boosting Machine (LIGHTLGBM), Extra Tree Classifier (ET), Extreme Gradient Boosting, Random Forest Classifier, Gradient Boosting Classifier, Decision Tree Classifier, K Neighbors Classifier, Naive Bayes, Linear Discriminant Analysis, Logistic Regression, Support Vector Machine - Linear Kernel, Ridge Classifier, AdaBoost Classifier, Quadratic Discriminant Analysis, and Dummy Classifier (DUMMY). DUMMY makes predictions that ignore the input features, serving as a simple baseline for comparison against more complex classifiers.
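For intuition, a feature-ignoring baseline of this kind can be sketched with scikit-learn’s DummyClassifier; this is an illustration of the general idea, and the study’s exact baseline configuration is assumed, not documented here.

# A baseline that ignores the input features, in the spirit of DUMMY
# (illustrative sketch; the study's exact configuration is assumed).
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.random.rand(100, 6)         # hypothetical sensor-like features
y = np.array([0] * 70 + [1] * 30)  # hypothetical imbalanced labels

dummy = DummyClassifier(strategy="most_frequent")  # always predicts the majority class
dummy.fit(X, y)           # the features are ignored entirely
print(dummy.score(X, y))  # 0.70: the accuracy of always guessing the majority class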
The LIGHTLGBM model exhibited the highest accuracy, recall, precision, F1 score, Kappa, and MCC. Specifically, the accuracy, recall, precision, F1 score, Kappa, and MCC of the LIGHTLGBM model were 0.89, 0.89, 0.90, 0.89, 0.87, and 0.87, respectively, with an AUC of 0.98. The performance metrics of the LIGHTLGBM model closely resembled those of the ET model, with the ET model displaying slightly lower accuracy, recall, precision, F1, Kappa, and MCC, but marginally higher AUC on the ROC curve (Figure 2). In addition, the confusion matrix

