Page 142 - AIH-1-2
Artificial Intelligence in Health Movement detection with sensors and AI
Numeric features are columns with numerical values, encompassing both continuous and discrete data. In this context, the dataset comprises 202 numeric features, with the “Predict” feature serving as a categorical target variable (Table 1). The value “True” for preprocessing indicates that preprocessing steps are applied to the data during the setup process. For the current study, the preprocessing steps “LabelEncoder” and “SimpleImputer” were applied. The label encoder converts categorical target variables (if any) into numerical format: it transforms categorical labels into integer values, making them suitable for training the machine learning model. The simple imputer handles missing values in the dataset, filling them in using simple strategies such as the feature’s mean, median, or most frequent value. Where numeric features have missing values, the “mean” imputation method is used, replacing the missing numeric values with the mean of the corresponding feature. Conversely, for categorical features with missing values, the “mode” imputation method is used, replacing the missing categorical values with the mode (most frequent category) of the corresponding feature.

The “Compare Models” function trains and evaluates the performance of all available estimators using cross-validation, providing a scoring grid with average cross-validated scores. To analyze the performance of a trained model on the test set, the “plot_model” function can be used. It offers different plot types, such as confusion matrix and AUC, for assessing model performance. In certain cases, re-training the model may be required for plotting specific visualizations. Finally, the model with the entire pipeline is saved on disk for future use, especially for prediction of unseen data.

Hence, the typical workflow in PyCaret for a classification task involves several steps, beginning with “Setup.” During “Setup,” the user initiates the training environment by defining the dataset (data) and the variable to be predicted (target). In this case, the target refers to various movements, namely “Roll right,” “Roll left,” “Drop right,” “Drop left,” “Breathing,” and “Seizure,” encoded numerically from 0 to 5, respectively. The following is how PyCaret handles the classification workflow:
• Session ID: In the setup stage, specifying a session ID as a pseudorandom number (e.g., 123) serves as a seed for all randomness within the pipeline, ensuring that the experiment is reproducible. This means that the random division of data into folds when applying cross-validation, or the random selection of data points if any undersampling or oversampling is performed, would yield consistent results each time the code is run with the same session ID.
• Data format and preparation: The input data are provided as a CSV file, which is a standard, easy-to-work-with data format. The data include both the features (e.g., sensor readings) and the target. The features are the inputs that the model will learn from, while the target is the output category that the model is trained to predict.
• Target variable: The target variable is categorical, meaning that it does not have a natural order or numerical value; the assigned numbers are just labels for the classes. As the target is represented numerically, each number corresponds to a discrete category of patient movement, and the model learns to predict these categories.
• Workflow steps: The typical workflow in PyCaret for classification consists of five steps: setup, compare models, analyze model, save model, and prediction.
(i) Setup: This crucial first step initializes the analysis environment by setting up the data and defining the target. It also performs basic processing such as handling missing values, encoding categorical variables, normalizing the data, and potentially feature engineering.
(ii) Compare models: This step systematically trains and evaluates different machine learning models using the preprocessed data, subsequently ranking them according to a chosen evaluation metric, usually accuracy for classification tasks.
(iii) Analyze model: For the chosen model, its performance metrics, decision boundary, feature importance, confusion matrix, and other insights are analyzed to understand how well the model works. This step provides information about the classifier’s behavior under various conditions through ROC curves, precision-recall curves, and classification errors, allowing the user to deeply interrogate specific models and understand areas for improvement.
(iv) Save and predict model: With the model saved, predictions can be made on new data that the model has not seen before. This is the ultimate goal of the machine learning workflow: applying the constructed model to make accurate classifications on real-world data.

The training and test datasets are created during the setup, with PyCaret automatically splitting the input data into these subsets. The default split allocates 70% of the data for training and 30% for testing. The session ID ensures consistency in any randomization during this
Volume 1 Issue 2 (2024) 136 doi: 10.36922/aih.2790
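The preprocessing and splitting ideas described on this page (integer encoding of the six movement labels, mean and mode imputation, and a seeded 70/30 split) can be illustrated with a minimal plain-Python sketch. This is not PyCaret's implementation and does not call PyCaret; the names `LABEL_TO_CODE`, `impute_mean`, `impute_mode`, and `seeded_split` are hypothetical helpers introduced only for illustration, and the sample values are made up.

```python
import random
from statistics import mean, mode

# Movement classes in the order given in the text, encoded 0 to 5.
MOVEMENTS = ["Roll right", "Roll left", "Drop right",
             "Drop left", "Breathing", "Seizure"]
LABEL_TO_CODE = {name: code for code, name in enumerate(MOVEMENTS)}


def impute_mean(values):
    """Fill None entries of a numeric column with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Fill None entries of a categorical column with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


def seeded_split(rows, train_fraction=0.7, session_id=123):
    """Shuffle rows with a fixed seed and split them into train and test parts.

    The seed plays the role of the session ID: the same seed gives the same split.
    """
    rng = random.Random(session_id)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Illustrative (made-up) values, not data from the study.
    print(LABEL_TO_CODE["Seizure"])
    print(impute_mean([1.0, None, 3.0]))
    print(impute_mode(["a", None, "b", "a"]))
    train, test = seeded_split(range(10))
    print(len(train), len(test))
```

Calling `seeded_split` twice with the same `session_id` returns the same partition, which is the reproducibility property the session ID is meant to provide; changing the seed generally changes which rows land in each subset.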

