Artificial Intelligence in Health Advancing fetal health classification

performance, outperforming the other models. Therefore, to train and evaluate the fetal health classification model, the LightGBM classifier was employed. The LGBMClassifier implementation, which exposes a scikit-learn-compatible interface, was utilized. The model was trained using a 20-fold cross-validation procedure, which involved dividing the dataset into 20 subsets. We trained the model on 19 subsets and evaluated its performance on the remaining subset. This process was repeated 20 times, with each subset serving as the evaluation set once. Through aggregation of the results, a comprehensive evaluation of the model’s performance was obtained.

SMOTE was applied to address any class imbalance issues present in the dataset by balancing the distributions of the different classes to ensure that the model learned from a more representative dataset. This preprocessing step enhanced the model’s ability to handle imbalanced class distributions and improved its overall performance. The LightGBM classifier was configured with default hyperparameters, including a learning rate of 0.1, an unlimited maximum tree depth, and a minimum of 20 samples required in each leaf. A hundred boosting iterations were utilized, and the number of leaves in each tree was set to 31. The model was trained using all available parallel threads (−1) to make efficient use of computational resources.

To assess the model’s generalization capability, the dataset was split into a training set comprising 80% of the data and a test set containing the remaining 20%. The test set was not used during the model training process and served as an independent dataset for evaluating the model’s performance on unseen data.

4.5. Performance metrics

To evaluate the performance of the fetal health classification model, several standard metrics, including accuracy, precision, recall, and F1 score, were used. These metrics provided insights into the model’s ability to correctly classify fetal health conditions. The confusion matrix was also analyzed to understand the distribution of true positives, true negatives, false positives, and false negatives. The evaluation metrics allowed us to assess the strengths and limitations of the model in fetal health classification.

This experimental setup allowed for the development of a reliable and accurate ML model for fetal health classification. The dataset selection, data preprocessing, feature selection, model training, and evaluation process were carefully designed to ensure the validity and rigor of these experiments. The following sections present experimental results and a discussion of the findings, offering insights into the performance and implications of the proposed model for fetal health assessment.

5. Results

Figure 2 depicts the numerical results of the LightGBM model in fetal health classification. The performance of the model was evaluated using various metrics, including accuracy, area under the curve (AUC), recall, precision, F1 score, kappa, and Matthews correlation coefficient (MCC). The highlights of the results presented in Figure 2 are as follows:
• The LightGBM model showcased remarkable performance in the classification of fetal health conditions, achieving an outstanding accuracy of 98.32%. This elevated accuracy underscores the model’s proficiency in correctly categorizing the majority of instances within the dataset, attesting to its robust learning capabilities.
• Further enhancing its evaluative prowess, the model yielded an impressive AUC score of 0.9985. This exceptional AUC score signifies the model’s excellent discrimination ability, effectively distinguishing between diverse fetal health classes. The high AUC value indicates that the model excels in accurately ranking instances according to their predicted probabilities, adding a layer of confidence to its predictive capabilities.
• In terms of recall, the LightGBM model achieved a remarkable score of 0.9937. This noteworthy metric underscores the model’s adeptness in correctly identifying instances belonging to the positive class, whether indicative of a healthy or abnormal fetal health condition. The high recall value attests to the model’s sensitivity, which is particularly crucial in capturing instances with positive labels and minimizing false negatives.
• Precision, a measure of the model’s accuracy in classifying instances as positive, demonstrated a commendable score of 0.9790. This result underscores the model’s precision in correctly identifying instances

Figure 3. Receiver operating characteristic curve.
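The 20-fold cross-validation procedure and the default hyperparameters described in the model training text can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' code: the helper `k_fold_indices`, the seed, and the sample count are hypothetical, and the mapping of the stated settings onto `LGBMClassifier` keyword names is an assumption, since the text does not name the parameters.

```python
import random

# Assumed mapping of the stated settings onto LGBMClassifier keyword names.
LGBM_PARAMS = {
    "learning_rate": 0.1,     # learning rate of 0.1
    "max_depth": -1,          # unlimited maximum tree depth
    "min_child_samples": 20,  # minimum of 20 samples in each leaf
    "n_estimators": 100,      # one hundred boosting iterations
    "num_leaves": 31,         # 31 leaves per tree
    "n_jobs": -1,             # all available parallel threads
}


def k_fold_indices(n_samples, k=20, seed=0):
    """Yield (train_idx, eval_idx) pairs; each sample is evaluated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, eval_idx in enumerate(folds):
        # Train on the other k - 1 folds, evaluate on the held-out fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, eval_idx


# Hypothetical sample count, chosen only for illustration.
splits = list(k_fold_indices(n_samples=200, k=20))
```

In a full pipeline, a classifier built from `LGBM_PARAMS` would be fitted on each `train_idx` and scored on the matching `eval_idx`, with the 20 scores then averaged. The split here is not stratified by class, which is a simplification of this sketch.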
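As a hedged illustration of how the count-based metrics named in Sections 4.5 and 5 relate to the confusion matrix, the sketch below computes accuracy, precision, recall, F1 score, Cohen's kappa, and MCC for a single positive class. The function name `binary_metrics` and the counts are hypothetical and are not taken from the paper. AUC is omitted because it depends on predicted probabilities rather than on the four counts.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute standard metrics from confusion-matrix counts.

    Assumes every denominator is non-zero; degenerate cases are not handled.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts, used only to exercise the function.
metrics = binary_metrics(tp=90, tn=85, fp=15, fn=10)
```

For a multi-class problem, one common approach is to compute these values per class in a one-vs-rest fashion and then average them; whether the reported figures were obtained that way is not stated in the text.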

Volume 1 Issue 1 (2024) 62 https://doi.org/10.36922/aih.2121
