Artificial Intelligence in Health | EBNA1 inhibitors against EBV in NPC
where n represents the number of predictions; y_i represents the observed values; and ŷ_i represents the predicted values.

RAE serves as a measure of the performance of a predictive model and is expressed as a ratio; lower RAE scores indicate a more effective model. The equation for calculating RAE is as follows[44]:

RAE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{\sum_{i=1}^{n} |y_i - \bar{y}|} \quad (VIII)

where

\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \quad (IX)

where n represents the number of observations; y_i represents the observed value; and ȳ represents the average of the observed values.

Together, these four regression assessment metrics offer a thorough perspective on the performance of regression QSAR models.

2.5. Model deployment

After constructing the QSAR models, we validated all of them using the external test set. The chosen model was then deployed on the Enamine advanced compound library, which was featurized with the same chemical fingerprints using the PaDEL-Descriptor package.

3. Results

3.1. Classification QSAR models

Our study yielded the following results for the classification-based machine learning models (Table 1). The CFS-LR-BF and CFS-LR-GS models exhibited precision scores of 1.000, recall scores of 0.952, F1 scores of 0.976, and accuracy scores of 0.976. In addition, the CFS-NB-BF and CFS-NB-GS models had precision, recall, and F1 scores all at 0.952 and accuracy scores of 0.953. The CSE-J48-LR-BF model achieved a precision score of 0.955, a recall score of 1.000, an F1 score of 0.977, and an accuracy score of 0.976. Meanwhile, the CSE-J48-IBK-BF model demonstrated a precision score of 0.952, a recall score of 0.952, an F1 score of 0.952, and an accuracy score of 0.953. We visualized the performance of these models using a confusion matrix (Figure 2).

We then evaluated our models on an external test set comprising eight compounds (Table 2). The CFS-LR-BF and CFS-LR-GS QSAR classification models demonstrated precision scores of 1.000 and recall scores of 0.571; both achieved F1 scores of 0.727 and accuracy scores of 0.667. The CFS-NB-BF and CFS-NB-GS models exhibited precision scores of 1.000 and recall scores of 0.429; both achieved F1 scores of 0.600 and accuracy scores of 0.556. Finally, the CSE-J48-LR-BF and CSE-J48-IBK-BF models demonstrated precision scores of 1.000 and recall scores of 0.429; both achieved F1 scores of 0.600 and accuracy scores of 0.556. We also present the test set results as a series of confusion matrices (Figure 3), which show the models' performance in classifying active and inactive compounds.

3.2. Regression QSAR models

For the regression-based models, we obtained the following results. On the training set, the CSE-LRE-BF-SMO and CSE-LRE-GS-SMO models both achieved R scores of 0.992, MAE values of 0.029, RMSE values of 0.037, and RAE values of 0.118. The CSE-SMO-BF-LRE and CSE-SMO-GS-LRE QSAR regression models both achieved R scores of 0.999, MAE values of 0.004, RMSE values of 0.005, and RAE values of 0.014. Similarly, the CSE-SMO-BF-SMO and CSE-SMO-GS-SMO QSAR regression models both achieved R scores of 0.999, MAE values of 0.008, RMSE values of 0.010, and RAE values of 0.032. We plotted the experimental versus predicted pIC50 values of the compounds in the training set (Figure 4). Subsequently, we evaluated the models on a test set to determine the predictive power of each model.
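As a concrete illustration, the regression metrics reported in this section (R, MAE, RMSE, and the RAE of Equation VIII with the mean of Equation IX) can be recomputed in a few lines of plain Python. This is a minimal sketch, not the study's pipeline; the function name and example values are hypothetical:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R, MAE, RMSE, and RAE for observed vs. predicted values.

    Hypothetical helper; names and inputs are for illustration only.
    """
    n = len(y_true)
    y_bar = sum(y_true) / n                      # mean of observations (Equation IX)
    p_bar = sum(y_pred) / n
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    mae = abs_err / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    # RAE: total absolute error relative to a predict-the-mean baseline (Equation VIII)
    rae = abs_err / sum(abs(t - y_bar) for t in y_true)
    # Pearson correlation coefficient R between observed and predicted values
    cov = sum((t - y_bar) * (p - p_bar) for t, p in zip(y_true, y_pred))
    r = cov / math.sqrt(sum((t - y_bar) ** 2 for t in y_true)
                        * sum((p - p_bar) ** 2 for p in y_pred))
    return r, mae, rmse, rae
```

A lower RAE means the model beats the naive mean-only predictor by a wider margin, which is why the 0.014–0.118 training-set values above indicate strong fits.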
Table 1. Scores of the evaluation metrics for the training set
CFS‑LR‑BF CFS‑LR‑GS CFS‑NB‑BF CFS‑NB‑GS CSE‑J48‑LR‑BF CSE‑J48‑IBK‑BF
Precision 1.000 1.000 0.952 0.952 0.955 0.952
Recall 0.952 0.952 0.952 0.952 1.000 0.952
F1 score 0.976 0.976 0.952 0.952 0.977 0.952
Accuracy 0.976 0.976 0.953 0.953 0.976 0.953
Abbreviations: BF: Best first; CFS: CfsSubsetEval; CSE: ClassifierSubsetEval; GS: Greedy stepwise; IBK: Instance-based learner; J48: J48 Decision Tree;
LR: Logistic regression; NB: Naïve Bayes.
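For readers recomputing the classification results, the four metrics in Tables 1 and 2 follow directly from a binary confusion matrix such as those visualized in Figures 2 and 3. A minimal sketch, assuming "active" is the positive class; the function name and the counts in the usage note are hypothetical, not the study's data:

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive precision, recall, F1, and accuracy from confusion-matrix counts.

    tp/fp/fn/tn: true/false positives and negatives for the positive class.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # sensitivity for the active class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy
```

For example, the hypothetical counts tp=20, fp=0, fn=1, tn=21 reproduce the CFS-LR-BF column of Table 1 (precision 1.000, recall 0.952, F1 0.976, accuracy 0.976).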
Volume 2 Issue 1 (2025) 97 doi: 10.36922/aih.4375

