



where n represents the number of predictions; y_i represents the observed values; and ŷ_i represents the predicted values.

RAE serves as a measure to assess the performance of a predictive model and is represented as a ratio. Lower RAE scores indicate a more effective model.^44 The equation for calculating RAE is as follows:

RAE = \frac{\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|}{\sum_{i=1}^{n} \left| y_i - \bar{y} \right|}          (VIII)

where

\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i          (IX)

where n represents the number of observations; y_i represents the observed value; and ȳ represents the average of the observed values.

These four assessment regression metrics offer a thorough perspective on the performance of regression QSAR models.
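As a brief, non-authoritative illustration of how these four metrics can be computed (taking R as the Pearson correlation between observed and predicted values; the array names and example values below are placeholders, not data from this study):

```python
import numpy as np

def regression_metrics(y_obs, y_pred):
    """R (Pearson correlation), MAE, RMSE, and RAE (Equations VIII-IX) for a regression QSAR model."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    resid = y_obs - y_pred
    return {
        "R": np.corrcoef(y_obs, y_pred)[0, 1],               # correlation of observed vs. predicted
        "MAE": np.mean(np.abs(resid)),                        # mean absolute error
        "RMSE": np.sqrt(np.mean(resid ** 2)),                 # root mean squared error
        "RAE": np.sum(np.abs(resid)) / np.sum(np.abs(y_obs - y_obs.mean())),  # Equation VIII
    }

# Placeholder pIC50 values for demonstration only
print(regression_metrics([5.2, 6.1, 7.0, 6.5], [5.3, 6.0, 6.8, 6.6]))
```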
2.5. Model deployment

After constructing the QSAR models, we validated all our models using the external test set. The chosen model was then deployed on the Enamine advanced compound library, which was similarly featurized with chemical fingerprints using the PaDEL-Descriptor package.
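As an illustrative sketch only (not necessarily the exact procedure used in this work), fingerprint featurization of a screening library can be scripted through the padelpy wrapper around PaDEL-Descriptor; the file names, the "smiles" column, and the call signature below are assumptions that may vary by version:

```python
# Hypothetical featurization sketch using padelpy (a Python wrapper for PaDEL-Descriptor).
import pandas as pd
from padelpy import from_smiles  # requires Java; install with `pip install padelpy`

library = pd.read_csv("screening_library.csv")  # placeholder library file with a "smiles" column

fingerprint_rows = []
for smi in library["smiles"]:
    # descriptors=False, fingerprints=True limits the output to fingerprint bits
    fingerprint_rows.append(from_smiles(smi, descriptors=False, fingerprints=True, timeout=120))

pd.DataFrame(fingerprint_rows).to_csv("screening_library_fingerprints.csv", index=False)
```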
3. Results

3.1. Classification QSAR models

Our study yielded the following results for classification-based machine learning models (Table 1). The CFS-LR-BF and CFS-LR-GS models exhibited precision scores of 1.000, recall scores of 0.952, F1 scores of 0.976, and accuracy scores of 0.976. In addition, the CFS-NB-BF and CFS-NB-GS models had precision, recall, and F1 scores all at 0.952 and accuracy scores of 0.953. The CSE-J48-LR-BF model achieved a precision score of 0.955, a recall score of 1.000, an F1 score of 0.977, and an accuracy score of 0.976. Meanwhile, the CSE-J48-IBK-BF model demonstrated a precision score of 0.952, a recall score of 0.952, an F1 score of 0.952, and an accuracy score of 0.953. We visualized the performance of these models using a confusion matrix (Figure 2).

We evaluated our models using an external test set comprising eight compounds (Table 2). The CFS-LR-BF and CFS-LR-GS QSAR classification models demonstrated precision scores of 1.000 and recall scores of 0.571; both models achieved F1 scores of 0.727 and accuracy scores of 0.667. The CFS-NB-BF and CFS-NB-GS models exhibited precision scores of 1.000 and recall scores of 0.429; both models achieved F1 scores of 0.600 and accuracy scores of 0.556. Finally, the CSE-J48-LR-BF and CSE-J48-IBK-BF models demonstrated precision scores of 1.000, with recall scores of 0.429; both models achieved F1 scores of 0.600 and accuracy scores of 0.556. We also presented the results of the test set evaluation using a series of confusion matrices (Figure 3). These visual representations show the models' performance in classifying active and inactive compounds.

3.2. Regression QSAR models

For regression-based models, we obtained the following results. For the training set, the CSE-LRE-BF-SMO and CSE-LRE-GS-SMO models both achieved R scores of 0.992, MAE values of 0.029, RMSE values of 0.037, and RAE values of 0.118. For the training set of the CSE-SMO-BF-LRE and CSE-SMO-GS-LRE QSAR regression models, both models achieved R scores of 0.999, MAE values of 0.004, RMSE values of 0.005, and RAE values of 0.014. Regarding the training set results for the CSE-SMO-BF-SMO and CSE-SMO-GS-SMO QSAR regression models, both models achieved R scores of 0.999, MAE values of 0.008, RMSE values of 0.010, and RAE values of 0.032. We plotted experimental versus predicted pIC50 values of the compounds in the training set (Figure 4). Subsequently, we evaluated the models on a test set to determine the predictive power of each model.
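For readers who wish to reproduce this type of evaluation, a minimal scikit-learn sketch that yields the same set of classification metrics and a confusion matrix is shown below; the labels are synthetic placeholders, not the compounds or predictions from this study:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Placeholder labels for illustration: 1 = active, 0 = inactive
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

print(f"Precision: {precision_score(y_true, y_pred):.3f}")
print(f"Recall:    {recall_score(y_true, y_pred):.3f}")
print(f"F1 score:  {f1_score(y_true, y_pred):.3f}")
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.3f}")
print("Confusion matrix (rows = true, columns = predicted):")
print(confusion_matrix(y_true, y_pred))
```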

            Table 1. Evaluation metric scores for the training set
                        CFS‑LR‑BF     CFS‑LR‑GS     CFS‑NB‑BF     CFS‑NB‑GS      CSE‑J48‑LR‑BF   CSE‑J48‑IBK‑BF
            Precision     1.000         1.000         0.952          0.952          0.955            0.952
            Recall        0.952         0.952         0.952          0.952          1.000            0.952
            F1 score      0.976         0.976         0.952          0.952          0.977            0.952
            Accuracy      0.976         0.976         0.953          0.953          0.976            0.953
            Abbreviations: BF: Best first; CFS: CfsSubsetEval; CSE: ClassifierSubsetEval; GS: Greedy stepwise; IBK: Instance-based learner; J48: J48 Decision Tree;
            LR: Logistic regression; NB: Naïve Bayes.
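Each model name in Table 1 combines a feature-subset selection step with a classifier, as reflected in the abbreviations above. The following scikit-learn sketch is only a rough analogue of such a two-stage pipeline and does not reproduce the CfsSubsetEval/BestFirst selection named in the abbreviations; all data here are synthetic placeholders.

```python
# Rough analogue (illustration only) of a feature-selection + classifier pipeline
# similar in spirit to the CFS-LR-BF model in Table 1.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 881)).astype(float)  # placeholder fingerprint bit matrix
y = rng.integers(0, 2, size=100)                        # placeholder active/inactive labels

pipeline = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=50)),  # stand-in for a subset-selection step
    ("classify", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)
print(classification_report(y, pipeline.predict(X), digits=3))
```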

