
Artificial Intelligence in Health                                     EBNA1 inhibitors against EBV in NPC




            Figure 2. Confusion matrices for the training set results of (A) CFS-LR-BF, (B) CFS-LR-GS, (C) CSE-J48-BF-LR, (D) CFS-NB-BF,
            (E) CFS-NB-GS, and (F) CSE-J48-BF-IBK.
            Abbreviations: BF: Best first; CFS: CfsSubsetEval; CSE: ClassifierSubsetEval; GS: Greedy stepwise; IBK: Instance-based learner;
            J48: J48 Decision Tree; LR: Logistic regression; NB: Naïve Bayes.

            Table 2. Evaluation metric scores for the test set
                        CFS‑LR‑BF     CFS‑LR‑GS     CFS‑NB‑BF     CFS‑NB‑GS     CSE‑J48‑LR‑BF    CSE‑J48‑IBK‑BF
            Precision     1.000         1.000         1.000          1.000          1.000            1.000
            Recall        0.571         0.571         0.429          0.429          0.429            0.429
            F1 score      0.727         0.727         0.600          0.600          0.600            0.600
            Accuracy      0.667         0.667         0.556          0.556          0.556            0.556
            Abbreviations: BF: Best first; CFS: CfsSubsetEval; CSE: ClassifierSubsetEval; GS: Greedy stepwise; IBK: Instance-based learner; J48: J48 Decision Tree;
            LR: Logistic regression; NB: Naïve Bayes.
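
            The four metrics in Table 2 all derive from the counts of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' implementation. The class and function names are hypothetical, treating "active" as the positive class is an assumption, and the example counts are not reported in the paper: they were chosen only because they reproduce the CFS-LR-BF column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumed positive class: active compound)."""
    tp: int  # actives predicted as active
    fp: int  # inactives predicted as active
    fn: int  # actives predicted as inactive
    tn: int  # inactives predicted as inactive


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return precision, recall, F1 score, and accuracy; 0.0 where undefined."""
    predicted_pos = c.tp + c.fp
    actual_pos = c.tp + c.fn
    total = c.tp + c.fp + c.fn + c.tn

    precision = c.tp / predicted_pos if predicted_pos else 0.0
    recall = c.tp / actual_pos if actual_pos else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts (NOT taken from the paper): 7 actives and 2 inactives,
# picked because they are consistent with the CFS-LR-BF column of Table 2.
example = ConfusionCounts(tp=4, fp=0, fn=3, tn=2)
scores = {name: round(value, 3) for name, value in classification_metrics(example).items()}
print(scores)
```

            With no false positives, precision stays at 1.000 however many actives are missed, so under these assumed counts the differences between models in Table 2 come entirely from recall.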

            For the external test set, the CSE-LRE-BF-SMO and CSE-LRE-GS-SMO models achieved R scores of 0.703 and 0.705,
            respectively. The MAE and RMSE values for both models were 0.173 and 0.217, respectively. Meanwhile, the RAE
            values for the two models were 0.688 and 0.686, respectively. Both the CSE-SMO-BF-LRE and CSE-SMO-GS-LRE QSAR
            regression models achieved an R score of 0.703 in the test set. The MAE and RMSE values were 0.173 and 0.217,
            respectively. The RAE values for both models were 0.689. Moving on to the CSE-SMO-BF-SMO and CSE-SMO-GS-SMO QSAR
            regression models, both models achieved an R score of 0.703 in the test set. The MAE values for both models were
            0.173, whereas the RMSE values for both models were 0.217. The RAE values for both models were 0.689. The test set
            evaluation is summarized in a table of the different evaluation metrics (Table 3) and in plots of actual pIC50
            versus predicted pIC50 (Figure 5).

            3.3. Deployment of model

            Given that our target variable is the pIC50 of compounds, we decided to employ a modeling approach that provides
            numerical outcomes, namely regression. Therefore, we chose to deploy the CSE-SMO-BF-LRE model on the Enamine
            advanced library to predict the inhibitory activities of its compounds against EBNA1. After the Enamine advanced
            library compounds were featurized with chemical fingerprints, we predicted their pIC50 against EBNA1 using the
            chosen regression model. The structures of the top 10 compounds are shown in Figure 6. Future work would involve
            purchasing these ten compounds for experimental validation.

            4. Discussion

            4.1. Classification QSAR models

            We assessed our classification QSAR models’ performance using four key metrics: Precision, recall, F1 score, and
            Volume 2 Issue 1 (2025)                         98                               doi: 10.36922/aih.4375