INNOSC Theranostics and Pharmacological Sciences
PI3K-α inhibitors for cancer immunotherapy

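As a minimal sketch of the y-scrambling idea behind the R² scramble statistic discussed in this section, the Python code below refits a model on randomly permuted activities and averages the resulting training R² values. It assumes NumPy, a hand-written single-response PLS (NIPALS) regression standing in for the 3D-QSAR software used in the study, and synthetic descriptors in place of molecular fields; the function names `pls1_fit`, `pls1_predict`, `r_squared`, and `r2_scramble` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS regression (NIPALS); returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean                 # centred data, deflated below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                               # weights: direction of max covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                                  # scores of this PLS factor
        tt = t @ t
        p = Xr.T @ t / tt                           # X loadings
        c = (yr @ t) / tt                           # y loading
        Xr = Xr - np.outer(t, p)                    # remove what this factor explains
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)          # B = W (P^T W)^-1 q
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    """Predict activities with a model returned by pls1_fit."""
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def r2_scramble(X, y, n_components, n_rounds=100, seed=0):
    """Mean training R² of models refit after randomly permuting the activities."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_rounds):
        y_perm = rng.permutation(y)                 # break the structure-activity link
        model = pls1_fit(X, y_perm, n_components)
        scores.append(r_squared(y_perm, pls1_predict(model, X)))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(30, 5))                    # synthetic descriptors, not molecular fields
    y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.8]) + rng.normal(scale=0.1, size=30)
    for k in range(1, 5):
        r2 = r_squared(y, pls1_predict(pls1_fit(X, y, k), X))
        print(f"{k} PLS factors: R2 = {r2:.3f}, R2 scramble = {r2_scramble(X, y, k):.3f}")
```

A scrambled average that stays well below the unscrambled R² suggests that the fit is not simply capturing noise; as more factors are added, the scrambled R² tends to rise, which is the overfitting signal the statistic is meant to expose.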

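The F-value and p-value for two nested models, one with k and one with k+1 PLS factors, can be illustrated with a partial F-test. This sketch uses only the Python standard library and computes the F-distribution tail probability through the regularized incomplete beta function (continued-fraction evaluation). It assumes that each PLS factor costs one degree of freedom, plus one for the intercept; that is a simplifying assumption, not necessarily how the QSAR software used in the study defines its F statistic. The function names and the SSE values in the demo are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_survival(f, d1, d2):
    """p-value P(F >= f) for an F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def partial_f_test(sse_k, sse_k1, n, k):
    """F-value and p-value for adding the (k+1)-th PLS factor to a k-factor model."""
    df1 = 1                       # one extra factor
    df2 = n - (k + 1) - 1         # residual df of the larger model (factors + intercept)
    f = ((sse_k - sse_k1) / df1) / (sse_k1 / df2)
    return f, f_survival(f, df1, df2)


if __name__ == "__main__":
    n_compounds = 50                          # hypothetical training-set size
    sse = [40.0, 25.0, 18.0, 12.0, 9.0]       # hypothetical SSE for 0..4 PLS factors
    for k in range(len(sse) - 1):
        f, p = partial_f_test(sse[k], sse[k + 1], n_compounds, k)
        print(f"{k} -> {k + 1} factors: F = {f:.2f}, p = {p:.3g}")
```

A large drop in SSE relative to the remaining residual variance gives a large F-value and, correspondingly, a small p-value.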

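The test-set statistics discussed in this section (RMSE, Q², and Pearson-r) reduce to a few lines of arithmetic on observed and predicted activities. The sketch below is plain Python; the function names are hypothetical, and `q2_external` assumes the common definition Q² = 1 − PRESS/SS, with SS taken around the test-set mean unless a training-set mean is supplied (definitions differ between packages). The activity values in the demo are hypothetical.

```python
import math


def rmse(y_obs, y_pred):
    """Root-mean-square error between observed and predicted activities."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


def q2_external(y_obs, y_pred, reference_mean=None):
    """Predictive Q² = 1 - PRESS / SS; can be negative and never exceeds 1."""
    if reference_mean is None:
        reference_mean = sum(y_obs) / len(y_obs)   # test-set mean by default
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss = sum((o - reference_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss


def pearson_r(x, y):
    """Pearson correlation coefficient, in the range [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    observed = [5.1, 6.3, 7.0, 7.8, 8.4]    # hypothetical activity values
    predicted = [5.4, 6.0, 7.2, 7.5, 8.6]
    print(rmse(observed, predicted), q2_external(observed, predicted),
          pearson_r(observed, predicted))
    shifted = [o + 1.0 for o in observed]    # perfectly correlated but biased predictions
    print(pearson_r(observed, shifted), rmse(observed, shifted),
          q2_external(observed, shifted))
```

The second demo line shows why these statistics are usually reported together: predictions shifted by a constant keep a Pearson-r of 1, while RMSE and Q² penalize the systematic error.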
The coefficient of determination obtained by scrambling, R² scramble (R² Scr), is one of the statistics used to test the significance of a 3D-QSAR model. It is the average value of R² from a series of models built using scrambled activities, and it measures the degree to which the molecular fields can fit meaningless data. Table 2 shows the values of R² scramble for the respective PLS factors and reveals that PLS factor 1 scored lowest, with a value of 0.0499, while PLS factor 4 scored 0.374. Because a high R² scramble indicates that the model is not meaningful and may be overfitting the data, the model containing one PLS factor emerged as the most significant model with meaningful data fitting. Concerning the R² Scr values for the various PLS factors in Table 2, the following trend summarizes the order of meaningfulness of the models (from least to most meaningful), as obtained by random shuffles of the values of the bioactivity response variable:

R² Scr: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

Table 2 also shows the stability of the respective PLS factors. Stability accounts for how stable the PLS factors are when different subsets of data are used to fit the model. This statistic ranges from 0 to 1, where 1 means that the factors are identical for all subsets (stable) and 0 means that they are completely different. Table 2 indicates that the model predictions with PLS factor 1 are more stable to changes in the training set composition than those with the other PLS factors, as highlighted in the following trend (from least to most stable):

Stability: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

The F-value and p-value are also statistical indices used to validate the 3D-QSAR model. The F-value tests whether adding a new PLS factor to the model significantly improves its fit, while the p-value is the probability of obtaining an F-value as large as or larger than the observed one by chance alone, assuming that adding a new PLS factor does not improve the fit of the model. The F-value is computed by comparing the sum of squared errors (SSE) of two nested models: one with k PLS factors and one with k+1 PLS factors. A higher F-value means that adding a new PLS factor reduces the SSE significantly and improves the fit of the model. Likewise, a lower p-value means that the improvement from adding a new PLS factor is more significant and less likely to be due to chance. Table 2 suggests that PLS factor 4 had the highest F-value (113.3) among the PLS factors, which means that its addition to the model would significantly improve the fit and reduce the SSE. The hierarchy of F-value significance (from most to least significant) is depicted below:

F-value: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

PLS factor 4 had the lowest p-value (1.11E-34) among all four PLS factors, indicating that adding it to the model would improve the model significantly and that this improvement is unlikely to be due to chance. Hence, the order of p-value significance among the PLS factors (from most to least significant) is as follows:

p-value: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

The remaining statistics in Table 2 are those typically associated with the test set of the input data used to build the 3D-QSAR model. The root-mean-square error (RMSE) describes how close the predicted values of the dependent variable are to its actual values; a lower RMSE means that the predictions are more accurate and have fewer errors. Table 2 shows the lowest RMSE for PLS factor 4, whereas the RMSE of 0.67 for PLS factor 1 suggests a slightly weaker model prediction with somewhat more errors than the other PLS factors. The order of significance (from most to least accurate) is as follows:

RMSE: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

The predictive squared correlation coefficient of the test set, Q², is a statistic that reinforces the validity of a QSAR model by quantifying the predictive ability of the model in terms of reliability, accuracy, and applicability domain. Q² is obtained from methods based on simple reuse, such as leave-one-out and leave-many-out cross-validation.⁸⁰ This parameter is important and has become well known because it takes values in a normalized range (i.e., ≤1), which makes its values easy to interpret and allows easy comparison of different QSAR models and of the fitting and predictive performance of a model. A higher Q² means that the predictions are more reliable and have less uncertainty. Table 2 shows that the Q² computed for PLS factor 4 exceeded the Q² values of the other PLS factors. This implies a more confident final result, valid both for internal validation, such as cross-validation or bootstrap, and for external validation.⁸⁰ The following trend depicts the order of model reliability according to Q² (from most to least reliable):

Q²: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

The Pearson correlation coefficient, or Pearson-r statistic, assesses the strength of the linear correlation between two continuous variables and ranges from −1 to 1. A value of 1 indicates a perfect positive correlation, while a value of 0 indicates no linear correlation. For each PLS factor, Table 2 reports the Pearson correlation coefficient, which estimates the degree of correlation between the observed activities of the test set and those predicted by the model. The table shows that PLS factor 4, with a Pearson-r


            Volume 7 Issue 2 (2024)                         14                               doi: 10.36922/itps.2340