INNOSC Theranostics and Pharmacological Sciences
PI3K-α inhibitors for cancer immunotherapy
The coefficient of determination obtained by scrambling, R² scramble (R² Scr), is one of the statistical methods used to test the significance of a 3D-QSAR model. It is the average value of R² from a series of models built using scrambled activities, and it measures the degree to which the molecular fields can fit meaningless data. Table 2 shows the values of R² scramble for the respective PLS factors, revealing that PLS factor 1 scored lowest with a value of 0.0499, while PLS factor 4 scored a value of 0.374. A high R² scramble indicates that the model is not meaningful and may be overfitting the data. Hence, the model containing a PLS factor emerged as the most significant model with meaningful data fitting. Concerning the R² Scr values for the various PLS factors in Table 2, the following trend summarizes the order of meaningfulness of the models as obtained by random shuffles of the values of the bioactivity response variable:

R² Scr: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

Table 2 also shows the stability values of the respective PLS factors. Stability accounts for how stable the PLS factors are when different subsets of data are used to fit the model. This statistic ranges from 0 to 1, where 1 means that the factors are identical for all subsets (stable), and 0 means that they are completely different. It is inferred from Table 2 that the model predictions of PLS factor 1 are more stable to changes in the training set composition than those of the other PLS factors, as highlighted in the following trend:

Stability: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

The F-value and p-value are also statistical indices used to validate the 3D-QSAR model. The F-value statistic tests whether adding a new PLS factor to the model significantly improves its fit, while the p-value is the probability of obtaining an F-value as large as or larger than the observed one by chance alone, assuming that adding a new PLS factor does not improve the fit of the model. The F-value is computed by comparing the sum of square errors (SSE) of two nested models: one with k PLS factors and one with k+1 PLS factors. A higher F-value means that adding a new PLS factor reduces the SSE significantly and improves the fit of the model. Conversely, a lower p-value means that adding a new PLS factor is more significant and not due to chance. Table 2 suggested that PLS factor 4 had a higher F-value (113.3) than the other PLS factors, which means that its addition to the model would significantly improve its fit and reduce the SSE. The hierarchy of F-value significance is depicted below:

F-value: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

PLS factor 4 had the lowest p-value (1.11E-34) among all four PLS factors, indicating that adding it to the model would improve the model significantly and not by chance. Hence, the order of p-value significance among the PLS factors follows this trend:

p-value: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

The remaining aspects of Table 2 were statistics typically associated with the test set of the input data used to build the 3D-QSAR model. The RMSE is the statistic that describes how close the predicted values of the dependent variable are to its actual values. A lower RMSE means that the predictions are more accurate and have fewer errors. Table 2 revealed the lowest RMSE value for PLS factor 4, while the RMSE value of 0.67 for PLS factor 1 suggested a slightly weaker model prediction with somewhat more errors than the other PLS factors. The trend for the order of significance is presented as follows:

RMSE: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

The predictive squared correlation coefficient of the test set, Q², is a statistical property that reinforces the validity of a QSAR model by quantifying the predictive ability of the model with respect to its reliability, accuracy, and applicability domain. The Q² is obtained from methods based on simple reuse, such as leave-one-out and leave-many-out cross-validation.⁸⁰ This parameter is important and has become well-known because it takes values in a normalized range (i.e., ≤1), thereby permitting a trivial understanding of its values and easy comparison of different QSAR models and of the fitting and predictive capabilities of a model. A higher Q² means that the predictions are reliable and have less uncertainty. Hence, Table 2 reveals that the computed value of Q² for PLS factor 4 exceeded the Q² values of the other PLS factors. This implied a more confident final result valid both for internal validation, such as cross-validation or bootstrap, and for external validation,⁸⁰ and the following trend depicts the reliability order of Q² on the model:

Q²: PLS factor 4 ⟹ PLS factor 3 ⟹ PLS factor 2 ⟹ PLS factor 1

The Pearson correlation coefficient, or Pearson-r statistic, assesses the strength of the linear correlation between two continuous variables, ranging from −1 to 1. A value close to 1 (or −1) indicates a strong positive (or negative) correlation, while a value near 0 represents a weak correlation. Table 2 illustrates the Pearson correlation coefficient, which estimates the degree of correlation between the respective PLS factors and the predicted activities of the model using the test set. It is therein in Table 2 that PLS factor 4, with a Pearson-r
Volume 7 Issue 2 (2024) 14 doi: 10.36922/itps.2340
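The validation statistics discussed on this page (RMSE, Q², Pearson-r, the F-value for nested models, and R² scramble) can be illustrated with a minimal Python sketch. It is not the software procedure used to produce Table 2. Several choices here are assumptions made only for illustration: the function names are hypothetical, the F-value uses the textbook partial F-test with one degree of freedom per PLS factor plus an intercept, the reference mean for Q² is supplied by the caller, and the scrambling routine uses a one-descriptor least-squares fit as a stand-in for a PLS model. All numbers are invented.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted activities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def q_squared(y_true, y_pred, reference_mean):
    """Q^2 = 1 - PRESS / SS_ref; it never exceeds 1 and can be negative."""
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_ref = sum((t - reference_mean) ** 2 for t in y_true)
    return 1.0 - press / ss_ref


def pearson_r(x, y):
    """Pearson correlation coefficient, bounded by -1 and 1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_f(sse_k, sse_k_plus_1, n_samples, k):
    """F for adding one factor to a k-factor model.

    Assumption: each factor and the intercept consume one degree of freedom.
    """
    df_resid = n_samples - (k + 1) - 1
    return (sse_k - sse_k_plus_1) / (sse_k_plus_1 / df_resid)


def scrambled_r2(x, y, n_shuffles=200, seed=0):
    """Mean R^2 of a one-descriptor least-squares fit to shuffled activities.

    Stand-in for PLS: for simple linear regression, R^2 equals r squared.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_shuffles):
        y_shuffled = list(y)
        rng.shuffle(y_shuffled)
        total += pearson_r(x, y_shuffled) ** 2
    return total / n_shuffles


# Hypothetical data, invented purely for illustration.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]

print("RMSE:", rmse(observed, predicted))
print("Q2:", q_squared(observed, predicted, reference_mean=6.5))
print("Pearson r:", pearson_r(observed, predicted))
print("F:", partial_f(sse_k=12.0, sse_k_plus_1=6.0, n_samples=30, k=3))
```

A p-value for the F-value could then be taken from the upper tail of the F distribution, for example with `scipy.stats.f.sf(F, 1, df_resid)`, under the same degrees-of-freedom assumption. A scrambled R² close to the unscrambled R² would suggest that the model is fitting noise.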

