Microbes & Immunity Statistical modeling of COVID-19 trends
employed. These techniques are specifically designed to
mitigate multicollinearity by transforming the predictor
variables into a set of uncorrelated components.
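A minimal sketch of the idea, assuming just two centered predictors: rotating them onto the eigenvectors of their 2 × 2 scatter matrix (the first step of PCR) yields component scores with zero cross-product, even though the raw predictors are strongly correlated. The data and the helper names `principal_axes_2d` and `component_scores` are hypothetical. With many predictors one would use a linear-algebra routine rather than the closed-form 2 × 2 solution used here.

```python
import math


def principal_axes_2d(x1, x2):
    """Return the two unit eigenvectors (largest eigenvalue first) of the
    2x2 scatter matrix of two variables.

    Hypothetical helper for illustration; assumes x1 and x2 are already
    centered (mean zero).
    """
    a = sum(v * v for v in x1)              # sum of squares of x1
    c = sum(v * v for v in x2)              # sum of squares of x2
    b = sum(u * v for u, v in zip(x1, x2))  # cross-product term
    # Largest eigenvalue of [[a, b], [b, c]] in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    if b != 0:
        v = (lam - c, b)
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v)
    first = (v[0] / norm, v[1] / norm)
    second = (-first[1], first[0])  # the orthogonal direction in 2D
    return first, second


def component_scores(x1, x2, axis):
    """Project each observation (x1[i], x2[i]) onto a unit axis."""
    return [axis[0] * u + axis[1] * v for u, v in zip(x1, x2)]


# Hypothetical, strongly correlated predictors (already centered).
x1 = [2.0, 1.0, -1.0, -2.0]
x2 = [1.0, 2.0, -2.0, -1.0]

pc1, pc2 = principal_axes_2d(x1, x2)
t1 = component_scores(x1, x2, pc1)
t2 = component_scores(x1, x2, pc2)

raw_cross = sum(u * v for u, v in zip(x1, x2))
score_cross = sum(u * v for u, v in zip(t1, t2))
print(raw_cross)                  # 8.0: the raw predictors co-vary
print(abs(score_cross) < 1e-12)   # True: the component scores do not
```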
The PLS and PCR models were applied to the dataset, each reducing the dimensionality of the predictor variables with the aim of explaining as much variance as possible in the response variable, namely the infection rate. The two methods differ in how they build their components: PCR uses the directions of greatest variance among the predictors alone, whereas PLS constructs components that maximize the covariance between the predictors and the infection rate. The PLS analysis (Table 7) proved effective, explaining 67.22% of the variance in infection rates with five components, the optimal number identified through cross-validation (Figure 9). Beyond five components, the model’s MSEP begins to increase, suggesting that additional components may introduce noise rather than improve predictive accuracy.

Figure 9. Cross-validation mean squared error plot for the partial least squares model analysis
Abbreviation: MSEP: Mean squared error of prediction.
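The selection rule can be sketched as follows. The explained-variance values are the infection-rate column of Table 7; because they never decrease as components are added, they cannot by themselves indicate when to stop, which is why a held-out error such as MSEP is used. The MSEP values below are hypothetical placeholders shaped like the curve described for Figure 9, not the study's Table S9 results, and `optimal_n_components` is an illustrative name.

```python
# Cumulative % of infection-rate variance explained by the PLS model,
# copied from Table 7 (index 0 = one component).
explained_y = [48.67, 55.79, 59.75, 65.34, 67.22, 70.14, 72.68, 73.76,
               73.93, 74.10, 74.56, 75.47, 75.98, 76.27, 76.49, 76.84,
               77.45, 77.79, 78.81, 81.62, 81.79]


def optimal_n_components(msep):
    """Return the 1-based number of components with the lowest MSEP."""
    best_index = min(range(len(msep)), key=msep.__getitem__)
    return best_index + 1


# In-sample explained variance never decreases as components are added,
# so maximizing it would always select the largest model.
never_decreases = all(b >= a for a, b in zip(explained_y, explained_y[1:]))
print(never_decreases)                          # True
print(explained_y.index(max(explained_y)) + 1)  # 21

# Hypothetical cross-validated MSEP values (NOT the study's Table S9),
# shaped to fall up to five components and rise afterwards.
msep_cv = [0.92, 0.81, 0.74, 0.69, 0.66, 0.68, 0.71, 0.75]
print(optimal_n_components(msep_cv))            # 5
```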
In addition to the error plot, Table S9 provides detailed cross-validation results for each model. It reports the MSEP for each number of components, showing that the error decreases as components are added up to five and then rises as further components are included. These results support the findings in Figure 9, which identify five components as optimal.

Table 7. The partial least squares analysis results showing variance explained by the number of components

Number of     Cumulative variance explained (%)
components    Predictor variables    Infection rates
1             47.98                  48.67
2             66.98                  55.79
3             81.77                  59.75
4             88.76                  65.34
5             94.81                  67.22
6             97.28                  70.14
7             98.05                  72.68
8             98.92                  73.76
9             99.48                  73.93
10            99.78                  74.10
11            99.85                  74.56
12            99.90                  75.47
13            99.95                  75.98
14            99.97                  76.27
15            99.98                  76.49
16            99.99                  76.84
17            99.99                  77.45
18            100.00                 77.79
19            100.00                 78.81
20            100.00                 81.62
21            100.00                 81.79

The component loadings from the PLS model, illustrated in the heatmap (Figure S9), show the contribution of each variable to the PLS components. Predictors such as GDP per capita, HDI, and health expenditure have large loadings on the first few components, indicating a strong influence on the model. In addition, interaction terms, such as those between GDP per capita and population density or between health expenditure and population density, play key roles in the later components.
Detailed PLS loadings are provided in Table S10, which lists the loading values for each variable across the first five components and further illustrates how the variables and their interactions shape the PLS components.
By reducing the predictors to a smaller set of latent components, the PLS model yields a more stable set of coefficients, as reflected in the reduced VIF values and the improved interpretability of the regression coefficients.
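For reference, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing predictor j on the other predictors. With exactly two predictors, R_j^2 equals their squared Pearson correlation, which the minimal sketch below relies on. The `gdp` and `hdi` values are hypothetical and serve only to show how closely related predictors, such as GDP per capita and HDI, inflate the VIF, while uncorrelated inputs, such as orthogonal component scores, give the minimum value of 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF for either predictor in a model with exactly two predictors.

    In that case R_j^2 equals r**2, so VIF = 1 / (1 - r**2).
    """
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical values for two closely related predictors.
gdp = [10.0, 20.0, 30.0, 40.0, 50.0]
hdi = [0.50, 0.62, 0.70, 0.81, 0.90]
print(round(vif_two_predictors(gdp, hdi), 1))  # far above common cut-offs

# Uncorrelated inputs give the minimum possible VIF of 1.
print(vif_two_predictors([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]))  # 1.0
```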
The final regression coefficients obtained from the PLS model (Table S11) reveal both the direct and the interaction effects of the predictor variables on infection rates, offering clearer insight into the complex underlying relationships.
In contrast, PCR (Table S8) yields similar results but with lower explained variance for the same number of components. While PCR effectively reduces multicollinearity, this comes at the cost of reduced predictive power compared with PLS. Specifically, PCR explains 59.43% of the variance in infection rates using six components (PLS: 70.14%), increasing to 75.81% with 17 components (PLS: 77.45%, Table 7), so it does not exceed the overall performance of the PLS model.
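The difference in predictive power traces back to how each method chooses its components. PCR takes the leading eigenvector of the predictors' scatter matrix and ignores the response, whereas the first PLS1 weight vector is proportional to Xᵀy for centered data. The sketch below uses hypothetical centered data in which the response depends only on a low-variance predictor, and compares the R² of a one-component PCR fit with that of a one-component PLS fit. It illustrates the mechanism under these assumptions and does not reproduce the study's models; all function names are illustrative.

```python
import math


def r_squared(t, y):
    """R^2 of a one-regressor fit, i.e. squared correlation, for centered t, y."""
    sty = sum(a * b for a, b in zip(t, y))
    stt = sum(a * a for a in t)
    syy = sum(b * b for b in y)
    return sty * sty / (stt * syy)


def unit(v):
    """Scale a 2D vector to unit length."""
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm)


def first_pcr_direction(x1, x2):
    """Leading eigenvector of the 2x2 scatter matrix (predictors only)."""
    a = sum(v * v for v in x1)
    c = sum(v * v for v in x2)
    b = sum(u * v for u, v in zip(x1, x2))
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    if b != 0:
        return unit((lam - c, b))
    return (1.0, 0.0) if a >= c else (0.0, 1.0)


def first_pls_direction(x1, x2, y):
    """First PLS1 weight vector: proportional to X^T y for centered data."""
    return unit((sum(u * v for u, v in zip(x1, y)),
                 sum(u * v for u, v in zip(x2, y))))


def scores(x1, x2, w):
    """Component scores t = X w."""
    return [w[0] * u + w[1] * v for u, v in zip(x1, x2)]


# Hypothetical centered data: x1 has large variance but is unrelated to y;
# x2 has small variance and fully determines y.
x1 = [4.0, -4.0, 4.0, -4.0]
x2 = [1.0, 1.0, -1.0, -1.0]
y = [1.0, 1.0, -1.0, -1.0]

pcr_r2 = r_squared(scores(x1, x2, first_pcr_direction(x1, x2)), y)
pls_r2 = r_squared(scores(x1, x2, first_pls_direction(x1, x2, y)), y)
print(pcr_r2, pls_r2)  # 0.0 1.0
```

This extreme case is constructed deliberately; with real data the gap between the two methods is typically far smaller.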
Volume 2 Issue 3 (2025) 124 doi: 10.36922/MI025040007

