
Microbes & Immunity                                                  Statistical modeling of COVID-19 trends



            employed. These techniques are specifically designed to
            mitigate multicollinearity by transforming the predictor
            variables into a set of uncorrelated components.
              The PLS and PCR models were applied to the dataset,
            with each method aiming to reduce the dimensionality of
            the predictor variables while maximizing the explained
            variance in the response variable, namely the infection
            rate. In particular, the PLS analysis (Table 7) is
            effective, explaining 67.22% of the variance in infection
            rates using five components, identified as the optimal
            number of components through cross-validation (Figure 9).
            Beyond five components, the model’s MSEP begins to
            increase even though the explained variance reported in
            Table 7 continues to rise, suggesting that additional
            components may introduce noise rather than improve
            predictive accuracy.

            Figure 9. Cross-validation mean squared error plot for the
            partial least squares model analysis
            Abbreviation: MSEP: Mean squared error of prediction.
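
The component-selection step described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the authors' code: it fits a single-response PLS model with the NIPALS algorithm on synthetic collinear data and computes the cross-validated MSEP for one to five components. The synthetic data, the interleaved fold scheme, and all function names (`fit_pls1`, `predict_pls1`, `cv_msep`) are assumptions made for this example.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_pls1(X, y, n_comp):
    """NIPALS PLS1 on centered X (list of rows) and centered y.
    Returns one (weights, X-loadings, y-loading) triple per component."""
    X, y, comps = [row[:] for row in X], y[:], []
    for _ in range(n_comp):
        cols = list(zip(*X))
        w = [dot(col, y) for col in cols]        # direction of maximal covariance with y
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:                         # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in X]           # component scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in cols]   # X loadings
        q = dot(y, t) / tt                       # y loading
        # Deflate X and y so the next component is uncorrelated with this one.
        X = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
        y = [yi - q * ti for yi, ti in zip(y, t)]
        comps.append((w, p, q))
    return comps

def predict_pls1(comps, x):
    """Predict the centered response for one centered observation x."""
    x, y_hat = x[:], 0.0
    for w, p, q in comps:
        t = dot(x, w)
        y_hat += q * t
        x = [xi - t * pj for xi, pj in zip(x, p)]
    return y_hat

def cv_msep(X, y, max_comp, n_folds=5):
    """K-fold cross-validated MSEP for 1..max_comp components."""
    n, n_pred = len(y), len(X[0])
    sq_err = [0.0] * max_comp
    for f in range(n_folds):
        train = [i for i in range(n) if i % n_folds != f]
        test = [i for i in range(n) if i % n_folds == f]
        # Centering uses training data only, to avoid leaking the test fold.
        x_mean = [sum(X[i][j] for i in train) / len(train) for j in range(n_pred)]
        y_mean = sum(y[i] for i in train) / len(train)
        Xc = [[X[i][j] - x_mean[j] for j in range(n_pred)] for i in train]
        yc = [y[i] - y_mean for i in train]
        comps = fit_pls1(Xc, yc, max_comp)
        for i in test:
            xc = [X[i][j] - x_mean[j] for j in range(n_pred)]
            for k in range(1, max_comp + 1):
                pred = y_mean + predict_pls1(comps[:k], xc)
                sq_err[k - 1] += (y[i] - pred) ** 2
    return [e / n for e in sq_err]

# Synthetic, deliberately collinear predictors driven by two latent factors.
rng = random.Random(0)
X, y = [], []
for _ in range(60):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    signal = [z1, z1 - z2, z2, z1 + z2, 2 * z1]
    X.append([s + rng.gauss(0, 0.1) for s in signal] + [rng.gauss(0, 1)])
    y.append(2 * z1 - z2 + rng.gauss(0, 0.5))

msep = cv_msep(X, y, max_comp=5)
best_k = min(range(1, len(msep) + 1), key=lambda k: msep[k - 1])
print("MSEP by number of components:", [round(m, 3) for m in msep])
print("Component count with lowest MSEP:", best_k)
```

Choosing the component count with the lowest cross-validated MSEP mirrors the selection rule described in the text. In practice, predictors are often standardized first and a dedicated PLS package is used instead of a hand-written loop.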

            Table 7. The partial least squares analysis results showing
            variance explained by the number of components

            Number of components   Predictor variables (%)   Infection rates (%)
            1                      47.98                     48.67
            2                      66.98                     55.79
            3                      81.77                     59.75
            4                      88.76                     65.34
            5                      94.81                     67.22
            6                      97.28                     70.14
            7                      98.05                     72.68
            8                      98.92                     73.76
            9                      99.48                     73.93
            10                     99.78                     74.10
            11                     99.85                     74.56
            12                     99.90                     75.47
            13                     99.95                     75.98
            14                     99.97                     76.27
            15                     99.98                     76.49
            16                     99.99                     76.84
            17                     99.99                     77.45
            18                     100.00                    77.79
            19                     100.00                    78.81
            20                     100.00                    81.62
            21                     100.00                    81.79

              In addition to the error plot, Table S9 provides
            detailed cross-validation results for each model. This table
            presents the MSEP for different numbers of components,
            highlighting how the error decreases as the number of
            components increases up to five and then rises with the
            inclusion of additional components. These results support
            the findings illustrated in Figure 9, which identify five
            components as optimal.
              The component loadings from the PLS model, as
            illustrated in the heatmap (Figure S9), highlight the
            contribution of each variable to the principal components.
            Predictors such as GDP per capita, HDI, and health
            expenditure demonstrate significant loadings on the first
            few components, indicating their strong influence on the
            model. Additionally, more complex interactions, such as
            those between GDP per capita and population density or
            health expenditure and population density, play critical
            roles in the later components.
              In addition to the heatmap, detailed PLS loadings are
            provided in Table S10. This table lists the specific loading
            values for each variable across the first five components,
            further illustrating the contributions and interactions
            among variables in shaping the principal components.
              By reducing the predictors into principal
            components, the PLS model provides a more stable set
            of coefficients, as shown by the reduced VIF values and
            improved interpretability of the regression coefficients.
            The final regression coefficients obtained from the PLS
            model (Table S11) reveal both the direct and interaction
            effects of the predictor variables on infection rates,
            offering a clearer insight into the complex underlying
            relationships.
              In contrast, PCR (Table S8) demonstrates similar
            results but with slightly lower explained variance for the
            same number of components. While PCR effectively
            reduces multicollinearity, this comes at the cost of reduced
            predictive power compared to PLS. Specifically, PCR
            explains 59.43% of the variance in infection rates using six
            components, increasing to 75.81% with 17 components,
            but it does not exceed the overall performance of the PLS
            model.
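
The size of the gap between the two methods can be read directly from the reported figures. The short calculation below is only an arithmetic illustration: it uses the PLS values for 5, 6, and 17 components from Table 7 and the two PCR values quoted in the text. The variable names are assumptions for this example, and no values beyond those reported are introduced.

```python
# Cumulative % of infection-rate variance explained by PLS (selected rows of Table 7).
pls = {5: 67.22, 6: 70.14, 17: 77.45}
# PCR figures quoted in the text (Table S8 itself is not reproduced here).
pcr = {6: 59.43, 17: 75.81}

# Percentage-point advantage of PLS over PCR at the same number of components.
gap = {k: round(pls[k] - pcr[k], 2) for k in pcr}

# PLS with five components compared with PCR with six components.
pls5_vs_pcr6 = round(pls[5] - pcr[6], 2)

for k in sorted(gap):
    print(f"{k} components: PLS leads PCR by {gap[k]} percentage points")
print(f"PLS (5 components) vs PCR (6 components): {pls5_vs_pcr6} percentage points")
```

On these figures, the PLS advantage is largest at low component counts and narrows as more components are added.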


            Volume 2 Issue 3 (2025)                        124                           doi: 10.36922/MI025040007