
Microbes & Immunity                                                  Statistical modeling of COVID-19 trends



the independent variable. This analysis aimed to assess whether a country's economic development is associated with its COVID-19 infection rate. Additionally, Pearson's and Spearman's correlation coefficients, along with the maximal information coefficient (MIC), were calculated to evaluate the strength of the association between the two variables.[27] Pearson's correlation measured the strength of linear relationships, Spearman's correlation assessed monotonic relationships, and MIC captured both linear and nonlinear associations. Detailed mathematical formulations for these analyses are provided in the Supplementary File.

3.6.2. Multiple regression analysis with additional socioeconomic and health variables

To further investigate the determinants of COVID-19 infection rates, a multiple regression model was employed, incorporating additional variables such as the HDI, Gini coefficient, health expenditure per capita, number of hospital beds per 1,000 people, and population density. This analysis was used to evaluate the relative influence of various socioeconomic and healthcare-related factors on COVID-19 infection rates across countries.[28] Interaction terms were included to explore potential synergistic effects between variables.[29] The detailed mathematical formulation of the expanded regression model is provided in the Supplementary File.

3.6.3. Addressing multicollinearity: Stepwise regression, principal component regression (PCR), and partial least squares (PLS)

Given the potential for multicollinearity among socioeconomic and healthcare-related predictors, several strategies were implemented to improve model interpretability and estimation stability. First, stepwise regression was employed to refine the linear model by iteratively adding or removing predictors based on their statistical significance. The selection process aimed to minimize the AIC, balancing model fit with complexity.[30] To further evaluate multicollinearity, the variance inflation factor (VIF) was calculated for each predictor. Variables with VIF values exceeding 10 were considered to exhibit significant multicollinearity, which can inflate the variance of coefficient estimates and reduce model reliability.[31]

To address multicollinearity more robustly, two dimensionality reduction techniques were applied: PCR and PLS regression. Both methods transformed the original set of correlated predictors into a smaller set of uncorrelated components, which were then used in place of the original variables in the regression analysis.[32]

The PCR analysis constructed components solely based on the variance structure of the predictor variables, identifying the principal components that captured the largest proportion of variance in the input space.[33] These components were then used to predict the dependent variable, regardless of their relevance to it. In contrast, PLS regression incorporated information from both the predictors and the response variable during component extraction, enabling it to construct the components most relevant for predicting the outcome by maximizing the covariance between predictors and response.[34]

The appropriate number of components for each method was determined by cross-validation, and model performance was assessed based on the mean squared error of prediction (MSEP). Full mathematical formulations and implementation details for both PCR and PLS are provided in the Supplementary File.

3.7. Spatial autocorrelation and hotspot analysis of COVID-19 cases

In this study, spatial analysis techniques were applied to examine the distribution of COVID-19 infection rates across regions. Moran's I was calculated to assess global spatial autocorrelation, and the Getis-Ord Gi* statistic was computed to identify local hotspots and coldspots. The results were visualized using conventional red-blue color schemes, highlighting areas with significant spatial clustering of high or low infection rates.[35,36]

3.7.1. Spatial autocorrelation: Moran's I

Moran's I is a widely used measure of global spatial autocorrelation that quantifies the degree of spatial clustering of a variable across geographical regions.[37] It indicates whether similar values (e.g., infection rates) tend to cluster spatially: a positive Moran's I indicates that similar values cluster together, whereas a negative value indicates that dissimilar values are adjacent. For this analysis, a spatial weights matrix was generated based on shared boundaries between geographic regions, and Moran's I was calculated to assess the overall spatial autocorrelation of COVID-19 infection rates.[35] The detailed mathematical formulation of Moran's I is provided in the Supplementary File.

3.7.2. Hotspot analysis: Getis-Ord Gi* statistic

The Getis-Ord Gi* statistic is a local spatial statistic used to identify geographic hotspots and coldspots, that is, areas with significant clustering of high or low values, such as COVID-19 infection rates. Hotspots indicate clusters of high values, while coldspots indicate clusters of low values. The significance of these clusters is determined by comparison with a reference distribution under the null hypothesis of spatial randomness.[38] For this analysis, the Getis-Ord Gi* statistic was calculated using a spatial

            Volume 2 Issue 3 (2025)                        114                           doi: 10.36922/MI025040007
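To make the distinction between the correlation coefficients described above concrete, the following is a minimal pure-Python sketch of Pearson's r and Spearman's rho, where Spearman's rho is computed as Pearson's r on average ranks so that ties are handled. The helper names (`pearson`, `average_ranks`, `spearman`) and the toy data are hypothetical illustrations, not the study's code; MIC is not reproduced because it requires a dedicated estimator.

```python
import math


def pearson(x, y):
    """Pearson's r: co-variation of x and y scaled by both spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: y rises monotonically but nonlinearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
r, rho = pearson(x, y), spearman(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")  # Pearson r = 0.943, Spearman rho = 1.000
```

Because y = x³ is perfectly monotonic but not linear, Spearman's rho reaches 1 while Pearson's r stays below 1, which is why the two coefficients are reported side by side.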
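The expanded regression of Section 3.6.2 includes interaction terms; as a minimal sketch, assuming two hypothetical predictors and noise-free synthetic data, the model below adds the product column x1*x2 to an ordinary least-squares design and recovers the known coefficients. The normal equations are solved with Gaussian elimination only to keep the example dependency-free; statistical software would typically use a QR-based solver. All function names here are illustrative.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def design_with_interaction(x1, x2):
    """Columns: intercept, x1, x2, and the interaction (product) term x1*x2."""
    return [[1.0, a, b, a * b] for a, b in zip(x1, x2)]


# Hypothetical, noise-free data generated from known coefficients.
x1 = [1, 2, 3, 4, 1, 2, 3, 4]
x2 = [1, 1, 1, 1, 2, 2, 2, 2]
y = [1 + 2 * a - 0.5 * b + 0.3 * a * b for a, b in zip(x1, x2)]
coefs = ols(design_with_interaction(x1, x2), y)
print([round(c, 6) for c in coefs])  # [1.0, 2.0, -0.5, 0.3]
```

With the product term in the model, the estimated effect of x1 is beta1 + beta3 * x2, so it changes with the level of x2; this is how an interaction term encodes the synergistic effects mentioned in the text.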
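The AIC-driven stepwise selection in Section 3.6.3 can be sketched as a bidirectional search that, at each step, applies the single addition or removal that lowers AIC most and stops when no move helps. This sketch assumes NumPy, Gaussian OLS, and the common form AIC = n·ln(RSS/n) + 2k with additive constants dropped; unlike the text, which also mentions statistical significance, it uses AIC alone. The function names and simulated data are hypothetical.

```python
import math

import numpy as np


def ols_aic(X, y, cols):
    """AIC (up to an additive constant) of an OLS fit on an intercept plus the given columns."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * math.log(rss / n) + 2 * design.shape[1]


def stepwise_aic(X, y):
    """Bidirectional stepwise search: each step applies the single addition or
    removal that lowers AIC the most, and stops when no move improves it."""
    selected = []
    best = ols_aic(X, y, selected)
    while True:
        moves = []
        for c in range(X.shape[1]):
            if c in selected:
                trial = [s for s in selected if s != c]  # try dropping c
            else:
                trial = selected + [c]                   # try adding c
            moves.append((ols_aic(X, y, trial), trial))
        score, trial = min(moves, key=lambda m: m[0])
        if score >= best:
            return sorted(selected), best
        selected, best = trial, score


# Hypothetical data: only columns 0 and 2 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)
chosen, final_aic = stepwise_aic(X, y)
print("selected columns:", chosen, "AIC:", round(final_aic, 2))
```

Because AIC charges 2 per coefficient, a pure-noise column can occasionally still be admitted; the strong predictors are retained, but stepwise results should be read as one candidate model rather than a definitive selection.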
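The VIF screening described in Section 3.6.3 uses VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the others. A minimal pure-Python sketch, assuming hypothetical predictor columns, reads the same quantity off the diagonal of the inverse correlation matrix, which is an equivalent formulation, and flags values above the threshold of 10 stated in the text.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long predictor columns."""
    def unit_scale(col):
        m = sum(col) / len(col)
        norm = math.sqrt(sum((v - m) ** 2 for v in col))
        return [(v - m) / norm for v in col]  # centered, unit length
    z = [unit_scale(c) for c in columns]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def invert(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Hypothetical predictors: x3 is almost the sum of x1 and x2.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
e = [0.1, 0, -0.1, 0.05, 0, -0.05, 0.1, 0]
x3 = [a + b + d for a, b, d in zip(x1, x2, e)]
for name, v in zip(["x1", "x2", "x3"], vif([x1, x2, x3])):
    print(name, round(v, 1), "flag" if v > 10 else "ok")
```

Uncorrelated predictors give VIF = 1, and with only two predictors both VIFs equal 1 / (1 - r²), which makes a quick sanity check.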
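The contrast between PCR and PLS drawn in Section 3.6.3 can be sketched as follows, under stated assumptions: NumPy is available, PCR components come from an SVD of the centered predictors, PLS uses the single-response NIPALS algorithm (PLS1), and MSEP is estimated with a simple interleaved K-fold split. These are illustrative choices, not necessarily the study's implementation, and the data and function names are hypothetical. PCR builds components from X alone, whereas each PLS weight vector follows X'y for the current residuals, i.e., the direction of maximal covariance with the response.

```python
import numpy as np


def pcr_fit(X, y, k):
    """PCR: regress centered y on the scores of the first k principal components of X."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:k].T                           # loadings, built from X alone
    gamma, *_ = np.linalg.lstsq(Xc @ V, y - ym, rcond=None)
    beta = V @ gamma                       # back to the original predictors
    return beta, ym - xm @ beta


def pls1_fit(X, y, k):
    """PLS1 (NIPALS): each weight vector maximizes covariance with the current y residual."""
    xm, ym = X.mean(axis=0), y.mean()
    E, f = X - xm, y - ym
    W, P, Q = [], [], []
    for _ in range(k):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w                          # component scores
        tt = t @ t
        p = E.T @ t / tt                   # X loadings
        q = f @ t / tt                     # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - q * t                      # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    beta = W @ np.linalg.solve(P.T @ W, np.array(Q))
    return beta, ym - xm @ beta


def cv_msep(fit, X, y, k, folds=5):
    """Mean squared error of prediction from a simple interleaved K-fold split."""
    fold_id = np.arange(len(y)) % folds
    sq_err = []
    for fold in range(folds):
        test = fold_id == fold
        beta, b0 = fit(X[~test], y[~test], k)
        sq_err.append((y[test] - (X[test] @ beta + b0)) ** 2)
    return float(np.mean(np.concatenate(sq_err)))


# Hypothetical data: two pairs of nearly collinear predictors.
rng = np.random.default_rng(1)
n = 120
z = rng.normal(size=(n, 2))
X = np.column_stack([z[:, 0], z[:, 0] + 0.05 * rng.normal(size=n),
                     z[:, 1], z[:, 1] + 0.05 * rng.normal(size=n)])
y = X @ np.array([1.0, 1.0, -1.0, 0.5]) + 0.5 * rng.normal(size=n)
for k in range(1, 5):
    print(k, round(cv_msep(pcr_fit, X, y, k), 3), round(cv_msep(pls1_fit, X, y, k), 3))
```

With as many components as predictors, both methods reduce to ordinary least squares, which is a convenient correctness check; the number of components would then be chosen as the one with the lowest cross-validated MSEP.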
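The global Moran's I of Section 3.7.1 can be illustrated with binary contiguity weights, matching the shared-boundary weights matrix described in the text: I = (n / S0) · Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z_i are deviations from the mean and S0 is the sum of all weights. This minimal sketch assumes a hypothetical chain of four regions, no row standardization, and omits significance testing; the function name is illustrative.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights (w_ij = 1 if regions i and j
    share a boundary, else 0; no row standardization)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                       # deviations from the mean
    s0 = sum(len(nb) for nb in neighbors.values())       # total weight S0
    cross = sum(z[i] * z[j] for i, nb in neighbors.items() for j in nb)
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Hypothetical chain of four regions: 0-1, 1-2 and 2-3 share boundaries.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(round(morans_i([1, 1, 0, 0], chain), 3))  # 0.333: similar values adjacent
print(round(morans_i([1, 0, 1, 0], chain), 3))  # -1.0: dissimilar values adjacent
```

Under no spatial autocorrelation the expected value is E[I] = -1/(n - 1) rather than 0, so observed values are usually compared with that baseline through a normal-approximation or permutation test.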
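The Getis-Ord Gi* statistic of Section 3.7.2 can be sketched with the standard z-score form, Gi* = (Σ_j w_ij x_j − x̄ Σ_j w_ij) / (S · sqrt((n Σ_j w_ij² − (Σ_j w_ij)²) / (n − 1))), where S is the population standard deviation of the values and the focal region is included in its own neighbourhood (w_ii = 1, the "star"). This minimal sketch assumes binary contiguity weights, a hypothetical chain of six regions, and a two-sided 5% threshold of |z| > 1.96 without any multiple-comparison correction.

```python
import math


def getis_ord_gi_star(values, neighbors):
    """Gi* z-scores with 0/1 contiguity weights that include the focal region."""
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - xbar ** 2)
    scores = []
    for i in range(n):
        local = set(neighbors[i]) | {i}  # the "star": region i counts itself
        k = len(local)                   # sum_j w_ij, equal to sum_j w_ij^2 for 0/1 weights
        num = sum(values[j] for j in local) - xbar * k
        den = s * math.sqrt((n * k - k * k) / (n - 1))
        scores.append(num / den)
    return scores


# Hypothetical chain of six regions with high rates at one end.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
rates = [10, 9, 8, 1, 1, 1]
for i, g in enumerate(getis_ord_gi_star(rates, chain)):
    label = "hotspot" if g > 1.96 else "coldspot" if g < -1.96 else "not significant"
    print(i, round(g, 2), label)
```

Positive scores mark clusters of high values and negative scores clusters of low values; in a real map, each region's neighbour list would come from the same shared-boundary weights used for Moran's I.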