Microbes & Immunity Statistical modeling of COVID-19 trends
the independent variable. This analysis aimed to assess whether a country’s economic development is associated with its COVID-19 infection rate. Additionally, Pearson’s and Spearman’s correlation coefficients, along with the maximal information coefficient (MIC), were calculated to evaluate the strength of association.²⁷ Pearson’s correlation measured the strength of linear relationships, Spearman’s correlation assessed monotonic relationships, and MIC captured both linear and nonlinear associations. Detailed mathematical formulations for these analyses are provided in the Supplementary File.

3.6.2. Multiple regression analysis with additional socioeconomic and health variables

To further investigate the determinants of COVID-19 infection rates, a multiple regression model was employed, incorporating additional variables such as the HDI, Gini coefficient, health expenditure per capita, number of hospital beds per 1,000 people, and population density. This analysis was used to evaluate the relative influence of various socioeconomic and healthcare-related factors on COVID-19 infection rates across various countries.²⁸ Interaction terms were included to explore potential synergistic effects between variables.²⁹ The detailed mathematical formulation of the expanded regression model is provided in the Supplementary File.

3.6.3. Addressing multicollinearity: Stepwise regression, principal component regression (PCR), and partial least squares (PLS)

Given the potential for multicollinearity among socioeconomic and healthcare-related predictors, several strategies were implemented to improve model interpretability and estimation stability. First, stepwise regression was employed to refine the linear model by iteratively adding or removing predictors based on their statistical significance. The selection process aimed to minimize the AIC, balancing model fit with complexity.³⁰ To further evaluate multicollinearity, the variance inflation factor (VIF) was calculated for each predictor. Variables with VIF values exceeding 10 were considered to exhibit significant multicollinearity, which can inflate the variance of coefficient estimates and reduce model reliability.³¹

To address multicollinearity more robustly, two dimensionality reduction techniques were applied: PCR and PLS regression. Both methods transformed the original set of correlated predictors into a smaller set of uncorrelated components, which were then used in place of the original variables in regression analysis.³²

The PCR analysis constructed components solely based on the variance structure of the predictor variables, identifying principal components that captured the largest proportion of variance in the input space.³³ These components were then used to predict the dependent variable, regardless of their relevance to it. In contrast, PLS regression incorporated information from both the predictors and the response variable during component extraction, enabling it to select components most relevant for predicting the outcome by maximizing the covariance between predictors and response.³⁴

The appropriate number of components for each method was determined using cross-validation procedures, and model performance was assessed based on the mean squared error of prediction (MSEP). Full mathematical formulations and implementation details for both PCR and PLS are provided in the Supplementary File.

3.7. Spatial autocorrelation and hotspot analysis of COVID-19 cases

In this study, spatial analysis techniques were applied to examine the distribution of COVID-19 infection rates across various regions. Moran’s I was calculated to assess global spatial autocorrelation, and the Getis-Ord Gi* statistic was computed to identify local hotspots and coldspots. The results were visualized using traditional red-blue color schemes, effectively highlighting areas with significant spatial clustering of high or low infection rates.³⁵,³⁶

3.7.1. Spatial autocorrelation: Moran’s I

Moran’s I is a widely used measure of global spatial autocorrelation that quantifies the degree of spatial clustering of a variable across geographical regions.³⁷ It identifies whether similar values (e.g., infection rates) tend to cluster spatially. A positive Moran’s I indicates that similar values cluster together, while a negative value indicates that dissimilar values are adjacent. For this analysis, a spatial weights matrix was generated based on shared boundaries between geographic regions, and Moran’s I was calculated to assess the overall spatial autocorrelation of COVID-19 infection rates.³⁵ The detailed mathematical formulation of Moran’s I is provided in the Supplementary File.

3.7.2. Hotspot analysis: Getis-Ord Gi* statistic

The Getis-Ord Gi* statistic is a local spatial statistic used to identify geographic hotspots and coldspots, representing areas with significant clustering of high or low values, such as COVID-19 infection rates. Hotspots indicate clusters of high values, while coldspots indicate clusters of low values. The significance of these clusters is determined by comparison with a reference distribution under the null hypothesis of spatial randomness.³⁸ For this analysis, the Getis-Ord Gi* statistic was calculated using a spatial
Volume 2 Issue 3 (2025) 114 doi: 10.36922/MI025040007
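As an illustration of the global statistic described in Section 3.7.1, the following minimal Python sketch computes Moran’s I from a binary contiguity weights matrix. It assumes the standard form I = (n / S0) · Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where S0 is the sum of all weights; the exact formulation used in the study is the one given in the Supplementary File. The function name `morans_i`, the four-region chain layout, and the example values are hypothetical and serve only to show the sign behavior of the statistic.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]           # deviations from the mean
    s0 = sum(sum(row) for row in weights)      # sum of all spatial weights
    den = sum(d * d for d in dev)              # total squared deviation
    if s0 == 0 or den == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    # Cross-products of deviations, counted only for neighbouring pairs (w_ij > 0)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (num / den)


# Hypothetical toy layout (assumption): four regions in a row, where each
# region shares a boundary only with the region(s) directly next to it.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Illustrative values only; these are not data from the study.
gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain_weights)      # similar values adjacent
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain_weights)   # dissimilar values adjacent

print(f"gradient pattern:    I = {gradient:.3f}")
print(f"alternating pattern: I = {alternating:.3f}")
```

With these toy inputs, the smooth gradient gives a positive value (about 0.333) and the alternating pattern gives a negative value (about -1.0), consistent with the interpretation given in Section 3.7.1. In practice, a spatial analysis library would normally also supply the permutation-based significance test, which this sketch omits.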

