Microbes & Immunity Statistical modeling of COVID-19 trends
5. Conclusion

The comprehensive statistical analysis of COVID-19 trends, employing ARIMA, ARIMAX, multiple regression, and spatial autocorrelation models, provides valuable insights into the dynamics of the pandemic both globally and within the US. These findings highlight the strengths and limitations of different modeling approaches and the complexity of the factors influencing COVID-19 case numbers.

The ARIMA models demonstrate robust performance in predicting short-term COVID-19 trends, particularly when case dynamics follow relatively stable patterns.[14] However, the models show limitations when infection rates change abruptly, for example after sudden policy shifts or the emergence of new virus variants.[43] These situations often reduce predictive accuracy, suggesting that while ARIMA models effectively capture general trends, they may need to be augmented or combined with other models to better account for sudden, non-linear changes.[44]

The ARIMAX models, which incorporate exogenous variables such as vaccination data, provide a more nuanced analysis by accounting for external influences on COVID-19 case numbers.[22] However, the effectiveness of the ARIMAX model depends heavily on the specific characteristics of the time period and the data. For instance, during periods when the impact of vaccination on case numbers is delayed or less pronounced, the model struggles to capture the true relationship between the variables.[45] This is particularly evident when vaccine uptake is gradual or when vaccination effects take time to appear in the population. Under these conditions, the model may overestimate or underestimate the influence of vaccination, leading to skewed forecasts.[46]

Several challenges arise in applying the ARIMAX model. First, the model assumes a direct, linear effect of the exogenous variable (vaccination rates) on the dependent variable (COVID-19 cases), which may not capture the complex, non-linear relationships involved.[14] Factors such as varying vaccine efficacy, the emergence of new virus variants, shifts in public behavior, and policy interventions (e.g., lockdowns, mask mandates) influence how effectively vaccination reduces case numbers.[47] If these factors are not properly incorporated, the ARIMAX model may incorrectly attribute changes in case numbers to vaccination, leading to inaccurate predictions.

Moreover, including vaccination data as an exogenous variable introduces the risk of multicollinearity, particularly if vaccination rates correlate with other factors influencing the spread of COVID-19. Multicollinearity makes coefficient estimates unstable, reducing the reliability of the model’s predictions.[31] In some cases, this instability causes the ARIMAX model to perform worse than the simpler ARIMA model, which has no exogenous regressors and therefore does not face this problem.

Timing also plays a crucial role in the performance of the ARIMAX model. The effects of vaccination on COVID-19 cases often involve variable and unpredictable lags.[45] If the model fails to capture the appropriate lag structure, its predictions can be inaccurate. For example, the time required for immunity to develop after vaccination, or differences in response across population groups, can cause mismatches between vaccination data and observed case changes, further reducing the accuracy of ARIMAX predictions.

Additionally, the ARIMAX model carries a risk of overfitting, especially when it becomes overly complex relative to the available data. Overfitting occurs when the model treats noise or random fluctuations in the training data as meaningful patterns, reducing its predictive accuracy on new data.[32] This issue becomes more pronounced when vaccination data are included, as the added complexity can reduce the model’s generalizability.

In the multiple regression analysis, several socioeconomic factors emerge as significant predictors of COVID-19 case numbers. For example, previous research indicates that population density, median income, and access to healthcare services are strongly correlated with case numbers.[48] These findings highlight the unequal impact of the pandemic across demographic groups and regions. Specifically, areas with higher population density and lower income levels tend to report higher case numbers, likely because of the difficulty of practicing social distancing and limited access to healthcare services.[8]

The regression analysis further emphasizes the importance of incorporating a broad range of socioeconomic factors when assessing the spread of COVID-19. However, the model also reveals certain limitations. The relationships between the independent variables and COVID-19 case numbers are not always linear, suggesting the need for more advanced modeling approaches that can capture these complexities.[29] Moreover, interaction effects among the variables, such as the combined impact of income and healthcare access, suggest that future models should explore these interactions to better understand the pandemic’s dynamics.

Spatial autocorrelation analyses provide additional insights, particularly regarding the geographic clustering
Volume 2 Issue 3 (2025) 126 doi: 10.36922/MI025040007

