
Microbes & Immunity Statistical modeling of COVID-19 trends

The ARIMA modeling process began with a stationarity test, commonly conducted using the augmented Dickey–Fuller test.¹¹ If the time series is found to be non-stationary, transformations such as differencing or logarithmic scaling are typically applied to achieve stationarity.¹⁰ Model identification involved determining the orders of the model, specifically the values of p and q, which represent the AR and MA terms, respectively. This step is usually performed by analyzing the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.⁹ The differencing order (d) was determined based on the transformations applied during the stationarity testing phase.

Following model identification, the parameters ϕᵢ and θⱼ were estimated, typically using maximum likelihood estimation.¹⁰ Model validation was then performed using statistical tests such as the Ljung-Box test to ensure that the residuals exhibited white noise behavior, indicating that the model adequately captured the time series structure.¹² Model selection was based on information criteria such as the Akaike information criterion (AIC) or Bayesian information criterion (BIC), with preference given to the model with the lowest criterion value.¹³ Finally, once validated, the model was employed to forecast future values of the time series.⁹ Figure 1 summarizes this process, illustrating the sequence of steps from stationarity assessment to forecasting.

3.2. Rolling window cross-validation and comparison with auto.arima

In this study, rolling window cross-validation was used to evaluate the performance of ARIMA models for time series forecasting. The primary goal was to identify the optimal ARIMA model parameters by minimizing the root mean squared error (RMSE) and to compare the results with those obtained from the automated model selection function, auto.arima.¹⁴

Rolling window cross-validation is a method specifically designed for time series data as it preserves the temporal order of the data. In each iteration, the model was trained on a fixed-length window of historical data and validated on the subsequent observation. This approach ensures that the evaluation reflects real-world forecasting conditions, where future values must be predicted using only past data.¹⁵ For each ARIMA model evaluated, the one-step-ahead forecast errors were calculated, and RMSE was used as the primary evaluation metric. RMSE is given by:

Figure 1. The autoregressive integrated moving average model construction flow chart
Abbreviations: ACF: Autocorrelation function; ARIMA: Autoregressive integrated moving average; PACF: Partial autocorrelation function.
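
The rolling window procedure described in Section 3.2 can be illustrated with a minimal sketch. The study's own implementation is not shown in the text, so everything here is an assumption made for illustration: the function names (rolling_window_rmse, naive_forecast, ar1_forecast), the window length, and the synthetic series are hypothetical, and the two simple forecasters stand in for fitted ARIMA candidates rather than reproducing them. The sketch only shows the mechanics: train on a fixed-length window, predict the next observation, collect the one-step-ahead errors, and rank candidates by RMSE.

```python
import math
import random


def naive_forecast(window):
    """Hypothetical baseline: predict that the next value equals the last one."""
    return window[-1]


def ar1_forecast(window):
    """Hypothetical stand-in for a fitted model: least-squares fit of
    y_t = c + phi * y_{t-1} on the window, then a one-step-ahead prediction."""
    x, y = window[:-1], window[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx if sxx > 0 else 0.0
    c = mean_y - phi * mean_x
    return c + phi * window[-1]


def rolling_window_rmse(series, window_size, forecaster):
    """Slide a fixed-length training window over the series, forecast the
    next observation each time, and return the RMSE of those forecasts."""
    if window_size < 2 or window_size >= len(series):
        raise ValueError("window_size must be >= 2 and shorter than the series")
    errors = []
    for start in range(len(series) - window_size):
        train = series[start:start + window_size]   # past data only
        actual = series[start + window_size]        # next observation
        errors.append(actual - forecaster(train))
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Synthetic autocorrelated series (illustrative data, not from the study).
random.seed(0)
series = [0.0]
for _ in range(199):
    series.append(0.7 * series[-1] + random.gauss(0.0, 1.0))

# Evaluate each candidate and keep the one with the lowest RMSE.
candidates = {"naive": naive_forecast, "ar1": ar1_forecast}
scores = {name: rolling_window_rmse(series, 30, f) for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "-> lowest RMSE:", best)
```

In practice each candidate would be an ARIMA(p, d, q) specification refitted on every window, and the lowest-RMSE specification would then be compared with the one chosen by auto.arima; the loop structure stays the same.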

Volume 2 Issue 3 (2025) 111 doi: 10.36922/MI025040007
