Microbes & Immunity Statistical modeling of COVID-19 trends
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2} \tag{II}$$

where $x_i$ is the actual value and $\hat{x}_i$ is the predicted value. Lower RMSE values indicate better model performance.¹⁶

A grid search was conducted across combinations of the p and q parameters, with the differencing order d fixed at 1. This process was parallelized to explore the parameter space efficiently.¹⁴ RMSE was used as the evaluation metric because it quantifies average prediction error while placing greater emphasis on larger errors.¹⁷ This characteristic makes RMSE particularly useful in contexts where large forecasting errors could have serious consequences, as it penalizes large discrepancies more heavily than metrics such as mean absolute error (MAE). Additionally, since RMSE is measured in the same units as the original data, its values are directly interpretable in practical applications.

To compare manual and automated model selection, the auto.arima function was employed. This function automatically selects an ARIMA model by optimizing an information criterion such as the AIC or BIC.¹⁸ While auto.arima provides a rapid and efficient global fit over the entire dataset, rolling window cross-validation offers a more robust evaluation by assessing the model's predictive performance across different time periods.¹⁹ This approach enabled a detailed comparison of the consistency and reliability of automated versus manually selected models.

RMSE values were visualized across the parameter combinations, and the best-performing model identified through rolling window cross-validation was compared with the model selected by auto.arima. This comparison provided insights into the trade-offs between automated selection and manual tuning in ARIMA-based time series forecasting.

3.3. Anomaly detection

Anomaly detection in time series data is crucial for identifying irregular patterns, such as sudden spikes in COVID-19 case numbers.¹⁴ In this study, a statistical approach was employed to detect anomalies directly from the time series data without fitting a complex model like ARIMA. This method, known as residual-based anomaly detection, identifies outliers based on their deviation from expected behavior within the data.¹⁴

The anomaly detection approach relies on statistical rules that flag observations as anomalies when they deviate significantly from surrounding values. Specifically, outliers are detected by analyzing the residuals after accounting for typical patterns in the time series.²⁰ In this study, the residuals were examined, and points were classified as outliers if their deviation from the local mean exceeded a threshold.

The theoretical basis for this method involves identifying points that deviate significantly from the local mean or expected value of the time series. Mathematically, a data point $x_t$ is considered an outlier if it satisfies the condition:

$$\left|x_t - \mu\right| > k \times \sigma \tag{III}$$

where:
(i) $\mu$ represents the local mean,
(ii) $\sigma$ is the standard deviation of the surrounding data points,
(iii) $k$ is a threshold factor that determines the sensitivity of the detection.²¹

Typically, $k$ is set to 2 or 3; for approximately normally distributed data, these thresholds correspond to roughly 95% and 99.7% coverage, levels commonly used in outlier detection.²⁰

This method is particularly effective for detecting additive outliers, which appear as sudden spikes or drops in the time series. Such events may result from external shocks, for example the emergence of a new COVID-19 variant.¹⁴ Identifying and analyzing these outliers provides valuable insights into how unexpected events influence overall trends, enabling more informed adjustments to forecasting models.

The detected anomalies were then visualized in a time series plot, highlighting points of significant deviation to facilitate further investigation and model adjustments.²¹

3.4. Theoretical basis of the ARIMAX model

To improve the accuracy of time series forecasting, the ARIMAX model was employed, integrating external factors into the standard ARIMA model. This extension allows the model to account for factors beyond the inherent patterns in the target time series.¹⁴ In this study, vaccination rates were included as an exogenous variable to determine whether they would improve forecast accuracy compared to the ARIMA model, which relied solely on historical time series data.

The ARIMAX model expands upon the ARIMA framework by introducing exogenous regressors, that is, external variables believed to influence the dependent variable. Mathematically, the ARIMAX model is expressed as:

$$y_t = \phi_0 + \sum_{i=1}^{p}\phi_i\, y_{t-i} + \sum_{j=1}^{q}\theta_j\, \varepsilon_{t-j} + \varepsilon_t + \sum_{k=1}^{r}\beta_k\, X_{t-k} \tag{IV}$$
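To make the structure of Equation IV concrete, the following minimal Python sketch evaluates a one-step-ahead point forecast from known coefficients. It is an illustration only, not the study's implementation. The function name `arimax_one_step`, the coefficient names (`phi0`, `phi`, `theta`, `beta`, following common ARIMAX notation), and all numeric values are hypothetical. The sketch assumes that the unknown current shock has expectation zero and that `y_hist` holds the series the model is fitted to (for d = 1, this would be the differenced series).

```python
from typing import Sequence


def arimax_one_step(
    y_hist: Sequence[float],
    eps_hist: Sequence[float],
    x_hist: Sequence[float],
    phi0: float,
    phi: Sequence[float],
    theta: Sequence[float],
    beta: Sequence[float],
) -> float:
    """One-step-ahead ARIMAX point forecast for time t.

    Every history sequence is ordered oldest to newest, so seq[-1] is the
    value at t-1. The current shock is replaced by its expectation (zero).
    """
    if len(y_hist) < len(phi) or len(eps_hist) < len(theta) or len(x_hist) < len(beta):
        raise ValueError("not enough history for the requested lag orders")

    # Autoregressive part: sum over i = 1..p of phi_i * y_{t-i}
    ar = sum(c * y_hist[-i] for i, c in enumerate(phi, start=1))
    # Moving-average part: sum over j = 1..q of theta_j * eps_{t-j}
    ma = sum(c * eps_hist[-j] for j, c in enumerate(theta, start=1))
    # Exogenous part: sum over k = 1..r of beta_k * X_{t-k}
    exo = sum(c * x_hist[-k] for k, c in enumerate(beta, start=1))
    return phi0 + ar + ma + exo


# Hypothetical inputs with p = 2, q = 1, r = 1 (not estimates from the study).
forecast = arimax_one_step(
    y_hist=[100.0, 110.0],   # y_{t-2}, y_{t-1}
    eps_hist=[2.0],          # eps_{t-1}
    x_hist=[0.5],            # X_{t-1}, e.g. a lagged vaccination rate
    phi0=5.0,
    phi=[0.6, 0.2],
    theta=[0.5],
    beta=[-10.0],
)
# Terms: 5 (intercept) + 86 (AR) + 1 (MA) - 5 (exogenous)
print(forecast)
```

Setting `beta` to zero reduces the expression to the plain ARMA forecast, which is how the contribution of the exogenous variable can be isolated in this sketch.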
Volume 2 Issue 3 (2025) 112 doi: 10.36922/MI025040007

