
Microbes & Immunity                                                  Statistical modeling of COVID-19 trends




$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2} \qquad \text{(II)}$$

where $x_i$ is the actual value, $\hat{x}_i$ is the predicted value, and $n$ is the number of predictions.

Lower RMSE values indicate better model performance.¹⁶

A grid search was conducted across various combinations of the autoregressive order p and the moving-average order q, with the differencing order d fixed at 1. This process was parallelized to explore the parameter space efficiently.¹⁴ RMSE was used as the evaluation metric because it quantifies the average prediction error while placing greater emphasis on large errors.¹⁷ Because errors are squared before they are averaged, RMSE penalizes large discrepancies more heavily than metrics such as the mean absolute error (MAE). This makes it particularly useful when large forecasting errors could have serious consequences. Additionally, because RMSE is expressed in the same units as the original data, its values can be interpreted directly in practical applications.

To compare manual and automated model selection, the auto.arima function was also employed. This function automatically identifies the optimal ARIMA model by optimizing an information criterion such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC).¹⁸ Whereas auto.arima provides a rapid and efficient global fit over the entire dataset, rolling window cross-validation offers a more robust evaluation by assessing the model's predictive performance across different time periods.¹⁹ This approach enabled a detailed comparison of the consistency and reliability of automatically and manually selected models.

By visualizing the RMSE values across the parameter combinations, the best-performing model identified through rolling window cross-validation was compared with the model selected by auto.arima. This comparison provided insight into the trade-offs between automated selection and manual tuning in ARIMA-based time series forecasting.

3.3. Anomaly detection

Anomaly detection in time series data is crucial for identifying irregular patterns, such as sudden spikes in COVID-19 case numbers. In this study, a statistical approach was used to detect anomalies directly from the time series data, without fitting a complex model such as ARIMA. This method, known as residual-based anomaly detection, identifies outliers based on their deviation from expected behavior within the data.¹⁴

The approach relies on statistical rules that flag observations as anomalies when they deviate significantly from surrounding values. Specifically, outliers are detected by analyzing the residuals that remain after accounting for typical patterns in the time series.²⁰ In this study, the residuals were examined, and points were classified as outliers if their deviation from the local mean exceeded a set threshold.

The theoretical basis for this method is the identification of points that deviate significantly from the local mean, or expected value, of the time series. Mathematically, a data point $x_t$ is considered an outlier if it satisfies the condition:

$$|x_t - \mu| > k\sigma \qquad \text{(III)}$$

where:
(i) $\mu$ represents the local mean,
(ii) $\sigma$ is the standard deviation of the surrounding data points, and
(iii) $k$ is a threshold factor that determines the sensitivity of the detection.²¹

Typically, k is set to 2 or 3, corresponding to the confidence intervals commonly used in outlier detection.²⁰ Under a normal distribution, approximately 95% and 99.7% of observations lie within 2σ and 3σ of the mean, respectively.

This method is particularly effective for detecting additive outliers, which appear as sudden spikes or drops in the time series. Such events may result from external shocks, such as the emergence of a new COVID-19 variant.¹⁴ Identifying and analyzing these outliers provides valuable insight into how unexpected events influence overall trends, enabling more informed adjustments to forecasting models.

The detected anomalies were then visualized in a time series plot that highlights points of significant deviation, to facilitate further investigation and model adjustment.²¹

3.4. Theoretical basis of the ARIMAX model

To improve forecasting accuracy, the ARIMAX model was employed, which integrates external factors into the standard ARIMA model. This extension allows the model to account for influences beyond the inherent patterns of the target time series.¹⁴ In this study, vaccination rates were included as an exogenous variable to determine whether they improved forecast accuracy relative to the ARIMA model, which relied solely on historical time series data.

The ARIMAX model extends the ARIMA framework by introducing exogenous regressors, that is, external variables believed to influence the dependent variable. Mathematically, the ARIMAX model is expressed as:

$$y_t = \beta_0 + \sum_{i=1}^{p}\phi_i\, y_{t-i} + \sum_{j=1}^{q}\theta_j\, \varepsilon_{t-j} + \varepsilon_t + \sum_{k=1}^{r}\beta_k X_{t-k} \qquad \text{(IV)}$$
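Read as a prediction rule, Equation (IV) gives a one-step-ahead forecast once its coefficients are known. The Python sketch below evaluates that rule for given coefficients; it does not estimate them. The coefficient names (beta0, phi, theta, beta) are assumed to correspond to β₀, φᵢ, θⱼ and βₖ in Equation (IV), and the function name and all numeric values are hypothetical. The unobserved current shock εₜ has zero expectation, so it is dropped from the forecast, and y is assumed to have already been differenced if the model includes differencing.

```python
from typing import Sequence


def arimax_one_step(
    y: Sequence[float],
    eps: Sequence[float],
    x: Sequence[float],
    beta0: float,
    phi: Sequence[float],
    theta: Sequence[float],
    beta: Sequence[float],
) -> float:
    """Hypothetical one-step-ahead prediction of y_t following Equation (IV).

    y, eps and x hold past values ordered oldest -> newest, so y[-1] is y_{t-1}.
    The current shock eps_t has zero expectation and is therefore omitted.
    Coefficients are supplied by the caller (illustrative, not fitted here).
    """
    p, q, r = len(phi), len(theta), len(beta)
    if len(y) < p or len(eps) < q or len(x) < r:
        raise ValueError("not enough history for the requested orders")
    ar = sum(phi[i - 1] * y[-i] for i in range(1, p + 1))      # sum of phi_i * y_{t-i}
    ma = sum(theta[j - 1] * eps[-j] for j in range(1, q + 1))  # sum of theta_j * eps_{t-j}
    ex = sum(beta[k - 1] * x[-k] for k in range(1, r + 1))     # sum of beta_k * X_{t-k}
    return beta0 + ar + ma + ex


# Hypothetical example with p = 2, q = 1, r = 1 (all values invented).
y_hist = [120.0, 135.0, 150.0]   # ..., y_{t-2}, y_{t-1}
eps_hist = [2.0, -3.0]           # ..., eps_{t-1}
x_hist = [0.40, 0.42]            # ..., X_{t-1}, e.g. a vaccination rate
pred = arimax_one_step(y_hist, eps_hist, x_hist,
                       beta0=5.0, phi=[0.6, 0.2], theta=[0.3], beta=[-50.0])
print(pred)  # approximately 100.1
```

Here the autoregressive part contributes 0.6 × 150 + 0.2 × 135 = 117, the moving-average part 0.3 × (−3) = −0.9, and the exogenous part −50 × 0.42 = −21, giving 5 + 117 − 0.9 − 21 = 100.1.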
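The rolling window evaluation with RMSE (Equation II) and the grid search over model orders can be sketched in Python as follows. This is a minimal, sequential illustration under stated assumptions, not the study's implementation: all function names are hypothetical, and because fitting ARIMA(p, 1, q) requires a statistical library, the sketch swaps in a simple drift forecaster (the mean of the last m first differences, extrapolated forward) as a stand-in with a single tuning parameter m. In practice, the factory passed to `grid_search` would fit an ARIMA(p, 1, q) model for each (p, q) pair, for example with `statsmodels.tsa.arima.model.ARIMA`, and the loop over the grid could be parallelized as described above.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# A forecaster takes a training window and a horizon and returns `horizon` predictions.
Forecaster = Callable[[Sequence[float], int], List[float]]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error as in Equation (II)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rolling_window_rmse(series: Sequence[float], window: int, horizon: int,
                        forecaster: Forecaster) -> float:
    """Slide a fixed-length training window through the series, forecast the
    next `horizon` points at each position, and pool all errors into one RMSE."""
    actual: List[float] = []
    predicted: List[float] = []
    for start in range(0, len(series) - window - horizon + 1, horizon):
        train = series[start:start + window]
        test = series[start + window:start + window + horizon]
        actual.extend(test)
        predicted.extend(forecaster(train, horizon))
    return rmse(actual, predicted)


def grid_search(series: Sequence[float], window: int, horizon: int,
                make_forecaster: Callable[[Hashable], Forecaster],
                grid: Sequence[Hashable]) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Score every parameter setting by rolling-window RMSE and return the best."""
    scores = {params: rolling_window_rmse(series, window, horizon, make_forecaster(params))
              for params in grid}
    best = min(scores, key=scores.__getitem__)  # ties keep the first setting in `grid`
    return best, scores


def drift_forecaster(m: int) -> Forecaster:
    """Hypothetical stand-in for an ARIMA(p, 1, q) fit: extrapolate the mean
    of the last m first differences of the training window."""
    def forecast(train: Sequence[float], horizon: int) -> List[float]:
        diffs = [b - a for a, b in zip(train[:-1], train[1:])][-m:]
        step = sum(diffs) / len(diffs)
        return [train[-1] + step * h for h in range(1, horizon + 1)]
    return forecast


# Toy series: linear trend plus an alternating +/-2 wobble (not COVID-19 data).
series = [10 + 3 * t + (2 if t % 2 else -2) for t in range(30)]
best, scores = grid_search(series, window=10, horizon=2,
                           make_forecaster=drift_forecaster, grid=[1, 2, 3, 4])
print(best, scores)
```

On this toy series the first differences alternate between 7 and −1, so an even m averages them to the true slope of 3 and beats any odd m; m = 2 is returned because `min` keeps the first of the tied even settings.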
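The outlier rule in Equation (III) can be sketched as follows. The sketch assumes that the local mean μ and standard deviation σ are computed over a centered window of neighbouring points that excludes x_t itself; the window size, the exclusion of the centre point, the function name, and the sample data are illustrative assumptions, since the exact window used in the study is not stated in this section.

```python
import statistics
from typing import List, Sequence


def detect_anomalies(series: Sequence[float], half_window: int = 3,
                     k: float = 3.0) -> List[int]:
    """Return indices t where |x_t - mu| > k * sigma (Equation III).

    Assumption: mu and sigma are the mean and sample standard deviation of the
    neighbours within `half_window` positions on either side of t, excluding
    x_t itself so that a spike does not inflate its own baseline.
    """
    anomalies: List[int] = []
    for t, x_t in enumerate(series):
        lo, hi = max(0, t - half_window), min(len(series), t + half_window + 1)
        neighbours = [series[i] for i in range(lo, hi) if i != t]
        if len(neighbours) < 2:
            continue  # too few points to estimate sigma
        mu = statistics.mean(neighbours)
        sigma = statistics.stdev(neighbours)
        if abs(x_t - mu) > k * sigma:
            anomalies.append(t)
    return anomalies


# Invented daily counts with one sudden spike (an additive outlier) at index 7.
daily = [100, 104, 98, 102, 101, 99, 103, 250, 100, 102, 97, 101, 104, 99]
print(detect_anomalies(daily, half_window=3, k=3.0))  # -> [7]
```

Lowering k to 2 makes the rule more sensitive and can flag ordinary fluctuations as well. Note also that a large spike raises σ for every window that contains it, which can mask smaller anomalies nearby.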
            Volume 2 Issue 3 (2025)                        112                           doi: 10.36922/MI025040007