
Microbes & Immunity                                                  Statistical modeling of COVID-19 trends



4.3. Rolling window cross-validation and comparison with auto.arima

In the previous ARIMA forecasting efforts, the auto.arima function was used to select the model parameters p, d, and q automatically. This function chooses the model that minimizes the AIC, which balances goodness of fit against model complexity by penalizing excessive parameters. The approach is fast, fully automated, and generally reliable. However, relying solely on the AIC may not always produce the most accurate forecasts, especially for nonstationary time series or long-term predictions.

To explore whether another parameter selection method could improve forecast accuracy, a rolling window cross-validation technique was applied to optimize p and q. The differencing order d was kept at the value determined by auto.arima. It was fixed because its role is to make the series stationary by removing trends or seasonality, a rationale well supported by statistical theory. Once a time series has been made stationary through differencing, d generally remains unchanged to preserve that stationarity, even as p and q are adjusted.

This analysis focused on periods in which ARIMA predictions diverged significantly from the observed data, for example, January 5 to December 27, 2020, in the US and Europe. These discrepancies arise primarily from sudden surges in cases associated with the emergence of new variants. This highlights the limitations of traditional ARIMA models in capturing abrupt changes.

The rolling window cross-validation approach was used to evaluate different combinations of p and q according to RMSE. Because it assesses out-of-sample performance across multiple training windows, this approach is particularly valuable for forecasting nonstationary time series with evolving patterns. Table 3 summarizes the RMSE values for the US ARIMA model, comparing parameters selected through cross-validation with those obtained from auto.arima. Figure 4A shows a heatmap of RMSE across different p and q combinations.

Table 3. Comparison of RMSE values for ARIMA models with parameters selected by auto.arima and cross-validation for COVID-19 case data in the United States

Model                          ARIMA parameters (p, d, q)   RMSE
auto.arima                     (1, 2, 0)                    27,648.12
Cross-validation-based ARIMA   (2, 2, 2)                    22,949.3

Abbreviations: ARIMA: Autoregressive integrated moving average; RMSE: Root mean squared error.

As illustrated in Figure 4A, the RMSE heatmap shows that the cross-validated ARIMA parameters (p = 2, q = 2) outperform the auto.arima parameters (p = 1, q = 0). In Table 3, the RMSE falls from 27,648.12 to 22,949.3, a reduction of about 17%. The heatmap gives a comprehensive view of how each combination of p and q affects forecast accuracy, with lower RMSE values indicating better performance.

Furthermore, Figure 4B compares the forecasted COVID-19 cases in the US from ARIMA models with parameters selected by auto.arima and by cross-validation. Both models deviate substantially from the observed data because of the sudden surge in cases. Even so, the cross-validated model's predictions align more closely with the observed data than those of auto.arima. This suggests that cross-validation can improve forecast accuracy under certain conditions.

The same approach was applied to the European ARIMA model. Table S2 compares the RMSE values obtained with parameters selected by cross-validation and by auto.arima. The RMSE heatmap in Figure S4A visualizes model performance across different combinations of p and q.

Figure S4B compares the forecasted COVID-19 cases in Europe from ARIMA models with parameters selected by auto.arima and by cross-validation. The forecast generated by the cross-validated model aligns more closely with the observed data than that of auto.arima, although both models deviate notably from the actual trajectory. These findings are consistent with the US results. They highlight the potential advantages of cross-validation for ARIMA parameter selection when dealing with highly volatile, nonstationary time series data.

4.4. The effect of vaccination on new COVID-19 cases

The global COVID-19 vaccination campaigns that began in December 2020 raised a critical question: did they effectively reduce the number of new cases? To address this question, several statistical methods were applied, including the Granger causality test, segmented regression analysis, the Chow test, and regression discontinuity design (RDD).

The Granger causality test was performed to evaluate whether the number of vaccinated individuals could predict future new COVID-19 cases after accounting for past case counts. Two models were compared: one incorporating lags of both new cases and vaccination counts, and another including only lags of new cases.
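The comparison of a restricted model (lags of new cases only) with an unrestricted model (lags of new cases and vaccination counts) can be illustrated with a minimal sketch of the standard Granger F-test in dependency-free Python. This is not the authors' implementation. The lag order p = 2, the helper names (`granger_f_test`, `_ols_rss`, `_solve`), and the synthetic series are hypothetical. The sketch also assumes that both series are already stationary (for example, after differencing) and that each model is fitted by ordinary least squares with an intercept. The test statistic compares the residual sums of squares of the two models: F = ((RSS_r - RSS_u) / p) / (RSS_u / (n - 2p - 1)), where n is the number of usable observations.

```python
import random

# Hypothetical sketch: assumes stationary series, OLS with an intercept,
# and a user-chosen lag order p. Not the study's actual code or data.


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _ols_rss(rows, y):
    """Fit y on the design rows by OLS (normal equations); return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def granger_f_test(cases, vaccinations, p):
    """F-test of H0: lags of `vaccinations` add no predictive power for `cases`.

    Restricted:   cases_t ~ 1 + cases_{t-1}, ..., cases_{t-p}
    Unrestricted: restricted terms + vaccinations_{t-1}, ..., vaccinations_{t-p}
    Returns (F statistic, numerator df, denominator df).
    """
    if len(cases) != len(vaccinations):
        raise ValueError("series must have equal length")
    y = cases[p:]
    restricted, unrestricted = [], []
    for t in range(p, len(cases)):
        own_lags = [cases[t - i] for i in range(1, p + 1)]
        vax_lags = [vaccinations[t - i] for i in range(1, p + 1)]
        restricted.append([1.0] + own_lags)
        unrestricted.append([1.0] + own_lags + vax_lags)
    rss_r = _ols_rss(restricted, y)
    rss_u = _ols_rss(unrestricted, y)
    df1 = p                      # number of excluded vaccination lags
    df2 = len(y) - (2 * p + 1)   # residual df of the unrestricted model
    f_stat = ((rss_r - rss_u) / df1) / (rss_u / df2)
    return f_stat, df1, df2


# Hypothetical synthetic data: vaccinations lower next-period cases by construction.
random.seed(0)
vax = [random.gauss(0, 1) for _ in range(200)]
cases = [0.0]
for t in range(1, 200):
    cases.append(0.5 * cases[t - 1] - 0.8 * vax[t - 1] + random.gauss(0, 0.5))

f_dep, df1, df2 = granger_f_test(cases, vax, p=2)

# Negative control: an unrelated series should not help predict cases.
noise = [random.gauss(0, 1) for _ in range(200)]
f_ind, _, _ = granger_f_test(cases, noise, p=2)
print(f"dependent: F = {f_dep:.1f}, df = ({df1}, {df2}); independent: F = {f_ind:.2f}")
```

With the synthetic data, in which vaccinations are constructed to lower next-period cases, the F statistic is large. The unrelated negative-control series should typically yield a much smaller value. A p-value follows from the F(df1, df2) distribution, for example via `scipy.stats.f.sf(f_dep, df1, df2)`. Packaged implementations such as `lmtest::grangertest` in R perform the same restricted-versus-unrestricted comparison.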


            Volume 2 Issue 3 (2025)                        118                           doi: 10.36922/MI025040007