Page 227 - IJOCTA-15-4

Data-driven optimization and parameter estimation for an epidemic model
Appendix A. Data pre-processing

The daily raw data are highly oscillatory, indicating inconsistent daily reporting (see Figure A11). To obtain data suitable for parameter estimation, we pre-process the data using a Gaussian-weighted moving average filter to smooth them.

In the SIR model, the I(t) and R(t) terms represent the currently infectious population and the removed population (deaths + recoveries) at a given time t, respectively. These values are rarely directly available from data. For the Poland MOH data,[64] as with most typical epidemic datasets, the data come in the form of daily new case counts. The MOH dataset contains both deaths and recoveries, but it is not obvious how under-reporting impacted these numbers.

We can approximate I(t) as follows: let N(t) be the reported number of new cases at time t. Then the rolling sum of new cases over n days, where n is the average infection duration, is an approximation for I(t). We let n = 6, accounting for the average duration of viral shedding of 5 days[48] and the average time to death of 14.8 days,[111] which is consistent with other studies of early COVID-19 in Poland where deaths lagged behind new cases by 2–3 weeks.[112] The case fatality rate can be approximated from the data as deaths/(deaths + recoveries) ≈ 2.6%. Therefore, the average duration of infectiousness can be approximated using the weighted average (0.026)(14) + (1 − 0.026)(5) ≈ 5.2, which we round up to 6. Later COVID-19 variants likely have different properties.

R(t) is the cumulative removed population at time t. Let M be the reported removed cases (deaths + recoveries). Then

\[
R(t) \approx \sum_{\tau=0}^{t} M(\tau).
\]

The removed cases likely suffer from under-reporting, as it is difficult to track all recoveries, but the scaling parameters in our optimization adjust for under-reporting.

Appendix B. Parameter selection

Here, we discuss our initial guesses for the model parameters. The initial guesses were then adjusted as described in Section 2.3.

Infection and removal rates (β and η)

To find an initial guess for β and η at each vertex, we adapt a finite-difference-based methodology,[53,58–61] in particular the first-order approximation described in a recent paper by Schmitt.[59] This methodology provides a good first guess before optimization, as it requires only approximate exponential growth rates and has been validated on COVID data in France.[59] Similar methodology has been employed in many other epidemic modeling studies.[53,58–61]

To follow Schmitt's methodology, let P be the total population at a given vertex v, which we assume is constant over the modeling period. We work over a short time period on which we assume S ≈ P, α = 0, and λ = 0, temporarily decoupling all edges and vertices. On this short timeline, the infected population I and the removed population R grow at each vertex as follows:

\[
\frac{dI}{dt} = (\beta P - \eta)\, I, \qquad
\frac{dR}{dt} = \eta I,
\]

where P is the total population at vertex v. Thus, we can approximate β and η from data as follows:

\[
\frac{R(t+1) - R(t)}{\Delta t} \approx \eta I(t), \qquad
\frac{I(t+1) - I(t)}{\Delta t} \approx (\beta P - \eta)\, I(t),
\]

where ∆t = 1 for daily case data. Note that values of I and R are not directly available and have to be computed from the data as described in Appendix A. At each vertex, β and η are multiplied by optimization parameters $c_\beta(v)$ and $c_\eta(v)$ to be determined through model fitting.

Vertex to edge ($\lambda_e^v$)

$\lambda_e^v$ is the rate at which individuals leave vertex v to travel on edge e. We use traffic data from a 2022 Polish government report[113] to inform this parameter. Most of the edges in our network may be classified in terms of the main European E-roads through Poland, as shown in Figure 1. If the edge is part of an E-road, we associate a number with the traffic density on that E-road from the report.[113] The roads that are not part of the E-road network may be classified by type (expressway, highway, etc.), with traffic density also given in the report.[113] When an edge represents multiple routes, the traffic densities are added together to account for multiple edges.

Then the initial guess for $\lambda_e^v$ is the traffic density on edge e divided by the sum of the traffic densities on all edges incident to vertex v, multiplied by an unknown scaling parameter for each vertex, $c_\lambda(v) \in (0, 1)$, to be determined through model fitting.

Edge m to edge n skipping parameter ($\nu_{e_m,e_n}$)

We also use the traffic data[113] to estimate the vertex skipping parameter. For a given edge $e_m$, $\nu^{v}_{e_m,e_n}$ represents the rate at which traffic leaves edge $e_m$ to travel to the adjacent edge $e_n$. We estimate it as the traffic density on $e_n$ divided by the
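The rolling-sum and cumulative-sum approximations of Appendix A and the finite-difference initial guesses for β, η, and λ in Appendix B can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not our fitting code: the function names are hypothetical, the daily ratios are averaged with a plain mean (the aggregation over days is an assumption), and the Gaussian smoothing step is omitted.

```python
import numpy as np

def rolling_sum(new_cases, n=6):
    """Approximate I(t) by the sum of new cases N over the last n days."""
    new_cases = np.asarray(new_cases, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(new_cases)))
    idx = np.arange(1, len(new_cases) + 1)
    start = np.maximum(idx - n, 0)  # shorter window at the start of the series
    return csum[idx] - csum[start]

def cumulative_removed(removed):
    """Approximate R(t) by the cumulative sum of reported removals M."""
    return np.cumsum(np.asarray(removed, dtype=float))

def finite_difference_rates(I, R, population, dt=1.0):
    """First-order initial guesses for beta and eta at one vertex.

    Uses (R(t+1) - R(t)) / dt ~ eta * I(t) and
         (I(t+1) - I(t)) / dt ~ (beta * P - eta) * I(t).
    Averaging the daily ratios with a mean is an assumption of this sketch.
    """
    I = np.asarray(I, dtype=float)
    R = np.asarray(R, dtype=float)
    I0 = I[:-1]
    mask = I0 > 0  # skip days with no infectious individuals
    eta = np.mean(np.diff(R)[mask] / (dt * I0[mask]))
    growth = np.mean(np.diff(I)[mask] / (dt * I0[mask]))  # ~ beta * P - eta
    beta = (growth + eta) / population
    return beta, eta

def lambda_guess(densities, edge, c_lambda):
    """Initial guess for lambda at one vertex.

    densities maps each edge incident to the vertex to its traffic density;
    c_lambda is the per-vertex scaling parameter in (0, 1).
    """
    return c_lambda * densities[edge] / sum(densities.values())
```

For example, `finite_difference_rates(rolling_sum(N), cumulative_removed(M), P)` would return unscaled guesses for one vertex from its daily series `N` and `M`, which are then multiplied by the optimization parameters during model fitting.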