Page 213 - IJOCTA-15-4
Data-driven optimization and parameter estimation for an epidemic model
population. The question of the reproduction number of the disease on the metric graph, R_0, is still an open problem at this time.

To validate the model (i.e., to determine whether the inclusion of additional transport terms is necessary), one would ideally start from a general PDE with candidate terms and use data to decide which terms should remain in the model. 79,80 This process may involve machine learning or other techniques beyond the scope of the present work. Here, we adopt the model as described above, consistent with the previous studies.

The system of PDEs is approximated numerically using the validated methodology introduced in our previous work. 30 In brief, we use a forward finite difference (FD) approximation in time and a centered FD in space. This explicit numerical scheme is easily scalable to larger networks without needing to invert a large matrix.

2.3. Optimization-based parameter estimation

We compare our model to the smoothed Ministry of Health data 64 from the first fully recorded wave of COVID-19 in Poland: early February through mid-May 2021 (see Appendix A for a more thorough discussion of data pre-processing). We seek to minimize the difference between the smoothed data and the function output I_v(t) at each vertex over time.

There are well-documented cases of under-reporting, 81 with one study estimating that only 60% of COVID cases 48 in Poland were detected, while another study claims it may be as low as 1/4 of all cases. 49 Some reasons for under-reporting, both in Poland and worldwide, may include the presence of asymptomatic cases, 82 limited access to testing, 83 and reluctance to either be tested 83 or seek medical care. 84

Though the amplitudes of the incidence rates are unreliable due to under-reporting, it is reasonable to assume that the shapes of the infection curves are more reliable than the values, in particular, the time of peak infection and the variance. Therefore, for vertex v, we use the 2-norm of the difference between the smoothed data and the model output, both first normalized by their maxima. This normalization preserves the shape of the infection curves while under-emphasizing the unknown amplitude. It is able to convey the trends in the data without requiring precise knowledge of the total number of infected people. We discard the first and last 20 days of the modeled period for the computation of the normalized 2-norm difference, so we do not place too much emphasis on the small values where data may be lacking.

As we will see in Section 3.1, the objective function exhibits Rosenbrock-like behavior, with steep sides and a long, corrugated valley of local minima. This structure indicates that the inverse problem is likely ill-posed; there is not one unique set of parameters that acts as a global minimizer. Instead, we aim to find a plausible set of parameters that minimizes the 2-norm and qualitatively matches the data, particularly the timing and width of the infection curve. 62,63

For the metric graph under study, we must determine or adjust the following global sensitivity parameters:

• Adjustments (c_β and c_η) to the approximated transmission rate and removal rate at the vertex.
• Adjustment (c_λ ∈ (0, 1)) to the vertex-to-edge exchange rate.
• Edge-to-vertex exchange rate (α ∈ (0, 1)).
• Global scaling parameter for edge-to-edge exchange c_v.
• Edge diffusion coefficient (d_e) for the network.

For the first optimization step, we assume these six scaling parameters are constant across the entire network. Since these global scaling parameters multiply our data-informed initial guesses (Appendix B), it is important to have a good initial guess. Once a good global set of scaling parameters is found, the individual parameters can be manually adjusted to improve the fit.

A global sensitivity analysis (Appendix C) showed that the model is highly sensitive to the scaling c_β of the transmission rates β_v. Scaling the removal rates η_v by c_η contributes more to the time and amplitude of the peak infection than to the cumulative infection rate, while the edge-to-vertex transmission rate α is more influential in the number of cumulative infections. The model is not very sensitive to changes in the diffusion coefficient d_e, scaling of the edge-to-vertex exchange rate by c_λ, or changes to the edge-to-edge skipping parameter c_v (the latter was also observed in Ref. 30). Thus, we make some simplifications: we keep both the edge diffusion coefficient and the global scaling of the edge-to-edge skipping parameter constant for the entire network.

2.3.1. Optimization methodology

Our optimization has two phases: global and local. Starting with the initial guesses described in Appendix B, we first fit a single set of optimization parameters (c_β, c_η, c_λ, c_v, α) for the entire network. As our objective function, we use

