
Danan Gu, Runlong Huang, Kirill Andreev,  et al.

2.2 Methods and Analytical Strategy

Two sets of approaches were used. The first set estimated the overall underestimation of mortality at old ages. Specifically, we investigated two pairs of associations between logit-transformed probabilities of dying. One pair is the logit-transformed probability of dying at ages 70–95 (${}_{25}q_{70}$) versus that at ages 60–70 (${}_{10}q_{60}$). The other pair is the logit-transformed probability of dying at ages 80–95 (${}_{15}q_{80}$) versus that at ages 70–80 (${}_{10}q_{70}$) (see Appendix A: Note 2). In other words, we investigated two conditional probabilities of dying. Because age misreporting was generally less severe at younger ages, we expected the results from the first pair to be more reliable. To assess the accuracy of the Chinese data, we compared them with data from the 13 HMD countries, particularly Sweden and Japan, the two countries with the most accurate mortality data in the contemporary world. The probabilities of dying for the 13 countries were computed from the HMD life tables. We then specified the mathematical form of these two associations using linear regression models of the logit-transformed probabilities and confidence ellipses, both estimated with data from the 13 HMD countries. We did not apply modified logit methods because we are not confident about the reliability of census-based mortality at very young ages or of life expectancy at birth in each province. We argue that such methods would be meaningful only when mortality at these younger ages is appropriately adjusted, which is beyond the scope of this research.
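Because life tables report survivors $l_x$ at exact ages, each conditional probability of dying used above can be obtained from the standard life-table identity ${}_nq_x = 1 - l_{x+n}/l_x$. The short Python sketch below illustrates this; the function name `prob_dying` and the $l_x$ values are hypothetical illustrations, not taken from any HMD table.

```python
# Sketch: derive nqx = 1 - l(x+n)/l(x) from a life table's lx (survivors) column.
# The lx values are HYPOTHETICAL illustrations, not taken from any HMD table.


def prob_dying(lx: dict[int, float], x: int, n: int) -> float:
    """Probability of dying between exact ages x and x+n, given survival to x."""
    return 1.0 - lx[x + n] / lx[x]


# Hypothetical survivors (per 100,000 births) at the exact ages used in the text.
lx = {60: 90_000.0, 70: 80_000.0, 80: 56_000.0, 95: 8_000.0}

q_10_60 = prob_dying(lx, 60, 10)  # ages 60-70
q_25_70 = prob_dying(lx, 70, 25)  # ages 70-95
q_10_70 = prob_dying(lx, 70, 10)  # ages 70-80
q_15_80 = prob_dying(lx, 80, 15)  # ages 80-95

# Survival compounds, so 1 - 25q70 equals (1 - 10q70) * (1 - 15q80).
print(q_10_60, q_25_70, q_10_70, q_15_80)
```

The final comment shows why these are conditional probabilities: ${}_{25}q_{70}$ and ${}_{15}q_{80}$ condition on surviving to ages 70 and 80, respectively.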
Two linear regression models were established for the two pairs of probabilities of dying, separately for each sex:

$$\mathrm{logit}({}_{25}\hat{q}_{70}) = \beta_0 + \beta_1 \cdot \mathrm{logit}({}_{10}q_{60}) \quad \text{and} \quad \mathrm{logit}({}_{15}\hat{q}_{80}) = \beta_0 + \beta_1 \cdot \mathrm{logit}({}_{10}q_{70}),$$

where ${}_n\hat{q}_x$ is the probability of dying from age $x$ to age $x+n$ that needs to be estimated, $\beta_0$ is the intercept, $\beta_1$ is the slope, and

$$\mathrm{logit}({}_nq_x) = 0.5 \ln\left(\frac{{}_nq_x}{1 - {}_nq_x}\right).$$

The $R^2$ values for these four regressions (two pairs for each of the two sexes) were approximately 0.86–0.93, indicating a strong, approximately linear relationship between these probabilities. We also tested quadratic forms; however, the improvement was too small to warrant their inclusion.
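The regression step can be illustrated with a minimal Python sketch: transform both probabilities with the half-logit defined above, fit ordinary least squares, and back-transform a prediction. The data pairs are hypothetical placeholders rather than HMD values, and the helper names (`half_logit`, `inv_half_logit`, `fit_ols`) are ours; in practice one would fit each sex separately with a statistics package.

```python
import math

# Sketch of the half-logit regression described above. The data pairs are
# HYPOTHETICAL placeholders, not values from the 13 HMD countries.


def half_logit(q: float) -> float:
    """logit(q) = 0.5 * ln(q / (1 - q)), as defined in the text."""
    return 0.5 * math.log(q / (1.0 - q))


def inv_half_logit(y: float) -> float:
    """Inverse of half_logit: q = 1 / (1 + exp(-2y))."""
    return 1.0 / (1.0 + math.exp(-2.0 * y))


def fit_ols(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return b0, b1, 1.0 - ss_res / ss_tot


# Hypothetical (10q60, 25q70) pairs standing in for HMD life tables of one sex.
pairs = [(0.08, 0.80), (0.10, 0.84), (0.12, 0.86), (0.15, 0.90), (0.18, 0.92)]
xs = [half_logit(q60) for q60, _ in pairs]
ys = [half_logit(q70) for _, q70 in pairs]
b0, b1, r2 = fit_ols(xs, ys)

# Predicted 25q70 for a hypothetical census 10q60 of 0.13.
q_hat_25_70 = inv_half_logit(b0 + b1 * half_logit(0.13))
print(f"b0={b0:.3f} b1={b1:.3f} R^2={r2:.3f} q_hat={q_hat_25_70:.3f}")
```

Fitting on the logit scale and back-transforming keeps every predicted probability strictly between 0 and 1, which a regression on raw probabilities would not guarantee.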
Based on the difference between the observed ${}_{25}q_{70}$ in the Chinese data and ${}_{25}\hat{q}_{70}$ derived from the HMD regression model using ${}_{10}q_{60}$ from the Chinese censuses, and assuming that ${}_{10}q_{60}$ in the Chinese censuses was accurate, we then calculated the average possible underestimation rate in the probability of dying for each five-year age group over ages 70–95. We used $100 \times (1 - {}_5q_x / {}_5\hat{q}_x)$ to estimate the average underestimation for a five-year age group over ages 70–95, where ${}_5q_x$ is the observed average probability of dying per five-year age group across the entire age range 70–95. Because the 25-year interval contains five five-year groups whose survival probabilities multiply, this average is

$${}_5q_x = 1 - (1 - {}_{25}q_{70})^{1/5},$$

and ${}_5\hat{q}_x$ is the corresponding estimated average probability of dying derived from the HMD linear regression model using ${}_{10}q_{60}$ from the Chinese censuses. This criterion is relatively strict; therefore, the results of this approach can be interpreted as the highest possible underestimation of mortality for a given province.
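The underestimation measure can be sketched in a few lines of Python. The two input probabilities are hypothetical; in the procedure described above, ${}_{25}\hat{q}_{70}$ would come from the HMD regression evaluated at the census ${}_{10}q_{60}$. The function names are illustrative, not from the original study.

```python
# Sketch of the Scenario B underestimation measure, 100 * (1 - 5qx / 5q̂x).
# The inputs are HYPOTHETICAL; q_hat_25_70 would come from the HMD regression.


def avg_five_year_q(q_25_70: float) -> float:
    """Average 5-year probability of dying over ages 70-95: 1 - (1 - 25q70)^(1/5)."""
    return 1.0 - (1.0 - q_25_70) ** (1.0 / 5.0)


def underestimation_pct(q_obs_25_70: float, q_hat_25_70: float) -> float:
    """Percent by which the observed average 5qx falls short of the estimated one."""
    return 100.0 * (1.0 - avg_five_year_q(q_obs_25_70) / avg_five_year_q(q_hat_25_70))


q_obs_25_70 = 0.75  # hypothetical census-based value
q_hat_25_70 = 0.85  # hypothetical regression prediction
print(f"{underestimation_pct(q_obs_25_70, q_hat_25_70):.1f}%")
```

A positive result indicates that observed mortality is lower than the regression predicts (underestimation); a negative result would mean the observed value exceeds the prediction.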
To obtain a more lenient alternative estimate, we further used the boundary of the confidence ellipse that included 95% of the data points of ${}_{25}q_{70}$ and ${}_{10}q_{60}$ in the 13 HMD countries to estimate alternative underestimations of mortality. The confidence ellipse was estimated using the R package car (Fox and Weisberg, 2011). Specifically, we assumed that there was no mortality underestimation if the observed data points from the Chinese censuses fell within the confidence ellipse or above its lower boundary. This is a very lenient criterion; therefore, the results can be interpreted as the lowest possible underestimation of mortality for a given province. The confidence ellipse-based approach was defined as Scenario A and the linear regression-based approach as Scenario B. Analogous Scenario A and B procedures were applied to the pair ${}_{15}q_{80}$ and ${}_{10}q_{70}$.
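The Scenario A decision rule can be sketched geometrically: fit a 95% data ellipse to the reference points and check whether an observed point lies below the ellipse's lower boundary at the same abscissa. This Python sketch is not the car implementation. It assumes a bivariate-normal ellipse with a large-sample chi-square (2 df) radius, whereas car's ellipse functions may scale the radius differently (for instance with an F quantile). It also leaves open whether the points are on the probability or the logit scale, and the reference data in the example are hypothetical.

```python
import math

# Sketch of the Scenario A check: is an observed point below the lower boundary
# of a 95% data ellipse fitted to reference points? ASSUMPTIONS: bivariate-normal
# ellipse with a chi-square(2) radius; car's own functions may scale it differently.


def ellipse_lower_y(xs, ys, x0, level=0.95):
    """Lower boundary of the data ellipse at abscissa x0, or None if x0 lies outside it."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    r2 = -2.0 * math.log(1.0 - level)  # chi-square quantile with 2 df
    dx = x0 - mx
    disc = det * (sxx * r2 - dx * dx)
    if disc < 0:
        return None
    # Smaller root of the quadratic in y from (v - mean)' S^-1 (v - mean) = r2.
    return my + (sxy * dx - math.sqrt(disc)) / sxx


def flags_underestimation(xs, ys, x0, y0, level=0.95):
    """True if (x0, y0) lies below the lower boundary; None if undetermined."""
    y_low = ellipse_lower_y(xs, ys, x0, level)
    if y_low is None:
        return None
    return y0 < y_low


# Hypothetical reference points (x standing in for 10q60, y for 25q70).
ref_x = [-1.0, 1.0, -1.0, 1.0, 0.0, 0.0]
ref_y = [-1.0, 1.0, 1.0, -1.0, 0.0, 0.0]
print(flags_underestimation(ref_x, ref_y, 0.0, -3.0))  # True
print(flags_underestimation(ref_x, ref_y, 0.0, 0.5))   # False
```

Points inside the ellipse or above its lower edge are treated as showing no underestimation, matching the lenient rule in the text; returning None for abscissas outside the ellipse makes that undefined case explicit rather than guessing.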
                                     International Journal of Population Studies | 2016, Volume 2, Issue 2       5