Danan Gu, Runlong Huang, Kirill Andreev, et al.
2.2 Methods and Analytical Strategy
Two sets of approaches were used. The first set estimated the overall underestimation of mortality at
old ages. Specifically, we investigated two pairs of associations for logit-transformed probabilities of
dying. One pair is the logit-transformed probability of dying at ages 70–95 (${}_{25}q_{70}$) versus dying at
ages 60–70 (${}_{10}q_{60}$). The other pair is the logit-transformed probability of dying at ages 80–95
(${}_{15}q_{80}$) versus dying at ages 70–80 (${}_{10}q_{70}$) (see Appendix A: Note 2). Because age misreporting at
younger ages was generally less severe, we would expect the results from the first pair to be more
reliable. In other words, we investigated two conditional probabilities of dying. To assess the accuracy
of the Chinese data, we compared these associations with those in the 13 HMD countries, particularly Sweden and
Japan, the two countries with the most accurate mortality data in the contemporary world. The HMD
life tables were used to generate the probabilities of dying for the 13 countries. We further developed
the mathematical form of these two associations based on logit-transformed linear regression models
and the confidence ellipse with data from the 13 HMD countries. We did not apply the modified
logit methods because we are not confident about the reliability of mortality at very young ages and
the life expectancies at birth in each province from the censuses. We argue that these methodological
applications would be meaningful only when mortality at these younger ages is appropriately
adjusted, which is beyond the scope of this research.
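As a minimal sketch of how a probability of dying can be read off a life table, the snippet below uses the standard identity nqx = 1 - l(x+n)/l(x), which makes each quantity conditional on survival to age x. The function name `prob_dying` and the survivorship values are hypothetical and purely illustrative; they are not taken from the HMD or the Chinese censuses.

```python
def prob_dying(lx, x, n):
    """Probability of dying between exact ages x and x+n, given survival to x.

    lx maps exact age -> number of life-table survivors at that age.
    """
    return 1.0 - lx[x + n] / lx[x]


# Hypothetical survivorship column (radix 100,000), for illustration only.
lx = {60: 90000, 70: 78000, 80: 52000, 95: 3900}

q10_60 = prob_dying(lx, 60, 10)  # dying at ages 60-70
q25_70 = prob_dying(lx, 70, 25)  # dying at ages 70-95
q10_70 = prob_dying(lx, 70, 10)  # dying at ages 70-80
q15_80 = prob_dying(lx, 80, 15)  # dying at ages 80-95
```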
Two linear regression models were established for the two pairs of probabilities of dying for each sex:
$\operatorname{logit}({}_{25}\hat{q}_{70}) = \beta_0 + \beta_1 \operatorname{logit}({}_{10}q_{60})$ and
$\operatorname{logit}({}_{15}\hat{q}_{80}) = \beta_0 + \beta_1 \operatorname{logit}({}_{10}q_{70})$, where ${}_{n}\hat{q}_{x}$ is the
probability of dying from age x to age x+n that needs to be estimated, $\beta_0$ is the intercept, $\beta_1$ is the
slope, and $\operatorname{logit}({}_{n}q_{x}) = 0.5 \ln\left(\frac{{}_{n}q_{x}}{1 - {}_{n}q_{x}}\right)$. The $R^2$ values for these four linear regressions were approximately
0.86–0.93, indicating that a strong, approximately linear relationship existed among these probabilities. We
also applied quadratic forms; however, the improvement was too small to warrant inclusion.
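The regression step can be sketched as follows, assuming ordinary least squares on the half-log-odds scale described in the text. The helper names (`logit`, `inv_logit`, `fit_logit_line`) and the input arrays are hypothetical placeholders, not the HMD values, and the sketch is not meant to reproduce the authors' estimates.

```python
import numpy as np


def logit(q):
    """Half-log-odds transform: 0.5 * ln(q / (1 - q))."""
    q = np.asarray(q, dtype=float)
    return 0.5 * np.log(q / (1.0 - q))


def inv_logit(y):
    """Inverse of logit(): q = 1 / (1 + exp(-2 * y))."""
    y = np.asarray(y, dtype=float)
    return 1.0 / (1.0 + np.exp(-2.0 * y))


def fit_logit_line(q_young, q_old):
    """Fit logit(q_old) = b0 + b1 * logit(q_young) by OLS; return (b0, b1, r2)."""
    x, y = logit(q_young), logit(q_old)
    b1, b0 = np.polyfit(x, y, 1)  # polyfit returns the slope first
    resid = y - (b0 + b1 * x)
    r2 = 1.0 - resid.var() / y.var()
    return b0, b1, r2


# Hypothetical reference points (placeholders, NOT HMD data).
q10_60 = np.array([0.12, 0.15, 0.18, 0.22, 0.26])
q25_70 = np.array([0.78, 0.82, 0.85, 0.88, 0.91])

b0, b1, r2 = fit_logit_line(q10_60, q25_70)

# Predicted probability of dying at ages 70-95 for an observed 10q60 of 0.20.
q25_70_hat = float(inv_logit(b0 + b1 * logit(0.20)))
```

Fitting on the logit scale and back-transforming keeps every predicted probability strictly between 0 and 1, which a regression on the raw probabilities would not guarantee.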
Based on the difference between the observed ${}_{25}q_{70}$ in the Chinese data and ${}_{25}\hat{q}_{70}$ derived from the
HMD regression model by using ${}_{10}q_{60}$ from the Chinese censuses and assuming that ${}_{10}q_{60}$ in
Chinese censuses was accurate, we then calculated the average possible underestimation rate in the
probability of dying for each five-year age group over ages 70–95. We used $100 \times (1 - {}_{5}q_{x} / {}_{5}\hat{q}_{x})$ to
estimate the average underestimation of a five-year age group over the ages 70–95, where ${}_{5}q_{x}$
represents the observed average probability of dying over a five-year age group in the entire age
group 70–95 (i.e., ${}_{5}q_{x} = 1 - (1 - {}_{25}q_{70})^{1/5}$) and ${}_{5}\hat{q}_{x}$ represents the corresponding estimated average
probability of dying derived from the HMD linear regression model by using ${}_{10}q_{60}$ from the
Chinese censuses. This criterion is more strict; therefore, the results of this approach can be interpreted
as the highest possible underestimation of mortality for a given province.
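A small worked sketch of this calculation follows. It assumes the average five-year probability is defined so that five equal five-year survival steps reproduce the 25-year survival from age 70 to 95. The function names and the two input probabilities are hypothetical, chosen only to illustrate the arithmetic.

```python
def avg_five_year_q(q25_70):
    """Average five-year probability of dying implied by 25q70.

    Solves (1 - q5)**5 == 1 - q25_70 for q5.
    """
    return 1.0 - (1.0 - q25_70) ** (1.0 / 5.0)


def underestimation_pct(q25_70_obs, q25_70_hat):
    """Percentage shortfall of the observed average five-year probability
    relative to the model-based one: 100 * (1 - observed / estimated)."""
    return 100.0 * (1.0 - avg_five_year_q(q25_70_obs) / avg_five_year_q(q25_70_hat))


# Hypothetical inputs: an observed census value and a regression-based estimate.
rate = underestimation_pct(q25_70_obs=0.80, q25_70_hat=0.85)
```

A positive `rate` means the observed probability falls short of the model-based expectation; identical inputs give exactly zero.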
To make our approach more reasonable, we further used the boundary of the confidence ellipse,
which included 95% of the data points of ${}_{25}q_{70}$ and ${}_{10}q_{60}$ in the 13 HMD countries, to estimate
alternative underestimations of mortality. The confidence ellipse was estimated using the car
package in R (Fox and Weisberg, 2011). Specifically, we assumed that there was no mortality
underestimation if the observed data points in the Chinese censuses fell within the confidence ellipse or above
the lower boundary of the confidence ellipse. This is a very lenient criterion; therefore, the
results can be interpreted as the lowest possible underestimation of mortality for a specific province.
The confidence ellipse-based approach was defined as Scenario A and the linear regression-based
approach was defined as Scenario B. Similar procedures of Scenarios A and B were applied to the
pair of ${}_{15}q_{80}$ and ${}_{10}q_{70}$.
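The ellipse-based check can be sketched in Python as below. This is an illustration only: it assumes a bivariate-normal data ellipse whose squared radius is the chi-square quantile with 2 degrees of freedom, which may differ from the convention used by the R package the authors relied on. The function name and the reference points are hypothetical placeholders on the logit scale, not HMD data.

```python
import numpy as np


def lower_ellipse_boundary(x, y, x0, level=0.95):
    """Lower y-value of the data ellipse of (x, y) at abscissa x0.

    Returns NaN when x0 lies outside the horizontal extent of the ellipse.
    """
    pts = np.column_stack([x, y])
    mx, my = pts.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(pts, rowvar=False))
    a, b, c = inv_cov[0, 0], inv_cov[0, 1], inv_cov[1, 1]
    # Assumed radius: chi-square quantile with 2 df has the closed form -2*ln(1 - level).
    r2 = -2.0 * np.log(1.0 - level)
    dx = x0 - mx
    # Solve c*dy^2 + 2*b*dx*dy + a*dx^2 - r2 = 0 for dy; take the smaller root.
    disc = (b * dx) ** 2 - c * (a * dx ** 2 - r2)
    if disc < 0:
        return float("nan")
    return float(my + (-b * dx - np.sqrt(disc)) / c)


# Hypothetical logit-scale reference points (placeholders, NOT HMD data).
ref_x = np.array([-1.00, -0.90, -0.80, -0.70, -0.60, -0.50])
ref_y = np.array([0.55, 0.70, 0.72, 0.90, 0.95, 1.10])

low = lower_ellipse_boundary(ref_x, ref_y, x0=-0.75)

# An observed point is flagged only if it falls below the lower boundary.
flagged = 0.40 < low
```

Under this sketch, a flagged point would be compared against `low` rather than against the regression line, which is why the ellipse-based criterion yields the smaller of the two underestimation figures.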
International Journal of Population Studies | 2016, Volume 2, Issue 2

