
Nonparametric graduation techniques as a common framework for the description of demographic patterns

and Peristera, 2007), and the quadratic Spline model (Schmertmann, 2003) are provided, while for distorted fertility distributions, the Hadwiger mixture model (Chandola, Coleman, and Hiorns, 1999; 2002) and the P-K mixture model (Kostaki and Peristera, 2007) are provided.
To reduce heterogeneity, we also used data differentiated by order of birth from both cohort and period data sets. Finally, in the case of the USA, the fits of the alternative models are provided for the white and the black population separately. Details on fitting the alternative parametric models are given by Kostaki and Peristera (2007).
The parameters of the various models were estimated by means of an unweighted non-linear least-squares procedure, minimizing the following sum of squares:

$$\sum_{x} \left( \hat{f}_x - f_x \right)^2, \tag{4.2}$$

where $\hat{f}_x$ is the estimated fertility rate at age $x$ and $f_x$ is the corresponding empirical one. This minimization criterion was used as the most appropriate for fertility graduation by Kostaki and Peristera (2007), and was also suggested by Hoem et al. (1981) as providing fits as good as those of the more complicated weighted criterion, whose weights are the reciprocals of the estimated variances of the age-specific rates; the weighted criterion is most appropriate when fitting mortality rates.
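As a minimal sketch, criterion (4.2) and its weighted variant can be computed as follows. The function names and the toy rates are hypothetical and serve only as an illustration; the variances used as weights are assumed to be supplied by the user.

```python
def sse(fitted, observed):
    """Unweighted criterion (4.2): sum over ages of (f_hat_x - f_x)^2."""
    if len(fitted) != len(observed):
        raise ValueError("fitted and observed must have the same length")
    return sum((fh - f) ** 2 for fh, f in zip(fitted, observed))


def weighted_sse(fitted, observed, variances):
    """Weighted variant: each squared residual is divided by the
    estimated variance of the corresponding age-specific rate."""
    if not (len(fitted) == len(observed) == len(variances)):
        raise ValueError("all inputs must have the same length")
    return sum((fh - f) ** 2 / v
               for fh, f, v in zip(fitted, observed, variances))


# Hypothetical toy values, not taken from the paper's data sets.
observed = [0.010, 0.050, 0.120, 0.090, 0.030]
fitted = [0.012, 0.048, 0.118, 0.093, 0.029]

print(sse(fitted, observed))
```

With all variances equal to one, `weighted_sse` reduces to `sse`, which is the sense in which (4.2) is the unweighted special case.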
For the kernel applications, as in the case of mortality data, the function “lokerns” of the R package “lokern” was used to calculate Gasser-Müller estimators with a local bandwidth parameter. Similarly, the initial bandwidth parameter was derived using the R package KernSmooth. An initial bandwidth of h = 1.9066 was obtained for this implementation.
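The idea behind a Gasser-Müller estimator can be illustrated with a simplified sketch. It is not the lokerns routine: it assumes a Gaussian kernel and a single global bandwidth rather than a local one, and it sets the outer interval limits to plus and minus infinity so that the weights sum to one. The function names and the toy schedule are hypothetical, and the value h = 1.9066 is reused here purely as an example.

```python
import math


def _gauss_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gasser_mueller(ages, rates, h, at=None):
    """Gasser-Mueller estimate with a Gaussian kernel and global bandwidth h.

    Assumes `ages` is sorted in increasing order and h > 0.
    """
    if at is None:
        at = ages
    n = len(ages)
    # Interval limits s_0 < s_1 < ... < s_n: midpoints between neighbouring
    # ages, with the two outer limits pushed to -inf and +inf.
    s = ([-math.inf]
         + [(ages[i] + ages[i + 1]) / 2.0 for i in range(n - 1)]
         + [math.inf])
    smoothed = []
    for x in at:
        total = 0.0
        for i in range(n):
            # Weight of observation i: kernel mass on the interval (s_i, s_{i+1}).
            w = _gauss_cdf((x - s[i]) / h) - _gauss_cdf((x - s[i + 1]) / h)
            total += w * rates[i]
        smoothed.append(total)
    return smoothed


# Hypothetical bell-shaped schedule on ages 15..48 (34 rates).
ages = list(range(15, 49))
rates = [0.1 * math.exp(-((a - 28) / 6.0) ** 2) for a in ages]
smoothed = gasser_mueller(ages, rates, h=1.9066)
```

Because the weights are non-negative and sum to one, every smoothed value is a weighted average of the empirical rates and stays within their range.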
As in the case of mortality data, for the SVM techniques, the function svm of the R package e1071 was used, and a similar two-step cross-validation technique was used to select the parameters ε, σ, and C of the ε-regression procedure. The parameters ε, σ, and C play the same role as explained in the mortality study. In particular, the values ε = 0.0001, σ = 40, and C = 1.8 were obtained for this SVM implementation.
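For orientation, the two ingredients that ε and σ control in ε-regression can be sketched as follows. This is an illustration only, not the e1071 implementation: the function names are hypothetical, and the Gaussian kernel is written in one common parametrisation, which is an assumption because this section does not state how σ enters the kernel.

```python
import math


def eps_insensitive_loss(observed, fitted, eps):
    """Sum of max(0, |f_x - f_hat_x| - eps).

    Residuals that fall inside the eps-tube contribute nothing; in the
    SVM objective this loss is scaled by the cost parameter C.
    """
    return sum(max(0.0, abs(f - fh) - eps)
               for f, fh in zip(observed, fitted))


def gaussian_kernel(x1, x2, sigma):
    """Gaussian (RBF) kernel, assumed form exp(-(x1 - x2)^2 / (2 sigma^2))."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * sigma ** 2))


# Hypothetical values: the first residual (0.05) lies inside a tube of
# half-width 0.1, the second (0.5) exceeds it by 0.4.
loss = eps_insensitive_loss([1.0, 2.0], [1.05, 2.5], eps=0.1)
similarity = gaussian_kernel(15, 55, sigma=40)
```

A very small ε, such as the value 0.0001 reported above, leaves almost no residual unpenalised, so the fitted curve is pushed close to the empirical rates.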
The values of (4.2) for all the data sets used, and all graduation techniques applied, are presented
in Tables 2 and 3. The results of fitting the parametric models were first presented by Kostaki and
Peristera (2007). Figures 7–12 provide illustrations for some chosen cases. In all cases, we used ages
ranging from 15 to 48, so each schedule has 34 rates.
As shown in the tables and figures, the results of SVM prove superior to those of all the other models. In the vast majority of cases, SVM produced results that are closer to the empirical rates, the sole exception being the USA data differentiated by order of birth and race, where the performance of the P-K mixture model was somewhat superior. In the figures, one can easily observe that the results of SVM are closer to the empirical values, especially for the ages in the tails and at the peak of the fertility curve.

Table 2. Values of (4.2) multiplied by 100,000, at the exit of the estimation procedure for P-K model, Beta model, Gamma model, Hadwiger model, quadratic Spline model, kernels, and SVM

SSE × 10^6       P-K Model   Beta Model   Gamma Model   Hadwiger Model   Quadratic Spline Model   Kernel   SVM
Period Data
Sweden
  1996           115         108          132           326              174                      67       72
  2000           117         181          321           689              174                      30       11
Norway
  1992           242         175          265           656              263                      65       61
  2000           233         225          640           329              287                      40       10
Denmark
  1992           103         107          130           383              169                      54       20
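The rows of Table 2 shown on this page can also be compared programmatically. The sketch below transcribes only these five period-data rows and reports the technique with the smallest value of (4.2) in each; the variable and function names are hypothetical.

```python
# Values transcribed from the Table 2 rows above, in the table's column order.
MODELS = ["P-K", "Beta", "Gamma", "Hadwiger", "Quadratic Spline", "Kernel", "SVM"]
TABLE_2 = {
    ("Sweden", 1996): [115, 108, 132, 326, 174, 67, 72],
    ("Sweden", 2000): [117, 181, 321, 689, 174, 30, 11],
    ("Norway", 1992): [242, 175, 265, 656, 263, 65, 61],
    ("Norway", 2000): [233, 225, 640, 329, 287, 40, 10],
    ("Denmark", 1992): [103, 107, 130, 383, 169, 54, 20],
}


def best_technique(values):
    """Name of the technique with the smallest scaled sum of squares."""
    return MODELS[values.index(min(values))]


best = {key: best_technique(vals) for key, vals in TABLE_2.items()}
for (country, year), name in best.items():
    print(country, year, name)
```

For these rows, SVM gives the smallest value in four cases and the kernel estimator in one (Sweden 1996, 67 against 72).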

10 International Journal of Population Studies | 2016, Volume 2, Issue 1
