
Nonparametric graduation techniques as a common framework for the description of demographic patterns

of f in the RKHS. The loss function L measures the estimation error of the method, and ‖f‖_K is a measure of non-smoothness: the smaller ‖f‖_K is, the smoother f becomes. This means that the function f* ∈ H_K obtained as the solution of this optimization problem is a compromise between accuracy and smoothness. As a consequence, this approach is well suited to the graduation of demographic data. Moreover, the optimization problem to be solved is convex and therefore has no local minima. This convexity is one of the main differences from other methods, as it rules out the existence of local solutions.
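For concreteness, the trade-off just described has the standard regularized RKHS form sketched below (the trade-off constant λ is our notation, not the paper's):

```latex
f^{*} \;=\; \arg\min_{f \in H_{K}} \;\sum_{i=1}^{n} L\bigl(y_{i}, f(x_{i})\bigr)
\;+\; \lambda \,\lVert f \rVert_{K}^{2}, \qquad \lambda > 0,
```

where a larger λ favors a smoother solution and a smaller λ favors fidelity to the data.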
  Another key feature of SVM is its ability to map the data into a higher-dimensional space (known as the “feature space”). To achieve this, a kernel approach is used in order to operate in the feature space. A kernel K is a real-valued function K(x, y) ∈ ℝ, where usually x, y ∈ ℝⁿ, which plays the role of a scalar product in the feature space. In this way, the explicit coordinates in the higher-dimensional space are never calculated, as only the inner products between the images of all pairs of data points in the feature space are needed. Three of the most widely used kernels are: the linear kernel K(x, y) = xᵀy, which corresponds to the identity mapping; the polynomial kernel K(x, y) = (c + xᵀy)ᵈ, where c and d are constants, which maps the data into a finite-dimensional space; and the Gaussian kernel K(x, y) = exp(−‖x − y‖²/σ), where σ is a positive constant, which maps the data into an infinite-dimensional space. The Gaussian kernel, given its approximation capacity, is the most extensively used (Moguerza and Muñoz, 2006), and the one that we suggest for graduation purposes.
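As an illustration, the three kernels above can be coded directly. This is a minimal sketch (the function names are ours), following the paper's convention of exp(−‖x − y‖²/σ) for the Gaussian kernel:

```python
import math

def linear_kernel(x, y):
    # K(x, y) = x^T y: plain inner product, corresponding to the identity mapping
    return sum(xi * yi for xi, yi in zip(x, y))

def polynomial_kernel(x, y, c=1.0, d=2):
    # K(x, y) = (c + x^T y)^d: maps the data into a finite-dimensional space
    return (c + linear_kernel(x, y)) ** d

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / sigma): maps into an infinite-dimensional space
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / sigma)
```

Note that a kernel of any pair of points is a single real number, so pairwise evaluations are all an SVM ever needs; the feature-space coordinates themselves are never formed.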
  In practical implementations of the method, such as the one provided by the software R, the accuracy and smoothing properties are achieved by fixing a band determined by a constant ε > 0 around the solution f* ∈ H_K. In order to penalize strong violations of the band, another constant C > 0 is used. The constant ε plays the role of the loss function and C controls the smoothness. As a consequence, three parameters have to be fixed when using SVM with the Gaussian kernel, namely ε, σ, and C. In practice, a grid of parameter values can be determined visually, taking into account that the problem at hand is one-dimensional. Then a so-called cross-validation is performed, that is, a random search within the grid is carried out in order to find the best combination of the parameters.
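The parameter-selection step can be sketched as a random search over a user-chosen grid. The helper below is a generic illustration (all names are ours), in which cv_error stands for any cross-validated error of an SVM fit with a given combination of ε, σ, and C:

```python
import itertools
import random

def random_grid_search(grid, cv_error, n_trials=20, seed=0):
    # grid: dict mapping parameter name -> list of candidate values,
    #       e.g. {"epsilon": [...], "sigma": [...], "C": [...]}
    # cv_error: callable taking a dict of parameters and returning the
    #           cross-validated error of the fitted model
    rng = random.Random(seed)
    combos = list(itertools.product(*grid.values()))
    # draw a random subset of the grid (the whole grid if n_trials covers it)
    trials = rng.sample(combos, min(n_trials, len(combos)))
    best = min(trials, key=lambda combo: cv_error(dict(zip(grid, combo))))
    return dict(zip(grid, best))
```

Because graduation is a one-dimensional problem, the grid can be kept small and inspected visually before the search, which keeps even an exhaustive pass over the grid cheap.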

       5. Evaluation and Comparisons

       5.1 Numerical Results for Mortality
In our calculations we used the empirical age-specific mortality rates of the male and female populations in Sweden for the periods 1981–1985, 1984–1988, and 1991–1995, as well as those in France and Japan for the years 1990, 1991, and 1995. The Swedish data sets were taken from Statistics Sweden, while the French and Japanese ones came from the Berkeley mortality database (2005), available on the web.
  For kernel applications, the function “lokerns” from the R package “lokern” is used for the calculation of Gasser-Müller estimators with a local bandwidth parameter. It is available from http://cran.r-project.org/web/packages/lokern/index.html. In order to select the bandwidth for a local linear Gaussian kernel regression estimator, a direct plug-in technique (Ruppert, Sheather, and Wand, 1995) is used. The initial bandwidth parameter is derived using the KernSmooth library in R. In particular, for this implementation we obtained an initial bandwidth h = 2.3849.
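To make the estimator concrete, a local linear fit at a single evaluation point x0, with Gaussian weights and bandwidth h, can be sketched as follows. This illustrates the estimator whose bandwidth the plug-in rule selects; it is not the KernSmooth implementation, and the function name is ours:

```python
import math

def local_linear_fit(xs, ys, x0, h):
    # Gaussian kernel weights centered at the evaluation point x0
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    # Weighted least squares of y on (x - x0); since the regressor is
    # centered at x0, the fitted intercept is the estimate at x0.
    sw = sum(w)
    swx = sum(wi * (x - x0) for wi, x in zip(w, xs))
    swxx = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxy = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    det = sw * swxx - swx ** 2
    return (swxx * swy - swx * swxy) / det
```

A useful sanity check is that a local linear estimator reproduces any exactly linear relationship regardless of the bandwidth, which is one reason it is preferred over a local constant fit near the boundaries of the age range.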
  The parameters in the Heligman-Pollard model are estimated using an iterative routine of the NAG library based upon a modification of the Gauss-Newton algorithm described by Gill and Murray (1978). The model was fitted using weighted non-linear least squares, minimizing the following sum of squares:

       6                  International Journal of Population Studies | 2016, Volume 2, Issue 1