
Anastasia Kostaki, Javier M.  Moguerza, Alberto Olivares  and Stelios Psarakis

                             where $\varepsilon_i$ are independent random variables with zero mean and constant variance. To estimate
                             the unknown function m at a point x, the values of the response variable are averaged locally.
                             The smoothness of the resulting estimator is controlled by a bandwidth, which determines the
                             width of the neighbourhood over which the averaging is performed. As a result, the estimator of the
                             function m takes the form:
                                   $$\hat{m}_h(x) = n^{-1} \sum_{i=1}^{n} W_h(x; X_1, X_2, \ldots, X_n)\, Y_i,$$
                             where $W_h$ is a weight function depending on the bandwidth parameter $h$ and the variables $X_1, X_2, \ldots, X_n$.
                             The shape of the weight function W h is represented by a so-called kernel function K, which includes
                             the bandwidth h. The bandwidth acts as a scale parameter that adjusts the size and the form of the
                             weights around x. Hence, kernel regression estimators correspond to local weighted averages of the
                             response variable, with weights determined by the kernel function K and with the size of the weights
                             depending on the bandwidth parameter. Usually, for regression purposes, K behaves like, and has the
                             properties of, a probability density function: it is generally a positive, smooth function, peaking
                             at zero and decreasing monotonically as the bandwidth parameter increases in size.
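As an illustration of the local weighted average described above, the following is a minimal sketch only. It assumes Nadaraya-Watson weights and a Gaussian kernel, neither of which is prescribed by the text, which leaves the weight function generic. The names `gaussian_kernel` and `local_average` are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_average(x, xs, ys, h):
    """Kernel-weighted local average of the responses ys at the point x.

    Assumed weights (Nadaraya-Watson form):
        W_i(x) = K_h(x - X_i) / (n^-1 * sum_j K_h(x - X_j)),
    with K_h(u) = K(u / h) / h, so that the estimate is n^-1 * sum_i W_i(x) * Y_i.
    """
    n = len(xs)
    scaled = [gaussian_kernel((x - xi) / h) / h for xi in xs]
    denom = sum(scaled) / n
    weights = [k / denom for k in scaled]
    return sum(w * y for w, y in zip(weights, ys)) / n


# Usage: a small bandwidth follows the nearest observations closely,
# while a large bandwidth pulls the estimate towards the overall mean.
ages = [0.0, 1.0, 2.0, 3.0, 4.0]
rates = [0.9, 0.4, 0.3, 0.35, 0.5]
smooth_narrow = local_average(2.0, ages, rates, h=0.3)
smooth_wide = local_average(2.0, ages, rates, h=5.0)
```

The data in the usage lines are made up for illustration. With these weights the estimate is always a convex combination of the observed responses, which is why constant responses are reproduced exactly.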
                                A detailed review of the formulae proposed in the literature for the kernel estimator $\hat{m}$ of the
                             regression mean function m can be found in Peristera and Kostaki (2005), where it is shown that
                             the Gasser-Müller estimator (Gasser and Müller, 1979, 1984) is an adequate estimator for the
                             graduation of mortality data, its formula being:
                                   $$\hat{m}_{GM}(x) = \sum_{i=1}^{n} Y_{[i]} \int_{(x_{(i-1)} + x_{(i)})/2}^{(x_{(i)} + x_{(i+1)})/2} K_h(x - u)\, du,$$
                             where $x_{(0)} = -\infty$, $x_{(n+1)} = \infty$, $x_{(i)}$ denotes the $i$-th largest value of the observed covariate values and $Y_{[i]}$
                             the corresponding response value.
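The Gasser-Müller estimator can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation used by the authors. It assumes a Gaussian kernel, so each integral of the scaled kernel over an interval reduces to a difference of normal distribution function values, and it sorts the covariate values in increasing order, with the outer interval limits set to minus and plus infinity. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Distribution function of the standard normal (Gaussian kernel assumed)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gasser_muller(x, xs, ys, h):
    """Gasser-Mueller estimate at x for observations (xs, ys) and bandwidth h."""
    pairs = sorted(zip(xs, ys))  # order the data by covariate value
    n = len(pairs)
    # Interval limits: midpoints between consecutive ordered covariate values,
    # closed off by -inf on the left and +inf on the right.
    limits = [-math.inf]
    limits += [(pairs[i][0] + pairs[i + 1][0]) / 2.0 for i in range(n - 1)]
    limits.append(math.inf)

    estimate = 0.0
    for i, (_, y) in enumerate(pairs):
        lower, upper = limits[i], limits[i + 1]
        # Integral of K_h(x - u) du over [lower, upper] for a Gaussian kernel.
        weight = normal_cdf((x - lower) / h) - normal_cdf((x - upper) / h)
        estimate += y * weight
    return estimate


# Usage with made-up data: graduate a crude rate at age 2.
ages = [4.0, 0.0, 2.0, 1.0, 3.0]
rates = [0.5, 0.9, 0.3, 0.4, 0.35]
graduated = gasser_muller(2.0, ages, rates, h=0.8)
```

Because the intervals partition the real line, the weights sum to one for any evaluation point and bandwidth, so no explicit normalisation is needed, in contrast to ratio-type kernel estimators.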
                                Regarding the selection of the bandwidth parameter, descriptions of the available techniques can be
                             found in Härdle (1990, 1991) and Peristera and Kostaki (2005). A typical way to select the bandwidth
                             parameter is to build a direct plug-in estimator of the optimal smoothing parameter h. Gasser et al.
                             (1991) described how the unknown quantities involved can be estimated effectively and provided explicit
                             expressions for h appropriate to the Gasser-Müller estimator. The choice between a global and a local
                             bandwidth is another crucial decision. A local selection allows the use of a smaller bandwidth in areas of
                             high density, while a larger bandwidth can be adopted in areas of low density (see Brockmann et al.,
                             1993, and Hermann, 1997, for discussions on the advantages of using kernel regression estimators
                             with a local bandwidth). The underlying idea of the plug-in method is to select the optimal
                             bandwidths by estimating the asymptotically optimal mean integrated squared error bandwidths.
                             Hermann (1997) developed a generalization of the global iterative plug-in algorithm of Gasser et al.
                             (1991) for the selection of a local bandwidth, and showed the advantages of the local selection over the
                             global plug-in rule and the cross-validation method.
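For orientation, the cross-validation alternative mentioned above can be sketched as follows. This is not the plug-in algorithm of Gasser et al. (1991) or its local generalization, which are more involved. It is a minimal leave-one-out sketch for a single global bandwidth, assuming a Nadaraya-Watson estimator with a Gaussian kernel and a user-supplied grid of candidate bandwidths. All function names are hypothetical.

```python
import math


def nw_estimate(x, xs, ys, h):
    """Nadaraya-Watson estimate at x with a Gaussian kernel.

    Normalising constants of the kernel cancel in the ratio, so they are omitted.
    Returns None when every weight underflows to zero (h too small for x).
    """
    k = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    total = sum(k)
    if total == 0.0:
        return None
    return sum(ki * yi for ki, yi in zip(k, ys)) / total


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    squared_errors = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        prediction = nw_estimate(xs[i], rest_x, rest_y, h)
        if prediction is None:
            return math.inf  # bandwidth unusable for this data set
        squared_errors.append((ys[i] - prediction) ** 2)
    return sum(squared_errors) / len(squared_errors)


def select_bandwidth(xs, ys, candidates):
    """Pick the candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda h: loo_cv_score(xs, ys, h))


# Usage with made-up data (lists, since the sketch relies on list slicing).
xs = [float(i) for i in range(10)]
ys = [float(i) for i in range(10)]
best_h = select_bandwidth(xs, ys, [0.5, 50.0])
```

A local variant would repeat the selection separately around each evaluation point, which is the setting in which the text reports advantages for the local plug-in rule.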
                             4. Support Vector Machines

                             The SVM technique belongs to the family of regularisation methods (Moguerza and Muñoz, 2006). These
                             methods also include Splines. In fact, there is a close relation between the two methodologies, SVM
                             and Splines (Pearce and Wand, 2006). Next, we provide a brief description of the regression version of
                             SVM and its main features. SVM can be presented from its geometrical interpretation. Basically, the
                             method works by solving an optimization problem of the form (Tikhonov and Arsenin, 1977):
                                   $$\min_{f \in H_K} \frac{1}{p} \sum_{i=1}^{p} L(f(x_i) - y_i) + M \|f\|_K^2,$$
                             where $(x_i, y_i)$, $i = 1, \ldots, p$, are a set of data with $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$, $L$ is a loss function,
                             $M > 0$ is a constant that penalizes non-smoothness, $H_K$ is a space of functions known as Reproducing
                             Kernel Hilbert Space (RKHS) (Aronszajn, 1950; Moguerza and Muñoz, 2006), and $\|f\|_K$ is the norm
                                     International Journal of Population Studies | 2016, Volume 2, Issue 1       5