change. These coefficients can be exponentiated (converted to odds ratios [ORs]) to interpret the direction and magnitude of the relationship between the predictors and the outcome categories.

The independent variables can be either dichotomous (i.e., binary) or continuous (i.e., interval or ratio in scale).

Here, P(Hesitant) = p_1 = probability of falling in class "Hesitant," P(Unsure) = p_2 = probability of falling in class "Unsure," and P(Not hesitant) = p_3 = probability of falling in class "Not hesitant," with p_1 + p_2 + p_3 = 1. The two logits of the MLR are:

\log(p_1 / p_3) = b_{10} + b_{11} X_1 + \cdots + b_{1p} X_p    (I)

\log(p_2 / p_3) = b_{20} + b_{21} X_1 + \cdots + b_{2p} X_p    (II)

After obtaining the estimates of the coefficients (b_{10}, b_{11}, \cdots, b_{1p}, b_{20}, \cdots, b_{2p}), the ORs, obtained by exponentiating the log odds, are used to determine the significance of the independent variables when the two groups are compared. If OR > 1 and p < 0.05, the participants are more likely to be in the "Hesitant" (or "Unsure") group than in the "Not hesitant" group. The joint p-value is calculated by multiplying the individual p-values from the two models in the MLR.
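As a concrete illustration of Equations I and II, the sketch below fits a multinomial logit with statsmodels and exponentiates the coefficients into ORs. The variable names and simulated data are hypothetical placeholders, not the study's actual survey items.

    # Illustrative sketch only: a multinomial logit in the form of
    # Equations I and II, fitted with statsmodels on simulated data.
    # Predictor names ("employed", "age") are hypothetical placeholders.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500
    X = pd.DataFrame({
        "employed": rng.integers(0, 2, n),  # dichotomous predictor
        "age": rng.normal(45, 12, n),       # continuous predictor
    })
    # Outcome coded 0 = "Not hesitant" (reference), 1 = "Hesitant", 2 = "Unsure"
    y = rng.integers(0, 3, n)

    fit = sm.MNLogit(y, sm.add_constant(X)).fit(disp=False)
    # One column of coefficients per non-reference logit (Equations I and II);
    # exponentiating the log odds yields the odds ratios.
    print(np.exp(fit.params))
    print(fit.pvalues)  # per-coefficient p-values

An OR above 1 with p < 0.05 for a predictor in the first column would correspond to the "Hesitant versus Not hesitant" interpretation described above.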
3.2. Model 2

Binary logistic regression is a subset of MLR in which there are only two categories for the response variable (Equation III). Here, there are only two categories in the outcome variable: "Hesitant" and "Not hesitant":

\log(p / (1 - p)) = b_0 + b_1 X_1 + \cdots + b_p X_p    (III)
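The same machinery specializes to Equation III; a minimal sketch with statsmodels, again on simulated rather than study data:

    # Minimal sketch of Equation III: binary logistic regression with
    # two illustrative predictors on simulated data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    X = sm.add_constant(rng.normal(size=(300, 2)))  # intercept b_0 plus X_1, X_2
    logits = 0.8 * X[:, 1] - 0.5 * X[:, 2]
    y = rng.binomial(1, 1 / (1 + np.exp(-logits)))  # 1 = "Hesitant", 0 = "Not hesitant"

    blr = sm.Logit(y, X).fit(disp=False)
    print(np.exp(blr.params))  # odds ratios for b_0, b_1, b_2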
3.3. Model 3

The dataset was divided into training and testing datasets, the MLR model was applied to the training dataset, and the responses were predicted for the test dataset. A confusion matrix was created to compute the accuracy, sensitivity, and specificity of the model (Machine Learning Mastery, 2020).
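A sketch of this evaluation step using scikit-learn, shown for a binary outcome for simplicity (the study's three-class case would use the analogous per-class rates); the data are simulated stand-ins:

    # Train/test split, then a confusion matrix giving accuracy,
    # sensitivity, and specificity. Simulated data, binary outcome.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix

    rng = np.random.default_rng(2)
    X = rng.normal(size=(400, 3))
    y = (X[:, 0] + rng.normal(scale=0.5, size=400) > 0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression().fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()

    print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
    print("sensitivity:", tp / (tp + fn))  # true-positive rate
    print("specificity:", tn / (tn + fp))  # true-negative rate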
3.4. Model 4

3.4.1. Penalized methods

The penalized methods, especially the L1-penalty method known as the least absolute shrinkage and selection operator (LASSO), are useful when traditional all-subset regression methods become computationally inefficient. Traditional regression methods typically leave all the estimated regression coefficients non-zero. Certain types of constraints on the parameters provide both theoretical and computational advantages. The LASSO or L1 constraint, ‖β‖_1, gives sharp corners, which results in sparsity and convexity, while the squared L2 constraint of ridge regression, ‖β‖_2^2, does not give sparsity. This difference is related to how the squared error loss interacts with the constraint sets of LASSO and ridge regression. Sparsity is desirable in general, but it becomes particularly important when the number of parameters is larger than the sample size. Sparsity implies a smaller number of parameters in the model, focusing on the most important ones. The LASSO has the properties of both sparsity and convexity. In all-subset regression, whether forward or backward selection, coefficients that are unimportant or insignificant (based on a p-value or some other criterion) are set to zero. In machine learning parlance, feature selection is akin to all-subset regression in a multiple or logistic regression model. The property of convexity allows easy optimization in terms of useful parameters and with less computational time. For further details, please refer to Hoerl and Kennard (1970), Bertsimas et al. (2016), and Hastie et al. (2017).

In penalization methods, feature selection and efficient classifier construction are achieved simultaneously. Among the widespread penalized techniques are LASSO, elastic net, and ridge regression (Hastie et al., 2015).
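The geometric contrast between the two constraints is easy to see empirically: under an L1 penalty many coefficients are exactly zero, while under an L2 penalty they are shrunk but stay non-zero. A sketch with scikit-learn on simulated data with many noise features:

    # Contrast of L1 (LASSO) and L2 (ridge) penalties: the L1 constraint's
    # sharp corners produce exact zeros; the smooth L2 ball does not.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 50))  # only the first two features carry signal
    y = (X[:, 0] - X[:, 1] + rng.normal(size=200) > 0).astype(int)

    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
    ridge = LogisticRegression(penalty="l2", C=0.5).fit(X, y)

    print("zero coefficients under L1:", int(np.sum(lasso.coef_ == 0)))
    print("zero coefficients under L2:", int(np.sum(ridge.coef_ == 0)))  # typically 0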

3.4.1.1. LASSO

It regularizes (constrains) the regression coefficients toward zero by penalizing the regression model with the sum of the absolute values of the coefficients as a penalty term, called the L1-norm. This penalty forces the coefficient estimates with a minimal contribution to the model to zero.

The log-likelihood of BLR is shown below (Equation IV):

l(\beta) = \sum_{i=1}^{n} [y_i \log \pi_i + (1 - y_i) \log(1 - \pi_i)] = \sum_{i=1}^{n} [y_i \log(\pi_i / (1 - \pi_i)) + \log(1 - \pi_i)]    (IV)

where \pi_i = Pr(y_i = 1 | x_i) is given by e^{x_i' \beta} / (1 + e^{x_i' \beta}).

Substituting \pi_i in the log-likelihood gives (Equation V):

l(\beta) = \sum_{i=1}^{n} [y_i x_i' \beta - \log(1 + e^{x_i' \beta})]    (V)

When the LASSO penalty term, with regularization parameter \lambda, is added, the penalized log-likelihood is represented by Equation VI:

l_\lambda(\beta) = \sum_{i=1}^{n} [y_i x_i' \beta - \log(1 + e^{x_i' \beta})] - \lambda \sum_{j=1}^{p} |\beta_j|    (VI)

The L1 penalty penalizes all the parameters uniformly, and due to the convexity of the objective function, the parameter estimates attain the optimum with a minimal mean squared error. Because the LASSO fit is not invariant to multiplicative scaling of the predictors, the predictors are typically standardized before fitting (Hastie et al., 2015).
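In practice, Equation VI is maximized by standard solvers; in scikit-learn, for instance, the regularization strength is parameterized roughly as C ≈ 1/λ, so a larger λ (smaller C) drives more coefficients exactly to zero. A sketch on simulated data:

    # Effect of the LASSO penalty lambda in Equation VI: as lambda grows,
    # more coefficients are forced exactly to zero (sparser models).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)
    X = rng.normal(size=(300, 20))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

    for lam in [0.01, 0.1, 1.0, 10.0]:
        fit = LogisticRegression(penalty="l1", solver="liblinear", C=1.0 / lam).fit(X, y)
        print(f"lambda={lam:<5}  non-zero coefficients: {int(np.sum(fit.coef_ != 0))}")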