change. These coefficients can be exponentiated (converted to odds ratios [ORs]) to interpret the direction and magnitude of the relationship between the predictors and the outcome categories.

The independent variables can be either dichotomous (i.e., binary) or continuous (i.e., interval or ratio in scale).

Here, P(Hesitant) = $p_1$ = probability of falling in class "Hesitant," P(Unsure) = $p_2$ = probability of falling in class "Unsure," and P(Not hesitant) = $p_3$ = probability of falling in class "Not hesitant," with $p_1 + p_2 + p_3 = 1$.

$\log(p_1/p_3) = b_{10} + b_{11}X_1 + \cdots + b_{1p}X_p$ (I)

$\log(p_2/p_3) = b_{20} + b_{21}X_1 + \cdots + b_{2p}X_p$ (II)

After obtaining the estimates of the coefficients $(b_{10}, b_{11}, \cdots, b_{1p}, b_{20}, \cdots, b_{2p})$, the ORs, which are obtained by exponentiating the log odds, are used to determine the significance of the independent variables when compared between the two groups. If OR > 1 and p < 0.05, then the participants are more likely to be in the "Hesitant" (Unsure) group than the "Not hesitant" group. The joint p-value is calculated by multiplying the individual p-values from the two models in the MLR.
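As a concrete (hypothetical) illustration of Equations I and II, the sketch below fits a multinomial logistic regression in Python with statsmodels on simulated data and exponentiates the coefficients to obtain ORs. The predictor names (age, trust_score) and the simulated data frame are assumptions for illustration only, not the authors' data or code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical illustration, not the study data: simulate a survey-style
# data set with a three-level hesitancy outcome and two predictors.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "trust_score": rng.normal(0, 1, n),
    "hesitancy": rng.choice(["Not hesitant", "Hesitant", "Unsure"], n),
})

# Code the outcome so that 0 = "Not hesitant" is the reference class,
# matching Equations I and II.
labels = ["Not hesitant", "Hesitant", "Unsure"]
y = pd.Categorical(df["hesitancy"], categories=labels).codes
X = sm.add_constant(df[["age", "trust_score"]])

mlr = sm.MNLogit(y, X).fit()

# One column of coefficients per non-reference class
# ("Hesitant" vs "Not hesitant", "Unsure" vs "Not hesitant").
print(np.exp(mlr.params))   # odds ratios (ORs)
print(mlr.pvalues)          # Wald p-values per coefficient
```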
3.2. Model 2

Binary logistic regression (BLR) is a subset of MLR in which there are only two categories for the response variable (Equation III). Here, there are only two categories in the outcome variable: "Hesitant" and "Not hesitant."

$\log\left(\frac{p}{1-p}\right) = b_0 + b_1X_1 + \cdots + b_pX_p$ (III)
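A corresponding sketch for the two-category case (again on simulated, hypothetical data) fits Equation III with statsmodels and reports ORs and p-values for the binary model:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical two-class illustration: 1 = "Hesitant", 0 = "Not hesitant"
# ("Unsure" respondents would be excluded before this step).
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "trust_score": rng.normal(0, 1, n),
})
y01 = rng.integers(0, 2, n)          # stand-in for the observed 0/1 outcome

X = sm.add_constant(df[["age", "trust_score"]])
blr = sm.Logit(y01, X).fit()

print(np.exp(blr.params))   # odds ratios for Equation III
print(blr.pvalues)
```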
3.3. Model 3

The dataset was divided into training and testing datasets, the MLR model was applied to the training dataset, and the responses were predicted for the test dataset. A confusion matrix was created to compute the accuracy, sensitivity, and specificity of the model (Machine Learning Mastery, 2020).
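A minimal sketch of this train/test evaluation, assuming scikit-learn and simulated data; sensitivity and specificity are computed here for the simpler binary (2 × 2) case rather than the full three-class confusion matrix:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical illustration: simulate predictors and a binary
# hesitant / not-hesitant outcome.
rng = np.random.default_rng(2)
X = rng.normal(size=(600, 4))
y = rng.integers(0, 2, 600)

# Split into training and testing datasets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit on the training data and predict the held-out responses.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Confusion matrix and the derived metrics.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
accuracy = accuracy_score(y_test, y_pred)
sensitivity = tp / (tp + fn)    # true-positive rate
specificity = tn / (tn + fp)    # true-negative rate
print(accuracy, sensitivity, specificity)
```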
3.4. Model 4

3.4.1. Penalized methods

The penalized methods, especially the L1-penalty method known as the least absolute shrinkage and selection operator (LASSO), are useful when traditional all-subset regression methods become computationally inefficient. The traditional regression methods will typically have all the estimates of the regression coefficients as non-zero. Certain types of constraints on the parameters provide both theoretical and computational advantages. The LASSO or L1 constraint ($\|\beta\|_1$) gives sharp corners, which result in sparsity and convexity, while the L2 constraint ($\|\beta\|_2$, ridge regression) does not give sparsity. This difference is related to how the squared error loss interacts with the constraint sets of LASSO and ridge regression. Sparsity is desirable in general, but it becomes particularly important when the number of parameters is larger than the sample size. Sparsity implies a smaller number of parameters in a model, focusing on the most important ones. The LASSO has the properties of both sparsity and convexity. In all-subset regression, whether forward or backward selection, coefficients that are not important or are insignificant (based on a p-value or some other criterion) are set to zero. In machine learning parlance, feature selection is akin to all-subset regression in a multiple or logistic regression model. Having the property of convexity allows easy optimization in terms of useful parameters and with less computational time. For further details, please refer to Hoerl and Kennard (1970); Bertsimas et al. (2016); and Hastie et al. (2017). In penalization methods, feature selection and efficient classifier construction are achieved simultaneously. Among the widespread penalized techniques are LASSO, elastic net, and ridge regression (Hastie et al., 2015).
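To make the sparsity contrast concrete, the hypothetical scikit-learn sketch below (simulated data, not from the study) fits the same logistic model with an L1 and an L2 penalty and counts the coefficients shrunk exactly to zero:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical illustration: 30 candidate predictors, only 3 truly relevant.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 30))
logits = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 1.0 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
ridge = LogisticRegression(penalty="l2", solver="liblinear", C=0.5).fit(X, y)

# LASSO sets many coefficients exactly to zero (sparsity);
# ridge shrinks them but keeps them all non-zero.
print("zero coefficients, L1:", np.sum(lasso.coef_ == 0))
print("zero coefficients, L2:", np.sum(ridge.coef_ == 0))
```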
3.4.1.1. LASSO

It regularizes (constrains) the regression coefficients toward zero by penalizing the regression model with the sum of the absolute values of the coefficients, a penalty term called the L1-norm. This penalty forces the coefficient estimates with a minimal contribution to the model to zero.

The log-likelihood of BLR is shown below (Equation IV):

$l(\beta) = \sum_{i=1}^{n}\left[y_i \log \pi_i + (1 - y_i)\log(1 - \pi_i)\right] = \sum_{i=1}^{n}\left[y_i \log\frac{\pi_i}{1 - \pi_i} + \log(1 - \pi_i)\right]$ (IV)

where $\pi_i = \Pr(y_i = 1 \mid x_i)$ is given by $\frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$.

Substituting $\pi_i$ in the log-likelihood gives (Equation V):

$l(\beta) = \sum_{i=1}^{n}\left[y_i x_i'\beta - \log\left(1 + e^{x_i'\beta}\right)\right]$ (V)

When using the LASSO penalty term $\lambda$, a regularizing parameter, the penalized likelihood is represented by Equation VI:

$l_{\lambda}(\beta) = \sum_{i=1}^{n}\left[y_i x_i'\beta - \log\left(1 + e^{x_i'\beta}\right)\right] - \lambda \sum_{j=1}^{p}\left|\beta_j\right|$ (VI)

It uses the L1 penalty, which uniformly penalizes all the parameters, and due to the convexity of the function, the estimates of the parameters are optimal with a minimum mean square error. The fit is independent of multiplicative scaling.
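As a worked illustration of Equation VI (a sketch on simulated data, not the authors' implementation), the penalized log-likelihood can be evaluated directly in NumPy; in practice, an L1-penalized fit such as scikit-learn's LogisticRegression with penalty="l1" maximizes an equivalent objective, with its C parameter acting roughly as the inverse of $\lambda$.

```python
import numpy as np

def penalized_loglik(beta, X, y, lam):
    """Equation VI: sum_i [y_i * x_i'beta - log(1 + exp(x_i'beta))] - lam * sum_j |beta_j|.
    The intercept is omitted here for simplicity (it is usually left unpenalized)."""
    eta = X @ beta                                     # linear predictor x_i' beta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))   # Equation V
    return loglik - lam * np.sum(np.abs(beta))         # subtract the L1 penalty

# Hypothetical check on simulated data.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, 0.0, -0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

print(penalized_loglik(beta_true, X, y, lam=1.0))
print(penalized_loglik(np.zeros(3), X, y, lam=1.0))
```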

