Page 195 - IJOCTA-15-1
Key drivers of volatility in BIST100 firms using machine learning segmentation
supported their selection, indicating their meaningful contribution to understanding the underlying structure of the data.
By reducing the data to two principal components, we could visualize and cluster the firms more effectively based on their volatility characteristics. This step streamlined the analysis and made the results easier to interpret, providing more targeted insights into how different factors influence stock market volatility.
The decision to use two principal components was based on their ability to explain a substantial portion of the variance in the dataset. The first component alone captured nearly half of the variance (49.84%), and the second component added another 19.63%, for a cumulative 69.47%, or roughly 70%, of the total variance. This coverage indicated that two components were sufficient to capture the main trends in the data without introducing excessive complexity.
The application of PCA to our volatility data sets the stage for the subsequent clustering analysis, allowing us to identify meaningful groups of firms with similar volatility characteristics.

3.5. Cluster analysis

3.5.1. K-means algorithm

Following the dimensionality reduction achieved through PCA, we employ cluster analysis to identify distinct groups of firms based on their volatility characteristics.
The K-means algorithm is a widely used iterative descent clustering method. It is particularly suitable for quantitative variables and uses the squared Euclidean distance as its dissimilarity measure. The squared Euclidean distance between two points $x_i$ and $x_{i'}$ in a $p$-dimensional space is defined as:

$$d(x_i, x_{i'}) = \sum_{j=1}^{p} \left(x_{ij} - x_{i'j}\right)^2 = \|x_i - x_{i'}\|^2 \tag{11}$$

This metric forms the foundation of the within-cluster scatter, which is central to the K-means clustering objective. The within-cluster point scatter $W(C)$ for a cluster assignment $C$ is expressed as:

$$W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} \|x_i - x_{i'}\|^2 \tag{12}$$

Equivalently, it can be rewritten in terms of the cluster means:

$$W(C) = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - \bar{x}_k\|^2 \tag{13}$$

Here, $\bar{x}_k$ is the mean vector of the $k$-th cluster, and $N_k$ is the number of points assigned to that cluster:

$$\bar{x}_k = \left(\bar{x}_{1k}, \ldots, \bar{x}_{pk}\right) \tag{14}$$

$$N_k = \sum_{i=1}^{N} I\left(C(i) = k\right) \tag{15}$$

where $I(\cdot)$ is the indicator function.
The objective of the K-means algorithm is to minimize the total within-cluster scatter. It does this by assigning the $N$ observations to $K$ clusters such that, within each cluster, the average dissimilarity of the observations from the cluster mean is minimized. This is represented mathematically as:

$$C^{*} = \min_{C} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - \bar{x}_k\|^2 \tag{16}$$

To achieve this minimization, the following enlarged optimization problem is solved:

$$\min_{C,\,\{m_k\}_{k=1}^{K}} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - m_k\|^2 \tag{17}$$

For any fixed assignment $C$, the inner sum is minimized by setting $m_k = \bar{x}_k$, so minimizing (17) over the centers $m_k$ recovers (16). K-means approaches this problem by alternating between two steps: setting each $m_k$ to the mean of its current cluster, and reassigning each observation to its closest center.
Based on the Parkinson volatility scores from 2006 to 2023, the clustering analysis of our dataset identified two distinct groups of firms within the BIST100 index. This part of the study used spectral clustering to explore and classify these firms according to their volatility characteristics.
− Cluster 1: With an average volatility of approximately 0.0528, this group is the less volatile cluster. Firms in this cluster typically exhibit more stable price movements over time, indicating lower sensitivity to market dynamics and potentially lower risk profiles. This stability makes these firms particularly relevant for risk-averse investors and for studies of financial stability.
− Cluster 2: This cluster shows a higher average volatility of 0.0673, marking it as the more volatile group. Firms in this cluster are likely more sensitive to market fluctuations. These characteristics point to higher risk but potentially higher returns, appealing

