Page 195 - IJOCTA-15-1

Key drivers of volatility in BIST100 firms using machine learning segmentation
supported their selection, indicating their meaningful contribution to understanding the underlying structure of the data.

By reducing the data to two principal components, we could visualize and cluster the firms more effectively based on their volatility characteristics. This step facilitated a more streamlined analysis and enhanced the interpretability of the results, providing more targeted insights into how different factors influence stock market volatility.

The decision to use two principal components was based on their ability to explain a significant portion of the variance in the dataset. The first component alone captured nearly half of the patterns in the data (49.84%), while the second component added another 19.63%, totaling around 70% of the variance. This cumulative variance coverage indicated that these two components provided a robust understanding of the underlying trends without introducing excessive complexity.

The application of PCA to our volatility data sets the stage for our subsequent clustering analysis, allowing us to identify meaningful groups of firms with similar volatility characteristics.

3.5. Cluster analysis

3.5.1. K-means algorithm

Following the dimensionality reduction achieved through PCA, we employ cluster analysis to identify distinct groups of firms based on their volatility characteristics.

The K-means algorithm is a widely recognized method of iterative descent clustering, particularly suitable for quantitative variables and utilizing squared Euclidean distance as the dissimilarity measure. The squared Euclidean distance between two points $x_i$ and $x_{i'}$ in a $p$-dimensional space is defined as [49]:

$$d(x_i, x_{i'}) = \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 = \|x_i - x_{i'}\|^2 \tag{11}$$

This metric forms the foundation for the within-cluster scatter, which is pivotal to the K-means clustering objective. The within-point scatter, $W(C)$, for a clustering assignment $C$ is expressed as:

$$W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} \|x_i - x_{i'}\|^2 \tag{12}$$

Alternatively, this can be rewritten in terms of the cluster means:

$$W(C) = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - \bar{x}_k\|^2 \tag{13}$$

Here, $\bar{x}_k$ represents the mean vector of the $k$-th cluster, and $N_k$ is the number of points in that cluster:

$$\bar{x}_k = (\bar{x}_{1k}, \ldots, \bar{x}_{pk}) \tag{14}$$

$$N_k = \sum_{i=1}^{N} I(C(i) = k) \tag{15}$$

The objective of the K-means algorithm is to minimize the total within-cluster variance by appropriately assigning the $N$ observations to $K$ clusters, such that the average dissimilarity from the cluster mean is minimized. This is mathematically represented as:

$$C^{*} = \min_{C} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - \bar{x}_k\|^2 \tag{16}$$

To achieve this minimization, the following optimization problem must be solved:

$$\min_{C,\,\{m_k\}_{k=1}^{K}} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - m_k\|^2 \tag{17}$$

Based on the Parkinson volatility scores from 2006 to 2023, the clustering analysis of our dataset identified two distinct groups of firms within the BIST100 index. This part of the study used spectral clustering to explore and classify these firms according to their volatility characteristics.

− Cluster 1: Characterized by an average volatility of approximately 0.0528, this group represents the less volatile cluster. Firms in this cluster typically exhibit more stable price movements over time, indicating a lower sensitivity to market dynamics and potentially lower risk profiles. This stability makes these firms particularly interesting for risk-averse investors or studies on financial stability.
− Cluster 2: This cluster shows a higher average volatility of 0.0673, marking it as the more volatile group. Firms in this cluster are likely more sensitive to market fluctuations and exhibit greater volatility. These characteristics point to higher risk but potentially higher returns, appealing