Page 197 - IJOCTA-15-1

Key drivers of volatility in BIST100 firms using machine learning segmentation
$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2 \tag{24}$$

where $k$ is the number of clusters, $C_i$ represents the $i$-th cluster, $x_j$ is a data point in cluster $C_i$, and $\mu_i$ is the centroid of cluster $C_i$. The goal is to minimize the SSE to obtain more homogeneous clusters.⁵¹

The Elbow Method examines the relationship between the number of clusters and the within-cluster sum of squares (inertia). It showed a sharp drop in inertia when moving from one to two clusters (from 0.190 to 0.065; Table 1). Beyond this point, the reduction in inertia slowed considerably, forming an "elbow" in the inertia curve. This suggests that two clusters provide a good balance between minimizing within-cluster variance and avoiding overfitting.

Table 1. Elbow method (inertia) for determining the optimal number of clusters

| Number of Clusters | Inertia (Sum of Squared Errors, SSE) |
|--------------------|--------------------------------------|
| 1 Cluster          | 0.190                                |
| 2 Clusters         | 0.065                                |
| 3 Clusters         | 0.025                                |
| 4 Clusters         | 0.016                                |
| 5 Clusters         | 0.013                                |

Source: Authors' findings.

Silhouette Score Method: The Silhouette score provides a metric for assessing the quality of a clustering result by measuring how similar each object is to its own cluster compared to other clusters.⁵² The following steps outline the calculation of the Silhouette value for each object in a dataset:

Average Dissimilarity within the Cluster: For an object $i$ assigned to cluster $A$, the average dissimilarity $a(i)$ to all other objects within the same cluster $A$ is computed as:

$$a(i) = \frac{1}{|A| - 1} \sum_{\substack{j \in A \\ j \neq i}} d(i, j) \tag{25}$$

where $d(i, j)$ denotes the dissimilarity between objects $i$ and $j$, and $|A|$ is the number of objects in cluster $A$.

Average Dissimilarity to Other Clusters: For each cluster $C$ different from $A$, the average dissimilarity $d(i, C)$ of object $i$ to all objects in cluster $C$ is calculated as:

$$d(i, C) = \frac{1}{|C|} \sum_{j \in C} d(i, j) \tag{26}$$

where $|C|$ is the number of objects in cluster $C$.

Minimum Average Dissimilarity to Other Clusters: The minimum average dissimilarity $b(i)$ of object $i$ over all clusters $C$ other than $A$ is determined as:

$$b(i) = \min_{C \neq A} d(i, C) \tag{27}$$

Silhouette Value Calculation: The Silhouette value $s(i)$ for object $i$ is then defined as:

$$s(i) = \frac{b(i) - a(i)}{\max\bigl(a(i),\, b(i)\bigr)} \tag{28}$$

The value of $s(i)$ ranges between −1 and 1 and indicates how well object $i$ is clustered:

• $s(i)$ close to 1 indicates that the object is well clustered and appropriately assigned to its current cluster.
• $s(i)$ around 0 indicates that the object lies near the boundary between two clusters.
• $s(i)$ close to −1 suggests that the object might be misclassified and belong to a different cluster.

Averaging the Silhouette values of all objects gives the overall clustering quality. This average allows the most natural number of clusters in the dataset to be identified. The methodology facilitates the interpretation and validation of clustering results, offering both a graphical and a numerical assessment of cluster compactness and separation.⁵²

The higher the average Silhouette score, the better the clustering.⁵³ In this study, the score was highest for two clusters (0.853), indicating the best balance between cohesion within clusters and separation between them. It dropped markedly for larger numbers of clusters (0.526 for three clusters, 0.502 for four, and 0.364 for five). This further supports the two-cluster solution as the most appropriate.