
H.H. Yildirim, A. Akusta / IJOCTA, Vol.15, No.1, pp.183-201 (2025)
Table 2. Silhouette scores for different numbers of clusters

Number of Clusters         Silhouette Score
2 Clusters                 0.853
3 Clusters                 0.526
4 Clusters                 0.502
5 Clusters                 0.364
Source: Authors' findings.

Davies-Bouldin Index: In addition to the Elbow Method and Silhouette Score, we use the Davies-Bouldin Index as a third measure to validate our cluster selection. This index provides another perspective on cluster separation and compactness.

The Davies-Bouldin Index (DBI) is a metric used to evaluate the quality of clustering algorithms. It measures the average similarity ratio of each cluster with its most similar cluster. The goal is to achieve minimal similarity between clusters, thus encouraging well-separated clusters.

The cluster similarity measure, denoted $R_{ij}$, is formulated as follows [54]:

\[ R_{ij} = \frac{S_i + S_j}{M_{ij}} \tag{29} \]

$S_i$ and $S_j$ are the dispersions of clusters $i$ and $j$, respectively, and $M_{ij}$ is the distance between the centroids of clusters $i$ and $j$. This measure is non-negative and symmetric, and it indicates the similarity between any two clusters.

The Davies-Bouldin Index $R$ is defined as the average, over all clusters $i$, of the maximum similarity between cluster $i$ and any other cluster $j$:

\[ R = \frac{1}{N} \sum_{i=1}^{N} \max_{j \neq i} R_{ij} \tag{30} \]

Here $N$ is the total number of clusters. A lower value of $R$ means lower average similarity between clusters, which corresponds to well-separated clusters.

The dispersion measure $S_i$ quantifies the spread of data points within a cluster and is given by:

\[ S_i = \left( \frac{1}{T_i} \sum_{k=1}^{T_i} \lVert x_k - A_i \rVert^{q} \right)^{1/q} \tag{31} \]

In this formula, $T_i$ is the number of points in cluster $i$, $x_k$ represents the data points in cluster $i$, $A_i$ is the centroid of cluster $i$, and $q$ is an integer that defines the type of distance used (e.g., $q = 2$ for Euclidean distance).

The centroid $A_i$ of cluster $i$ is calculated as the mean of all points within the cluster:

\[ A_i = \frac{1}{T_i} \sum_{k=1}^{T_i} x_k \tag{32} \]

This centroid serves as the representative point of the cluster around which the dispersion is measured.

The distance $M_{ij}$ between the centroids of clusters $i$ and $j$ is determined using the Minkowski metric:

\[ M_{ij} = \left( \sum_{d=1}^{P} \lvert a_{id} - a_{jd} \rvert^{p} \right)^{1/p} \tag{33} \]

Here, $a_{id}$ and $a_{jd}$ are the $d$-th components of the centroids of clusters $i$ and $j$, respectively, $P$ is the number of components (dimensions), and $p$ is an integer defining the type of distance (e.g., $p = 2$ for Euclidean distance).

The lower the DBI value, the better the clustering algorithm performance [55]. The Davies-Bouldin Index, which assesses the average ratio of within-cluster scatter to between-cluster separation, was substantially lower for two clusters than for larger numbers of clusters, indicating that clustering quality is highest at this level.

Finally, the Davies-Bouldin Index, which evaluates the average similarity ratio of each cluster with its most similar cluster, also pointed towards two clusters as the optimal choice. The lowest Davies-Bouldin Index value of 0.092 was observed for two clusters, indicating the best cluster separation and compactness. The index increased substantially for higher numbers of clusters (0.488 for three, 0.529 for four, and 0.690 for five), suggesting that additional clusters would lead to less distinct and more overlapping groupings.

Table 3. Davies-Bouldin index for assessing clustering performance

Number of Clusters         Davies-Bouldin Index
2 Clusters                 0.092
3 Clusters                 0.488
4 Clusters                 0.529
5 Clusters                 0.690
Source: Authors' findings.

The convergence of these three methods on a two-cluster solution supports two as the most appropriate number of clusters for our dataset. This choice optimizes statistical measures of cluster quality and offers a transparent and interpretable segmentation of the firms based on their volatility characteristics.
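As an illustration of how Eqs. (29)-(33) fit together, the following is a minimal NumPy sketch of the Davies-Bouldin Index. It is not the authors' implementation: the function name `davies_bouldin`, its arguments, and the toy data are assumptions introduced here for illustration only. The exponents `q` and `p` follow the roles they have in Eqs. (31) and (33).

```python
import numpy as np


def davies_bouldin(X, labels, q=2, p=2):
    """Davies-Bouldin Index following Eqs. (29)-(33).

    X      : array of shape (n_samples, n_features)
    labels : cluster label of each sample
    q, p   : exponents of the dispersion (31) and Minkowski distance (33)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("at least two clusters are required")

    # Eq. (32): centroid A_i is the mean of the points in cluster i.
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])

    # Eq. (31): dispersion S_i, the q-th root of the mean q-th power
    # of the Euclidean distances from each point to its centroid.
    S = np.array([
        np.mean(np.linalg.norm(X[labels == c] - centroids[i], axis=1) ** q)
        ** (1.0 / q)
        for i, c in enumerate(ids)
    ])

    # Eq. (33): Minkowski distance M_ij between every pair of centroids.
    diff = np.abs(centroids[:, None, :] - centroids[None, :, :])
    M = (diff ** p).sum(axis=2) ** (1.0 / p)

    # Setting M_ii to infinity makes R_ii = 0, so the diagonal never
    # wins the maximum taken over j != i in Eq. (30).
    np.fill_diagonal(M, np.inf)

    # Eq. (29): similarity R_ij = (S_i + S_j) / M_ij.
    R = (S[:, None] + S[None, :]) / M

    # Eq. (30): average over clusters of the worst-case similarity.
    return R.max(axis=1).mean()


# Hypothetical toy data: two tight, well-separated clusters.
X_toy = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels_toy = np.array([0, 0, 1, 1])
dbi_toy = davies_bouldin(X_toy, labels_toy)
print(dbi_toy)
```

In the toy data each point lies at distance 1 from its centroid and the centroids are 10 apart, so $S_1 = S_2 = 1$, $M_{12} = 10$, and the index is $(1 + 1)/10 = 0.2$. The sketch does not guard against coincident centroids, for which $M_{ij} = 0$. Library implementations may fix the exponents differently (scikit-learn's `davies_bouldin_score`, for instance, should correspond to $q = 1$ and $p = 2$), so values are comparable only when the same choices are used.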