Page 197 - IJOCTA-15-1
Key drivers of volatility in BIST100 firms using machine learning segmentation
$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2 \tag{24}$$

where $k$ is the number of clusters, $C_i$ represents the $i$-th cluster, $x_j$ is a data point in cluster $C_i$, and $\mu_i$ is the centroid of cluster $C_i$. The goal is to minimize the SSE to achieve more homogeneous clusters. [51]

The Elbow Method, which examines the relationship between the number of clusters and the within-cluster sum of squares (inertia), showed a sharp drop in inertia when moving from one to two clusters (from 0.190 to 0.065). Beyond this point, the decrease in inertia slowed considerably (to 0.025, 0.016, and 0.013 for three, four, and five clusters), forming an "elbow" in the graph. This suggests that two clusters provide a good balance between minimizing within-cluster variance and avoiding overfitting.

Table 1. Elbow method (inertia) for determining optimal clusters

Number of Clusters    Inertia (Sum of Squared Errors, SSE)
1 Cluster             0.190
2 Clusters            0.065
3 Clusters            0.025
4 Clusters            0.016
5 Clusters            0.013

Source: Authors' findings.

Silhouette Score Method: The Silhouette score provides a metric for assessing the quality of a clustering result by measuring how similar each object is to its own cluster compared to other clusters. [52] The following steps outline the calculation of the Silhouette score for each object in a dataset:

Average Dissimilarity within the Cluster: For an object $i$ assigned to cluster $A$, the average dissimilarity $a(i)$ to all other objects within the same cluster $A$ is computed as:

$$a(i) = \frac{1}{|A| - 1} \sum_{\substack{j \in A \\ j \neq i}} d(i, j) \tag{25}$$

where $d(i, j)$ denotes the dissimilarity between objects $i$ and $j$, and $|A|$ is the number of objects in cluster $A$.

Average Dissimilarity to Other Clusters: For each cluster $C$ different from $A$, the average dissimilarity $d(i, C)$ of object $i$ to all objects in cluster $C$ is calculated as:

$$d(i, C) = \frac{1}{|C|} \sum_{j \in C} d(i, j) \tag{26}$$

where $|C|$ is the number of objects in cluster $C$.

Minimum Average Dissimilarity to Other Clusters: The minimum average dissimilarity $b(i)$ of object $i$ to any cluster $C$ other than $A$ is determined as:

$$b(i) = \min_{C \neq A} d(i, C) \tag{27}$$

Silhouette Value Calculation: The Silhouette value $s(i)$ for object $i$ is then defined as:

$$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} \tag{28}$$

The value of $s(i)$ ranges between −1 and 1, indicating how well object $i$ is clustered:

• $s(i)$ close to 1 indicates that the object is well-clustered and appropriately assigned to its current cluster.
• $s(i)$ around 0 indicates that the object lies near the boundary between two clusters.
• $s(i)$ close to −1 suggests that the object may be misclassified and may belong to a different cluster.

By averaging the Silhouette values of all objects, the overall clustering quality can be evaluated, enabling the identification of the most natural number of clusters in the dataset. This methodology facilitates the interpretation and validation of clustering results, offering a graphical and numerical assessment of cluster compactness and separation. [52]

The higher the Silhouette score, the better the performance of the clustering algorithm. [53] Silhouette scores, which measure the similarity within clusters relative to the separation between clusters, were highest for the two-cluster solution. This indicates a strong balance between cohesion within and separation between clusters, supporting the choice of two clusters as the most appropriate.

Specifically, the highest Silhouette score of 0.853 was achieved with two clusters. The score dropped markedly for higher numbers of clusters (0.526 for three clusters, 0.502 for four, and 0.364 for five), further reinforcing the appropriateness of the two-cluster solution.
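The two diagnostics defined in Eqs. (24) to (28) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes Euclidean distance as the dissimilarity d(i, j), uses a hypothetical four-point toy dataset rather than the BIST100 sample, and adopts the common convention that an object in a singleton cluster receives s(i) = 0. The function names `sse`, `silhouette_values`, and `mean_silhouette` are illustrative only.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def sse(points, labels):
    """Within-cluster sum of squared errors, as in Eq. (24)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        # Centroid mu_i: coordinate-wise mean of the cluster's points.
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total


def silhouette_values(points, labels):
    """Per-object silhouette values s(i), following Eqs. (25)-(28)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    values = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            # Assumed convention: singleton clusters get s(i) = 0.
            values.append(0.0)
            continue
        # Eq. (25): mean dissimilarity to the other members of cluster A.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # Eqs. (26)-(27): smallest mean dissimilarity to any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        # Eq. (28), guarding against a zero denominator (all points identical).
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels):
    """Average of s(i) over all objects, used to compare cluster counts."""
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)


# Hypothetical toy data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
one_cluster = [0, 0, 0, 0]
two_clusters = [0, 0, 1, 1]

print("SSE, k=1:", sse(points, one_cluster))
print("SSE, k=2:", sse(points, two_clusters))
print("mean silhouette, k=2:", mean_silhouette(points, two_clusters))
```

On this toy data the SSE falls from about 101 with one cluster to about 1 with two, and the mean silhouette of the two-cluster labelling is about 0.90. This is the same qualitative pattern as the elbow and silhouette results reported above, although the numbers themselves are unrelated to the study's data.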

