the available apps, which are larger in number. This is because it could greatly affect the accuracy of the prediction. The detection of zero-day apps is given the least priority, mainly for this reason. Hence, novel zero-day attacks are not considered in the clustering and are forced into one of the known clusters according to their similarity with the available clusters. In this work, however, we include an additional cluster to group all zero-day attacks that do not belong to any of the existing clusters (a minimal sketch of such an assignment rule is given after this list).
(iv) Dynamic classifiers versus thresholding: The detection systems that are normally used are the ones that classify using thresholding. Although simple, this approach has the disadvantage that the detection system cannot adapt once the threshold is fixed. A successful detection system needs to be contextual, adapting to changing conditions. To overcome the generality of static thresholding classifiers, dynamic classifiers based on an ANN are proposed in this work. Dynamic classifiers leveraging ANNs are advocated for their adaptability to changing conditions, especially when a multitude of apps must be handled.
(v) Perceptron feed-forward neural network in K-means clustering: The focus of this research is to determine the coordinates of the centroid of every cluster in the K-means clustering process and to analyze its effect on class imbalance. To determine the centroids for K-means clustering, we propose the use of a perceptron feed-forward neural network. As a supervised learning algorithm, this method is well known for handling large amounts of data efficiently. The approach is proposed to minimize the mean square error (see the sketches after this list).
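As an illustration of the additional zero-day cluster described earlier in this list, the sketch below routes an app to an extra cluster whenever it lies farther than a chosen radius from every known centroid. The assignment rule, the radius parameter, and the function name are illustrative assumptions; the rule actually used in this work is not reproduced on this page.

import numpy as np

def assign_with_zero_day_cluster(X, centroids, radius):
    # X: (n, d) app feature matrix; centroids: (k, d) known cluster centers.
    # Distance of every sample to every known centroid -> shape (n, k).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Samples far from all known centroids go to the extra zero-day cluster,
    # labelled k (one past the labels of the known clusters).
    labels[dists.min(axis=1) > radius] = centroids.shape[0]
    return labels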
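Item (v) proposes a perceptron feed-forward neural network trained to minimize the mean square error; the exact architecture used in this work is not detailed on this page. As a minimal sketch, assuming a single linear layer trained by gradient descent on the MSE over one-hot class targets, the supervised training loop could look as follows.

import numpy as np

def train_perceptron_mse(X, Y, lr=0.01, epochs=500, seed=0):
    # X: (n, d) app features; Y: (n, c) one-hot class targets
    # (benign / malicious / zero-day). Returns the learned weights and bias.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        Y_hat = X @ W + b              # feed-forward pass (linear output layer)
        err = Y_hat - Y                # residual used by the MSE objective
        W -= lr * X.T @ err / len(X)   # gradient step on the mean square error
        b -= lr * err.mean(axis=0)
    return W, b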
The theoretical framework underscores the need for adaptive, context-aware detection systems, addressing issues related to known and unknown app classifications and introducing innovative solutions such as dynamic classifiers and enhanced K-means clustering with neural networks.
4.2. Performance metrics

The evaluation utilized two similarity measures: the Euclidean distance and the Manhattan distance. These measures were selected to assess their influence on the clustering outcomes across multiple iterations, the within-cluster sum of squared errors, and the overall model-building time.
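For reference, the two similarity measures and the within-cluster sum of squared errors mentioned above can be written as small helpers; this is only a sketch, and the function names are illustrative.

import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))   # L2 (Euclidean) distance

def manhattan(a, b):
    return np.sum(np.abs(a - b))            # L1 (Manhattan) distance

def within_cluster_sse(X, labels, centroids):
    # Within-cluster sum of squared errors for a given clustering.
    return sum(np.sum((X[labels == j] - c) ** 2) for j, c in enumerate(centroids))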
The computational complexity of K-means clustering is determined by three primary factors: the number of data points (n), the number of clusters (k), and the dimensionality of the data (d). In each iteration of K-means, the algorithm assigns each data point to its nearest centroid and then updates the centroids based on the mean of the data points assigned to each cluster. The computational complexity of a single iteration can be broken down into two main steps: assignment and update. In the assignment step, each data point is assigned to the nearest centroid; the complexity of this step is O(n × k × d), as each data point's distance to each of the k centroids is calculated in d-dimensional space. In the update step, the centroids are updated by calculating the mean of the data points assigned to each cluster; the complexity of this step is O(n × d + k × d), since each data point contributes to exactly one centroid's mean.

Therefore, the overall computational complexity of K-means clustering is often given as O(I × n × k × d), where I is the number of iterations required for convergence. Typically, the number of iterations is relatively small and the algorithm converges quickly, especially if the data are well clustered. Only for large datasets or a large number of clusters does the computational complexity become significant, which is not the case in this work. In addition, the initialization of the centroids can also affect the computational complexity and the convergence speed of the algorithm.
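The assignment and update steps analyzed above can be summarized in a single annotated iteration; this is a sketch of the standard procedure, not the exact implementation used in this work.

import numpy as np

def kmeans_iteration(X, centroids):
    # Assignment step: distance of every point to every centroid -> O(n * k * d).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: mean of the points assigned to each centroid -> O(n * d + k * d).
    new_centroids = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
        for j in range(len(centroids))
    ])
    return labels, new_centroids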
4.3. System model

In this work, a hybrid classifier combining an ANN, which is a universal classifier, with the K-means clustering method is proposed to accurately detect the apps. The proposed classifier exploits the statistical variation of the distinguishing features of the different apps. The model is updated frequently by training with the zero-day apps belonging to each cluster; this updated model building reduces the overall error of app detection. The methodology consists of two stages: the data preparation stage and the clustering stage. In the preparation stage, an ANN is trained on the data set containing 30 medical apps to generate k initial cluster centers (centroids) for the benign, malicious, and zero-day classes. In this research, the generated k centroids are used as the initial cluster centers for the K-means clustering. The architecture of the training and test phases of the proposed method, with the ANN followed by K-means clustering, is presented in Figures 2A and B. The modification of the K-means clustering algorithm to determine the centroids using the ANN is given as a flowchart in Figure 3. The algorithm for calculating the initial cluster centers (centroids) of n objects using the ANN is given in Algorithm 1.
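A minimal sketch of this two-stage pipeline is given below, assuming a scikit-learn MLPClassifier as the ANN and class-mean feature vectors as the ANN-derived centroids; the actual architecture, features, and centroid computation follow Figures 2 and 3 and Algorithm 1 rather than this sketch.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

def ann_seeded_kmeans(X, y, n_clusters=3, seed=0):
    # Stage 1 (data preparation): train a feed-forward ANN on the labelled
    # medical-app features (0 = benign, 1 = malicious, 2 = zero-day).
    ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=seed)
    ann.fit(X, y)
    # Derive one initial centroid per class from the ANN's predictions.
    pred = ann.predict(X)
    init = np.array([
        X[pred == c].mean(axis=0) if np.any(pred == c) else X.mean(axis=0)
        for c in range(n_clusters)
    ])
    # Stage 2 (clustering): K-means seeded with the ANN-derived centroids.
    km = KMeans(n_clusters=n_clusters, init=init, n_init=1, random_state=seed)
    labels = km.fit_predict(X)
    return ann, km, labels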