
Artificial Intelligence in Health                                Optimized clustering in medical app detection



the available apps, which are larger in number. This is because it could greatly affect the accuracy of the prediction. The detection of zero-day apps is given the least priority, mainly for this reason. Hence, novel zero-day attacks are not considered in clustering; they are forced into one of the known clusters according to their similarity to it. However, in this work, we include an additional cluster to group all the zero-day attacks that do not belong to any of the existing clusters.

(iv) Dynamic classifiers versus thresholding: Commonly used detection systems classify using thresholding. Although simple, this approach has the disadvantage that detection systems with a fixed threshold fail to adapt.²⁵ A successful detection system needs to be contextual, adapting to changing conditions. To overcome the generality of static thresholding classifiers, dynamic classifiers based on ANN are proposed in this work. Dynamic classifiers that leverage ANNs are advocated for their adaptability to changing conditions, especially given the multitude of apps.

(v) Perceptron feed-forward neural network in K-means clustering: The focus of this research is to determine the coordinates of the centroid of every cluster in the K-means clustering process and to analyze its effect on class imbalance. To determine the centroids in K-means clustering, we propose the use of a perceptron feed-forward neural network. As a supervised learning algorithm, this method is well known for handling large amounts of data efficiently. The approach has been proposed to minimize the mean square error.

The theoretical framework underscores the need for adaptive, context-aware detection systems, addressing issues related to known and unknown app classifications, and introducing innovative solutions, such as dynamic classifiers and enhanced K-means clustering with neural networks.

4.2. Performance metrics

The evaluation utilized two similarity measures: the Euclidean distance and the Manhattan distance. These measures were selected to assess their influence on the clustering outcomes across multiple iterations, the within-cluster sum of squared errors, and the overall model-building time. The computational complexity of K-means clustering is determined by three primary factors: the number of data points (n), the number of clusters (k), and the dimensionality of the data (d).

In each iteration of K-means, the algorithm assigns each data point to its nearest centroid and then updates the centroids based on the mean of the data points assigned to each cluster. The computational complexity of a single iteration can be broken down into two main steps: assignment and update. In the assignment step, each data point is assigned to the nearest centroid. The computational complexity of this step is O(n × k × d), as for each data point, we calculate its distance to each of the k centroids in d-dimensional space. In the update step, the centroids are updated by calculating the mean of the data points assigned to each cluster. The computational complexity of this step is O(n × d × k), as for each centroid, we calculate the mean of the d-dimensional data points assigned to that cluster.

Therefore, the overall computational complexity of K-means clustering is often given as O(I × n × k × d), where I is the number of iterations required for convergence. Typically, the number of iterations is relatively small, and the algorithm converges quickly, especially if the data is well-clustered. It is only for large datasets or a large number of clusters that the computational complexity can become significant, which is not the case in this work. In addition, the initialization of centroids can also affect the computational complexity and convergence speed of the algorithm.

4.3. System model

In this work, a hybrid classifier that combines an ANN, which is a universal classifier, with the K-means clustering method is proposed to accurately detect the apps. The proposed classifier exploits the statistical variation of the distinguishing features of the different apps. The model is updated frequently by training with the zero-day apps belonging to that cluster. Hence, the model was rebuilt with each update, which reduced the overall error of the app detection. The methodology consisted of two stages: the data preparation stage and the clustering stage. In the preparation stage, an ANN was trained on the dataset containing 30 medical apps to generate k initial cluster centers (centroids) for the benign, malicious, and zero-day classes. In this research, the generated k centroids were used as the initial cluster centers for the K-means clustering. The architecture of the training and test phases of the proposed method with ANN followed by K-means clustering is presented in Figures 2A and B.

The modification of the K-means clustering algorithm in determining centroids using ANN is given as a flowchart in Figure 3. The algorithm for calculating the initial cluster centers (centroids) of n objects using ANN is given in Algorithm 1.
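The assignment and update steps described in Section 4.2 can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`euclidean`, `manhattan`, `kmeans`), the `max_iter` cap, and the stopping rule (labels unchanged between passes) are assumptions made for illustration. The initial centroids are passed in explicitly, standing in for the centers that the paper obtains from the ANN. The mean-based update is kept for both distance measures, as in the text; some implementations use a component-wise median when the Manhattan distance is selected.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean (L2) distance between two d-dimensional points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (L1) distance between two d-dimensional points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(
    points: Sequence[Point],
    init_centroids: Sequence[Point],
    distance: Callable[[Point, Point], float] = euclidean,
    max_iter: int = 100,  # hypothetical safety cap, not from the paper
) -> Tuple[List[List[float]], List[int], int, float]:
    """Run K-means from the given initial centroids.

    Returns (centroids, labels, iterations, wcss), where wcss is the
    within-cluster sum of squared Euclidean errors.
    """
    centroids = [list(c) for c in init_centroids]
    labels = [-1] * len(points)
    iterations = 0
    for _ in range(max_iter):
        iterations += 1
        # Assignment step: n points x k centroids x d coordinates.
        new_labels = [
            min(range(len(centroids)), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, iterations, wcss
```

Passing `distance=manhattan` switches the similarity measure without touching the rest of the loop, which makes it straightforward to compare iteration counts and within-cluster errors for the two measures on the same initial centroids.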

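For reference, the two similarity measures and the within-cluster sum of squared errors named in Section 4.2 can be written in standard notation as follows. The symbols x (a data point), c_j (the centroid of cluster j), and C_j (the set of points assigned to cluster j) are introduced here for illustration and do not appear in the original text; n, k, d, and I are as defined in Section 4.2.

```latex
\begin{align*}
d_{\mathrm{Euc}}(\mathbf{x}, \mathbf{c}) &= \sqrt{\sum_{m=1}^{d} (x_m - c_m)^2}, \\
d_{\mathrm{Man}}(\mathbf{x}, \mathbf{c}) &= \sum_{m=1}^{d} \lvert x_m - c_m \rvert, \\
\mathrm{WCSS} &= \sum_{j=1}^{k} \sum_{\mathbf{x} \in C_j} \lVert \mathbf{x} - \mathbf{c}_j \rVert_2^2 .
\end{align*}
```

Each distance evaluation costs O(d), and the assignment step evaluates n × k of them per iteration, which gives the O(n × k × d) per-iteration cost and the O(I × n × k × d) total over I iterations.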


            Volume 1 Issue 4 (2024)                         24                               doi: 10.36922/aih.2585