
Artificial Intelligence in Health | Optimized clustering in medical app detection



methodology. The ANN is employed for its ability to learn complex patterns and relationships within the dataset, making it an effective tool for classification tasks. In this context, the ANN acts as a fundamental element of the hybrid detector, contributing to the enhanced detection performance of the overall system. The K-means clustering algorithm is a widely used unsupervised machine learning technique for grouping similar data points. K-means aims to partition the dataset into distinct clusters, with each cluster containing data points that share similar characteristics. The neural network aids in determining the centroid of each cluster within K-means clustering, contributing to superior detection performance.

In this study, the input to the ANN is a feature vector derived from data describing medical health-care apps. The specific features used as input depend largely on the characteristics of the apps under analysis. The key input features considered in this study include:

(i) App metadata: information such as app name, description, category (e.g., medical, fitness, wellness), and developer information;
(ii) User engagement metrics: metrics such as app ratings, reviews, download counts, and active user counts;
(iii) App functionality: features provided by the app, such as symptom tracking, medication reminders, and telemedicine services; and
(iv) Technical characteristics: attributes such as app size, update frequency, and compatibility with different platforms.

Because the ANN in this study is integrated with the K-means clustering algorithm to determine the coordinates of the centroids, it serves as a component within the clustering process rather than as a standalone classifier.

The ANN is a simple feedforward neural network with one hidden layer: the input layer receives the feature vector representing an app, and the output layer provides coordinates or weights that influence the clustering process. The specific configuration of the network, including the number of layers, the number of neurons per layer, the activation functions, and the training parameters, depends on the requirements of the clustering task and is determined experimentally to optimize performance. The choice of K-means for medical health-care application detection in this work is motivated by several factors: scalability, simplicity, speed, effectiveness for spherical clusters, and ease of integration with an ANN. K-means is known for its scalability and efficiency, which make it suitable for large datasets with many data points, even though the dataset used in this work is small. In the context of medical health-care apps, where a dataset may contain a large number of instances, K-means can handle the clustering task efficiently without excessive computational resources.

K-means is relatively simple to implement and understand compared with more complex clustering algorithms. This simplicity makes it an attractive choice when the goal is an approach that is straightforward to interpret. K-means is also known for its computational speed, particularly on low-dimensional data, which makes it well suited to real-time or near-real-time applications where quick processing and response times matter, such as health-care settings where timely decision-making is crucial.

Another strength of K-means is its effectiveness with spherical clusters: it performs well when the underlying clusters in the data are spherical or globular in shape. In many cases, medical health-care apps may form clusters that are relatively well separated and roughly spherical in the feature space, making K-means an appropriate choice.

In addition, K-means can readily be integrated with other machine learning techniques, such as the ANN described in this paper. This integration leverages the strengths of both approaches to enhance the performance of medical health-care application detection.

To address potential issues associated with K-means clustering, several strategies are employed in this work. These issues include sensitivity to the initial centroids, assumptions about cluster shape, and the choice of the number of clusters (K). First, instead of relying on a single random initialization of the centroids, the algorithm is run multiple times with different initializations. The results are then averaged, or the best clustering solution is selected according to a predefined criterion such as the lowest within-cluster variance. This reduces the risk of converging to a suboptimal solution. In addition, domain knowledge or prior information about the apps is leveraged to initialize the centroids more meaningfully. For example, hierarchical clustering or density-based clustering techniques can be used to identify initial cluster centers.

When the K-means assumptions about cluster shape do not hold or the number of clusters is unknown, an alternative clustering algorithm is considered: density-based spatial clustering of applications with noise (DBSCAN). This method is found to be more flexible in handling non-spherical clusters, and it determines the number of clusters automatically from the structure of the data. Hierarchical clustering is also used to explore different levels of granularity in the clustering solution, allowing for more flexibility in determining the

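The partitioning performed by K-means, as described above, can be stated with the standard K-means objective. The notation below is ours; the paper does not give its own formulation. Each app's feature vector $\mathbf{x}_i$ is assigned to a cluster $C_k$ with centroid $\boldsymbol{\mu}_k$. Lloyd's algorithm alternates the assignment step and the update step until the assignments stop changing, and each step never increases $J$. The paper says the ANN helps fix the centroids $\boldsymbol{\mu}_k$ in the hybrid detector, but exactly how its output enters these equations is not specified here.

```latex
\begin{align}
J(\boldsymbol{\mu}_1,\dots,\boldsymbol{\mu}_K)
  &= \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2 \\
c_i &= \arg\min_{k \in \{1,\dots,K\}} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2 \quad \text{(assignment step)} \\
\boldsymbol{\mu}_k &= \frac{1}{|C_k|} \sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i \quad \text{(update step)}
\end{align}
```

The "lowest within-cluster variance" criterion used to pick among multiple runs corresponds to choosing the run with the smallest $J$.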
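The four feature groups listed above, (i) to (iv), could be turned into a numeric vector along the lines of this minimal sketch. Several parts of it are assumptions for illustration rather than the study's method:

- the record schema (`category`, `rating`, `downloads`, `functions`, `size_mb`, `platforms`);
- the category and function vocabularies;
- the one-hot and log encodings;
- min-max scaling.

Free-text metadata (name, description, developer) is left out, because encoding it would need a text-embedding step that the paper does not describe.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical vocabularies; the paper does not list its exact encodings.
CATEGORIES = ("medical", "fitness", "wellness")
FUNCTIONS = ("symptom_tracking", "medication_reminders", "telemedicine")


def raw_features(app: Dict) -> List[float]:
    """Turn one app record (hypothetical schema) into an unscaled numeric vector."""
    # (i) metadata: one-hot encoding of the category
    vec = [1.0 if app["category"] == c else 0.0 for c in CATEGORIES]
    # (ii) engagement: rating plus log-scaled download count
    vec.append(float(app["rating"]))
    vec.append(math.log10(1 + app["downloads"]))
    # (iii) functionality: binary flags for the features the app offers
    offered = set(app["functions"])
    vec.extend(1.0 if f in offered else 0.0 for f in FUNCTIONS)
    # (iv) technical: size in MB and number of supported platforms
    vec.append(float(app["size_mb"]))
    vec.append(float(len(app["platforms"])))
    return vec


def min_max_scale(rows: Sequence[List[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1] so that no single feature dominates distances."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]
```

Scaling matters here because K-means uses Euclidean distance. Without it, a feature with a large range (such as app size in MB) would outweigh the binary functionality flags.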
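The architecture paragraph above describes a feedforward network with one hidden layer whose output layer yields centroid coordinates. The class below is a hypothetical forward pass only. The following are all our assumptions:

- the name `CentroidNet`;
- tanh hidden units and sigmoid outputs, which keep coordinates in [0, 1] like min-max scaled features;
- random initialization with a fixed seed;
- reshaping K*d outputs into K centroids.

Training is not shown.

```python
import math
import random
from typing import List


class CentroidNet:
    """Hypothetical one-hidden-layer feedforward net that emits K centroids of dimension d."""

    def __init__(self, d: int, hidden: int, k: int, seed: int = 0):
        rng = random.Random(seed)
        self.d, self.k = d, k
        # Input -> hidden weights and biases
        self.w1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        # Hidden -> output weights and biases: one output per centroid coordinate
        self.w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(k * d)]
        self.b2 = [0.0] * (k * d)

    def forward(self, x: List[float]) -> List[List[float]]:
        # Hidden layer: h = tanh(W1 x + b1)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Output layer: o = sigmoid(W2 h + b2)
        o = [1.0 / (1.0 + math.exp(-(sum(w * hv for w, hv in zip(row, h)) + b)))
             for row, b in zip(self.w2, self.b2)]
        # Reshape the flat output vector into K centroids of length d
        return [o[i * self.d:(i + 1) * self.d] for i in range(self.k)]
```

For example, feeding the mean of the scaled feature vectors into `CentroidNet(d=10, hidden=16, k=3).forward(...)` would yield three candidate centroids that could seed or adjust K-means. What makes those centroids useful is the training objective, for instance minimizing the within-cluster variance. That objective is left unspecified in this sketch.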
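The speed claim above can be made concrete with the standard cost of one Lloyd iteration. Here $n$ is the number of apps, $K$ the number of clusters, $d$ the feature dimension and $I$ the number of iterations; these symbols are ours. The assignment step computes $nK$ distances of $d$ terms each, while the update step is only $O(nd)$, so the assignment step dominates. Cost grows linearly in $d$, which is one reason K-means is fastest on low-dimensional data. The sizes in the worked line are hypothetical, not the paper's data.

```latex
\text{cost per iteration} = O(nKd), \qquad \text{total cost} = O(nKdI)

n = 500,\; K = 4,\; d = 10 \;\Rightarrow\; nKd = 500 \times 4 \times 10 = 20{,}000 \text{ squared-difference terms per iteration}
```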

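The multiple-initialization strategy described above can be sketched as follows. The sketch assumes Euclidean distance and uses the within-cluster sum of squared errors (SSE) as the selection criterion. The function names `lloyd` and `kmeans_restarts` and the defaults `n_init=10` and `max_iter=100` are illustrative choices, not values from the paper. The sketch keeps the single best run rather than averaging runs, because cluster labels from separate runs are not aligned with each other.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points: List[Point], k: int, rng: random.Random,
          max_iter: int = 100) -> Tuple[List[List[float]], List[int], float]:
    """One K-means run from a random initialization; returns (centroids, labels, sse)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def kmeans_restarts(points: List[Point], k: int, n_init: int = 10,
                    seed: int = 0) -> Tuple[List[List[float]], List[int], float]:
    """Run K-means n_init times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])
```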
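For the hierarchical initialization mentioned above, one simple option is agglomerative clustering stopped at K clusters. The means of those clusters then serve as starting centroids for K-means. The centroid-linkage rule and the function name `agglomerative_seeds` are our assumptions; the paper does not say which linkage it uses. This naive loop is cubic in the number of points, which is tolerable only for small datasets such as the one used in this work.

```python
from typing import List, Sequence


def agglomerative_seeds(points: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Naive centroid-linkage agglomerative clustering, stopped at k clusters.

    Returns the mean of each remaining cluster, intended as K-means seeds.
    """
    clusters = [[list(p)] for p in points]  # start with one cluster per point

    def mean(cluster: List[List[float]]) -> List[float]:
        return [sum(c) / len(cluster) for c in zip(*cluster)]

    while len(clusters) > k:
        means = [mean(c) for c in clusters]
        best = None
        # Find the pair of clusters whose means are closest
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum((a - b) ** 2 for a, b in zip(means[i], means[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return [mean(c) for c in clusters]
```

The returned list can be passed in place of the random `rng.sample(...)` initialization in a K-means routine. This makes the starting centroids reflect the data's coarse structure.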
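A plain implementation of DBSCAN, the alternative referenced above, shows how the number of clusters falls out of the data instead of being fixed in advance. It works as follows:

1. Points with at least `min_pts` neighbours within radius `eps` are core points.
2. Clusters grow through chains of core points.
3. Points not reachable from any core point are labelled noise (-1).

The name `dbscan` and any parameter values used with it are illustrative; the paper does not report its `eps` or `min_pts`.

```python
from typing import List, Optional, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN; returns a cluster id per point, or -1 for noise."""
    n = len(points)
    eps2 = eps * eps

    def neighbours(i: int) -> List[int]:
        # The neighbourhood includes i itself, as in the original DBSCAN definition
        return [j for j in range(n)
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps2]

    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    # Every point has been labelled by now
    return [int(lab) for lab in labels]
```

After a run, `max(labels) + 1` gives the number of clusters found, with no K supplied beforehand. Clusters of any shape can emerge as long as they are connected through dense regions.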
Volume 1 Issue 4 (2024) | 22 | doi: 10.36922/aih.2585