Artificial Intelligence in Health Optimized clustering in medical app detection
methodology. The ANN is employed for its ability to learn complex patterns and relationships within the dataset, making it an effective tool for classification tasks. In this context, the ANN acts as a fundamental element in the hybrid detector, contributing to the enhanced detection performance of the overall system. The K-means clustering algorithm is a widely used unsupervised machine learning technique for grouping similar data points. It aims to partition the dataset into distinct clusters, with each cluster representing data points that share similarities. The neural network aids in fixing the centroids of each cluster within the K-means clustering, contributing to superior detection performance.

In this study, the input to the ANN is a feature vector derived from data representing medical health-care apps. The specific features used as input to the ANN depend largely on the characteristics of the apps under analysis. Key input features considered in this study include:

(i) App metadata: Information such as app name, description, category (e.g., medical, fitness, wellness), and developer information
(ii) User engagement metrics: Metrics such as app ratings, reviews, download counts, and active user counts
(iii) App functionality: Features provided by the app such as symptom tracking, medication reminders, and telemedicine services
(iv) Technical characteristics: Attributes such as app size, update frequency, and compatibility with different platforms.

As the ANN used in the study is integrated with the K-means clustering algorithm to determine the coordinates of the centroids, it is used as a component within the clustering process rather than for standalone classification.

The architecture of the ANN is a simple feedforward neural network with one hidden layer, where the input layer receives the feature vector representing the apps and the output layer provides coordinates or weights that influence the clustering process. The specific configuration of the neural network, including the number of neurons per layer, activation functions, and training parameters, depends on the requirements of the clustering task and is determined through experimentation to optimize performance.

The choice of the K-means clustering algorithm in this work for medical health-care application detection is motivated by several factors: scalability, simplicity, speed, effectiveness for spherical clusters, and ease of integration with the ANN. K-means is known for its scalability and efficiency, making it suitable for handling large datasets with many data points, even though we work with small data in this work. In the context of medical health-care apps, where the dataset may contain a significant number of instances, K-means can efficiently handle the clustering task without excessive computational resources.

K-means is relatively simple to implement and understand compared to more complex clustering algorithms. This simplicity can make it an attractive choice, especially if the goal is to develop an approach that is straightforward to interpret. K-means is also known for its computational speed, particularly for low-dimensional data, making it well-suited for real-time or near-real-time apps where quick processing and response times are important, such as in health-care settings where timely decision-making is crucial.

One of K-means' strengths lies in its effectiveness with spherical clusters. K-means performs well when the underlying clusters in the data are spherical or globular in shape. In many cases, medical health-care apps may exhibit clusters that are relatively well-separated and have spherical shapes in the feature space, making K-means an appropriate choice.

In addition, K-means is compatible with integration with other machine learning techniques, such as ANNs, as described in the paper. This integration allows for leveraging the strengths of both approaches to enhance the performance of medical health-care application detection. To address potential issues associated with K-means clustering, such as sensitivity to initial centroids, assumptions about cluster shapes, and determining the appropriate number of clusters (K), several strategies are employed in this work. Firstly, instead of relying on a single random initialization for the centroids, the algorithm is run multiple times with different initializations. By averaging the results or selecting the best clustering solution based on a predefined criterion such as the lowest within-cluster variance, the risk of converging to a suboptimal solution is mitigated. In addition, domain knowledge or prior information about the apps is leveraged to initialize the centroids in a more meaningful way. For example, hierarchical clustering or density-based clustering techniques can be used to identify initial cluster centers.

When the K-means assumptions about cluster shapes do not hold or the number of clusters is unknown, an alternative clustering algorithm such as density-based spatial clustering of applications with noise (DBSCAN) is considered. This method is found to be more flexible in handling non-spherical clusters and automatically determining the number of clusters based on the data structure. Hierarchical clustering is also used to explore different levels of granularity in the clustering solution, allowing for more flexibility in determining the
Volume 1 Issue 4 (2024) 22 doi: 10.36922/aih.2585
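
The multiple-initialization strategy described on this page can be illustrated with a minimal sketch. It is not the study's implementation: it assumes plain Lloyd-style K-means, uses the within-cluster sum of squares (WCSS) as the "lowest within-cluster variance" criterion, and covers only the selection variant, not the averaging variant. The function names (`kmeans_once`, `kmeans_multi_start`), the parameters `n_init` and `seed`, and the toy data are all hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(points, k, rng, max_iter=100):
    """One K-means run (Lloyd's algorithm) from a random initialization.

    Returns (centroids, labels, wcss), where wcss is the within-cluster
    sum of squared distances, used here as the selection criterion.
    """
    # Pick k distinct data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


def kmeans_multi_start(points, k, n_init=10, seed=0):
    """Run K-means n_init times and keep the run with the lowest WCSS."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


# Toy 2-D feature vectors (hypothetical, not from the study's dataset).
toy_points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
]
best_centroids, best_labels, best_wcss = kmeans_multi_start(toy_points, k=2)
```

Selecting by WCSS only compares runs that share the same K, since WCSS generally decreases as K grows; choosing K itself needs a separate criterion.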

