Artificial Intelligence in Health Optimized clustering in medical app detection
review, offering insights into existing knowledge on the subject. Section 3 provides a theoretical background on machine learning techniques commonly applied in app detection, offering a foundational understanding of the methods employed in the field. Section 4 outlines the proposed methodology. Section 5 examines and discusses the results obtained. Finally, Section 6 concludes by summarizing the key findings and implications of the study.

2. Related works

In the literature, the detection of medical apps primarily relies on three prominent methods: the port-based approach, the payload-based approach, and the machine-learning approach. In the port-based approach, medical apps use well-known ports registered with the Internet Assigned Numbers Authority (IANA) for easy and conventional identification.¹ The original medical apps are registered with specific ports in IANA, and these well-known ports are advertised, which makes identification straightforward. However, this approach has declined in popularity because port obfuscation makes its results inaccurate, particularly where peer-to-peer (P2P) apps hide their identity behind well-known ports.²

When the limitations of port-based identification become apparent, the payload-based approach becomes crucial.³ This method involves monitoring the entire packet content to identify unique and distinctive characteristics. While the payload-based approach exhibits high classification accuracy, it faces several challenges:

(i) Deep inspection is time-consuming, which limits real-time detection in today’s high-speed networks.⁴
(ii) The approach is ineffective with encrypted traffic, allowing P2P apps to escape detection.⁵
(iii) Privacy concerns exacerbate the challenges associated with this approach.⁶

Advanced prediction techniques and data analytics are increasingly employed to enhance productivity and efficiency in detecting medical apps, moving beyond conventional port-based and payload-based approaches. This is because high-speed network connectivity and big data transfers between sensors and monitoring systems demand the use of machine learning techniques and data analytics. These technologies contribute to cost reduction and minimize downtime. In the current literature, app detection leverages machine learning as a core technology to improve the detection performance of novel apps. Signature-based detection algorithms, by contrast, struggle to identify novel or zero-day apps and often exhibit low detection rates on real-world data containing numerous zero-day apps. However, the high detection rate achieved with anomaly-based machine learning algorithms is often associated with a large false alarm rate, which greatly affects their usability and overall performance. Unsupervised machine learning algorithms are the best at detecting unseen and novel samples in the data. Hence, clustering methods are usually used to detect zero-day apps. The disadvantage of traditional clustering techniques such as K-means is the possibility of an incorrect initial choice of the number of clusters, which can prevent the convergence of the output clusters. In the K-means algorithm, deciding the number of clusters and determining the centroid for each cluster are vital and often challenging tasks, as they directly affect the quality of the resultant clusters.

Ahmad and Dey⁷ presented a modified description of cluster centers to overcome the K-means limitation of handling only numeric data, thereby enhancing cluster characterization. Another approach using fuzzy c-means has been proposed by Bezdek et al.⁸ The clustering results obtained were integrated into a judgment matrix, which was then iteratively partitioned to identify the desired cluster number and the result. Zhou et al.⁹ proposed a modified neural network backpropagation algorithm to improve detection rates, particularly in cases where there is an imbalance in the data, with the class of interest being a minority class. Anand et al.¹⁰ modified the placement of the clustering class to overcome the class imbalance. Their modified backpropagation algorithm accelerated the convergence of the neural network. Kumar et al.¹¹ proposed the under-sampled K-means technique, effectively removing noisy and weak instances from large volumes of the majority class. In the work of Wu,¹² clusters were seen to be uniform in size despite variations in input data sizes.

3. Theoretical background

3.1. Need for medical apps

Before delving into the key factors that make health-care apps essential, it is worth reviewing statistics that underscore the industry’s growth trajectory. According to Statista, the health-care sector is projected to be one of the top revenue contributors, with estimates suggesting it will increase from $25.39 billion in 2017 to $58.8 billion by 2020.¹³ The report by Research2Guidance indicates that there are 325,000 health-care apps available worldwide, with Android leading the way on the mHealth platform. A recent
Volume 1 Issue 4 (2024) 18 doi: 10.36922/aih.2585

