Page 31 - AIH-1-4
P. 31
Artificial Intelligence in Health Optimized clustering in medical app detection
A
B
Figure 2. Proposed methodology for the (A) training and (B) test phase
Algorithm 1 Cluster centroids using ANN scrape this data or use APIs to retrieve information such as
1: Procedure ANN app names, descriptions, ratings, reviews, and categories.
2: Initialize the bias and weights In addition, some of the publicly available APIs may offer
3: Set the learning rate access to data about medical apps or health-related services.
4: Set input conferring to dataset For example, APIs provided by health-care organizations
5: Set cluster output conferring to number of clusters from or platforms such as HealthKit might include information
dataset about app usage, health data integration, or user interactions
6: repeat with medical apps. Open data initiatives such as Kaggle host
7: for each training pair do various datasets contributed by users or organizations. While
8: Set activation of input units we might not find a dedicated dataset for medical apps, we
9: Compute response of output unit could find related datasets containing app usage or user
10: Update bias and weights if an error occur for this data behavior data that includes medical apps.
11: until terminating condition is true The modified clustering algorithm is trained using
12: Test terminating condition a training dataset. Semi-supervised classification uses a
13: if weights are unchanged then significant amount of labeled data together with unlabeled
14: stop data for classification. The training dataset was created by
15: else considering 20 medical apps and malware and 10 unknown
16: Continue step 6 benign apps and malware samples. We used the platforms
Weka and MATLAB for the whole training and validation
5. Results and discussion procedures. With input data fed to the ANN, the number of
Finding a dataset specifically focused on medical apps is very iterations and nodes need to be specified during training.
challenging. However, some datasets include information Outputs with the same node numbers were assumed to
about app usage, reviews, ratings, or features related to be in the same cluster, resulting in intracluster similarity
medical apps within larger repositories or platforms. Potential being the maximum and intercluster similarity being the
sources for finding such datasets in the proposed work include minimum. While increasing the number of nodes in the
platforms such as the Apple App Store and Google Play ANN can improve performance, it also adversely affects
Store, which offer APIs or datasets containing information the time complexity. The maximum number of epochs in
about various mobile apps, including medical apps. We can this study is 1000.
Volume 1 Issue 4 (2024) 25 doi: 10.36922/aih.2585

