
Artificial Intelligence in Health                                Optimized clustering in medical app detection




Figure 2. Proposed methodology for the (A) training and (B) test phase

Algorithm 1 Cluster centroids using ANN
1: Procedure ANN
2:   Initialize the bias and weights
3:   Set the learning rate
4:   Set the input according to the dataset
5:   Set the cluster output according to the number of clusters in the dataset
6:   repeat
7:     for each training pair do
8:       Set the activation of the input units
9:       Compute the response of the output unit
10:      Update the bias and weights if an error occurs for this pair
11:  until the terminating condition is true
12:  Test the terminating condition
13:  if the weights are unchanged then
14:    stop
15:  else
16:    Continue from step 6

5. Results and discussion

Finding a dataset specifically focused on medical apps is very challenging. However, some datasets include information about app usage, reviews, ratings, or features related to medical apps within larger repositories or platforms. Potential sources for finding such datasets in the proposed work include platforms such as the Apple App Store and Google Play Store, which offer APIs or datasets containing information about various mobile apps, including medical apps. We can scrape this data or use APIs to retrieve information such as app names, descriptions, ratings, reviews, and categories. In addition, some of the publicly available APIs may offer access to data about medical apps or health-related services. For example, APIs provided by health-care organizations or platforms such as HealthKit might include information about app usage, health data integration, or user interactions with medical apps. Open data initiatives such as Kaggle host various datasets contributed by users or organizations. While we might not find a dedicated dataset for medical apps, we could find related datasets containing app usage or user behavior data that include medical apps.

The modified clustering algorithm is trained using a training dataset. Semi-supervised classification uses a significant amount of labeled data together with unlabeled data for classification. The training dataset was created by considering 20 medical apps and malware and 10 unknown benign apps and malware samples. We used Weka and MATLAB for all training and validation procedures. When input data are fed to the ANN, the number of iterations and the number of nodes need to be specified for training. Outputs with the same node number were assumed to be in the same cluster, so that intracluster similarity is maximized and intercluster similarity is minimized. While increasing the number of nodes in the ANN can improve performance, it also adversely affects the time complexity. The maximum number of epochs in this study was 1000.
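
As a rough illustration of Algorithm 1 and the training setup described above, the following Python sketch runs an error-driven training loop with one output node per cluster and then groups samples by their winning node. It is not the authors' implementation: the study used Weka and MATLAB, and the function names (`train_ann`, `predict_node`, `group_by_node`), the perceptron-style update rule, the learning rate of 0.1, the random initialization range, and the toy two-feature samples are all assumptions made for illustration. Only the step structure and the cap of 1000 epochs come from the text.

```python
import random


def predict_node(weights, biases, x):
    """Steps 8-9: present the input and return the index of the output
    node with the strongest response."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return scores.index(max(scores))


def train_ann(samples, targets, n_clusters, learning_rate=0.1,
              max_epochs=1000, seed=0):
    """Perceptron-style reading of Algorithm 1 (an assumed update rule)."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    # Step 2: initialize the bias and weights, one output node per cluster.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
               for _ in range(n_clusters)]
    biases = [0.0] * n_clusters
    epochs_run = 0
    for epoch in range(1, max_epochs + 1):        # Step 6: repeat
        epochs_run = epoch
        changed = False
        for x, target in zip(samples, targets):   # Step 7: each training pair
            winner = predict_node(weights, biases, x)
            if winner != target:                  # Step 10: update on error
                for j in range(n_features):
                    weights[target][j] += learning_rate * x[j]
                    weights[winner][j] -= learning_rate * x[j]
                biases[target] += learning_rate
                biases[winner] -= learning_rate
                changed = True
        if not changed:                           # Steps 12-14: weights unchanged
            break
    return weights, biases, epochs_run


def group_by_node(samples, weights, biases):
    """Samples that share a winning output node fall in the same cluster."""
    clusters = {}
    for index, x in enumerate(samples):
        node = predict_node(weights, biases, x)
        clusters.setdefault(node, []).append(index)
    return clusters


# Hypothetical, linearly separable toy data: two features per app.
toy_samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
               [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
toy_targets = [0, 0, 0, 1, 1, 1]

weights, biases, epochs_run = train_ann(toy_samples, toy_targets, n_clusters=2)
clusters = group_by_node(toy_samples, weights, biases)
print("epochs run:", epochs_run)
print("clusters:", clusters)
```

In this sketch, adding output nodes adds one weight vector per node, so each epoch costs more as the node count grows, which is consistent with the trade-off between performance and time complexity noted in the text.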


            Volume 1 Issue 4 (2024)                         25                               doi: 10.36922/aih.2585