
Artificial Intelligence in Health                                Optimized clustering in medical app detection



computationally less intensive and reduce the risk of overfitting, which is crucial given our limited sample size. However, we acknowledge that more complex models could potentially yield better performance and will consider this in future work.

5.3. Limitations

One significant limitation of our study is the small sample size of 20 medical apps and malware samples, along with 10 unknown benign apps and malware samples. This small sample size limits the generalizability of our findings. In future work, we plan to expand our dataset to include a more diverse and larger set of samples to validate our results further.

Another limitation is the lack of direct comparison with existing state-of-the-art methods for app detection. While we conducted a comprehensive literature review to understand the current landscape, direct empirical comparisons are necessary to validate the effectiveness of our approach rigorously. We aim to address this in future studies by benchmarking our method against established techniques using larger and more diverse datasets.

While our proposed method demonstrates promising results in terms of intracluster similarity and error reduction, further research with larger datasets and more complex models is needed to fully validate its effectiveness and generalizability.

6. Conclusion

The paper has successfully addressed the challenge of detecting zero-day health-care apps, a prevalent issue where conventional app detection techniques struggle with misclassifying zero-day traffic into predefined known classes. Our approach proposes a scheme that can identify zero-day apps while accurately classifying those belonging to predefined application classes. The proposed scheme encompasses three crucial modules: unknown discovery, app classification, and system update. By leveraging ANNs to determine centroids in K-means clustering, our study reveals that the hybrid model of K-means clustering using ANN enhances app detection, particularly for zero-day apps. We highlight the impact of unknown apps on the classification accuracy of supervised methods, validating the effectiveness of correlation-based feature selection for clustering essential features. With a focus on unknown discovery and (N + 1) class classification, the proposed model efficiently identifies zero-day traffic and undergoes frequent updates through training with zero-day apps within the respective cluster. This continuous model refinement contributes to an overall reduction in app detection errors. By optimizing K in K-means and the number of nodes in ANN, substantial improvements in results are attainable.

To further enhance the effectiveness of health-care app detection, future work should explore several avenues. Real-time data capture could improve classification accuracy in dynamic environments. Investigating advanced feature selection algorithms holds promise for achieving greater accuracy in app detection. In addition, the incorporation of weighted sampling techniques in training flows may provide more representative and effective models. The obtained results suggest that substantial improvements in the performance of health-care app detection are feasible. Future studies should focus on finding an optimal method for determining the number of clusters, a critical aspect of refining the proposed scheme. In addition, extending the study to encompass diverse health-care scenarios and data sources would enhance the robustness and applicability of the proposed detection model. Collecting more data is essential to strengthening the conclusions and reliability of the proposed methods.

Acknowledgments

None.

Funding

None.

Conflict of interest

The authors declare that they have no competing interests.

Author contributions

Conceptualization: Ciza Thomas
Formal Analysis: Ciza Thomas
Investigation: Ciza Thomas
Methodology: All authors
Writing – original draft: All authors
Writing – review & editing: All authors

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data

Not applicable.



            Volume 1 Issue 4 (2024)                         27                               doi: 10.36922/aih.2585