
Global Health Econ Sustain                                          Quantum Data Lake for epidemic analysis



1. Introduction

Datafication extends across all social aspects, including health care. The health-care sector has undergone a digital transformation, e.g., with the development of electronic health records, data-driven medical approaches, and, more recently, digital epidemiology. In the field of epidemiology, the task of data collection for an early epidemic or pandemic detection/forecast is embracing data strategy to address prospective issues related to open, accessible data up to the creation of knowledge-based decisions (Mathur & Fox, 2023; Ruhamyankaka et al., 2020; Velasco, 2018; Wang & Zhao, 2021). Digital epidemiology offers a novel investigative approach into the “microworld” and enhances the fight against communicable diseases to a promising new level (Cervellin et al., 2017; Höhle, 2017; Kaveh-Yazdy & Zareh-Bidoki, 2018; Mittelstadt et al., 2018; Salathé et al., 2012; Salathé, 2018; Salathé, 2021; Samerski, 2018; Tarkoma et al., 2020; Velasco, 2018; Wang & Zhao, 2021).

The COVID-19 pandemic has accelerated the paradigm shift in digital epidemiology. Several data-driven approaches and modeling solutions have been successfully developed based on raw data collection or synthetic data generation by artificial neural networks (Cao & Qing Liu, 2021; Shakeel et al., 2021; Rahimi et al., 2021). According to Disease Outbreak News by the World Health Organization (WHO), numerous viral infection outbreaks were reported during 2020 – 2023, including monkeypox, Middle East respiratory syndrome coronavirus, influenza A (H1N1, H1N2, H3N2, H5N1, and H5N8), Ebola hemorrhagic fever, Marburg virus disease, Crimean-Congo hemorrhagic fever, Lassa hemorrhagic fever, Rift Valley fever, yellow fever, Zika virus disease, Oropouche virus disease, dengue fever, Mayaro virus disease, chikungunya fever, Nipah virus disease, Japanese encephalitis, enterovirus-echovirus 11 infection, hepatitis E, measles, and poliomyelitis. Data-driven approaches (e.g., modeling, neural networks, and open digital data) were successfully employed for outbreak control and analysis. For example, data-driven analysis, neural networking, and deep learning were used to control the monkeypox outbreak (Edinger et al., 2023), while open data, Google Trends, and neural networks assisted in time series forecasting (Thakur et al., 2023).

Big data refers to the collection of raw data that are high in volume with a high velocity of data inflow and a variety of data types. Raw data per se constitute the base for modeling, but have minimum value in the modeling chain. The utilization of big data models has essentially established a paradigm shift from data-driven approaches to a more significant value-driven strategy. The former emphasizes qualitative data sampling, data collection, and preparation, while the latter is aimed at using all possible data, overcoming data uncertainty, multidimensionality, and disconnection with sophisticated data analytical tools to obtain complex logical conclusions.

Schmarzo described the analytical “line of sight” from data to value as a Data Lake with two types of data: Raw data and curated data (Schmarzo, 2022a; Schmarzo, 2022b). The aim of the Data Lake is the creation of value that can constitute the decision factory. A data management engine can be integrated to redirect data flows or processes accordingly and on time. The Data Lake concept was proposed in 2010 by Dixon (Dixon, 2010), and it is defined as a common framework enabling the development of big data architecture, i.e., from gathering data to computing processes, including high-performance computing or quantum computing (Figure 1). The Application Programming Interface (API) platform for users is based on the Infrastructure as a Service approach that involves cloud orchestration technology with the opportunity for data discovery and predictive analytics. API platforms are currently considered ubiquitous, facilitating the sharing of various web services globally. APIs can have representational state transfer architecture, simple object access protocol, and query language. Different APIs can be used for epidemiological research: Google Trends API, Delphi’s Epidata API, WHO Athena API, Outbreak APIs, Data Catalog Platform “Data.world” built on a Knowledge Graph Architecture with artificial intelligence (AI), etc. The Data Lake framework restructured data presentation and utilized unstructured and non-relational data for fundamentally different types of analysis. In addition, the framework can rapidly transform the data structure, i.e., from raw data into curated data that are commonly presented as clusters, principal component vectors, graphs, fractals, logical connections, etc. Similarly, different approaches can correlate data presentation to the processes undermining models, and this would determine the relevance of the obtained values and subsequent decisions made.

Despite the prospects of quantum technology, it should be noted that the research works presented in this article are limited by the present developmental phase of quantum technologies, which are currently at an early stage of the new quantum era.

2. Methods

2.1. Data source

Data on DNA viruses were collected and analyzed from databases of the International Committee on Taxonomy of Viruses (https://ictv.global/taxonomy) and the National Center for Biotechnology Information Taxonomy (NCBI). The detailed ontology with symptoms and diseases was


            Volume 2 Issue 1 (2024)                         2                        https://doi.org/10.36922/ghes.2148