Global Health Econ Sustain Quantum Data Lake for epidemic analysis
1. Introduction

Datafication extends across all social aspects, including health care. The health-care sector has undergone a digital transformation, e.g., with the development of electronic health records, data-driven medical approaches, and, more recently, digital epidemiology. In the field of epidemiology, the task of data collection for early epidemic or pandemic detection and forecasting is embracing a data strategy to address prospective issues ranging from open, accessible data up to the creation of knowledge-based decisions (Mathur & Fox, 2023; Ruhamyankaka et al., 2020; Velasco, 2018; Wang & Zhao, 2021). Digital epidemiology offers a novel investigative approach into the “microworld” and raises the fight against communicable diseases to a promising new level (Cervellin et al., 2017; Höhle, 2017; Kaveh-Yazdy & Zareh-Bidoki, 2018; Mittelstadt et al., 2018; Salathé et al., 2012; Salathé, 2018; Salathé, 2021; Samerski, 2018; Tarkoma et al., 2020; Velasco, 2018; Wang & Zhao, 2021).

The COVID-19 pandemic has accelerated the paradigm shift in digital epidemiology. Several data-driven approaches and modeling solutions have been successfully developed based on raw data collection or synthetic data generation by artificial neural networks (Cao & Qing Liu, 2021; Shakeel et al., 2021; Rahimi et al., 2021). According to Disease Outbreak News by the World Health Organization (WHO), numerous viral infection outbreaks were reported during 2020–2023, including monkeypox, Middle East respiratory syndrome coronavirus, influenza A (H1N1, H1N2, H3N2, H5N1, and H5N8), Ebola hemorrhagic fever, Marburg virus disease, Crimean-Congo hemorrhagic fever, Lassa hemorrhagic fever, Rift Valley fever, yellow fever, Zika virus disease, Oropouche virus disease, dengue fever, Mayaro virus disease, chikungunya fever, Nipah virus disease, Japanese encephalitis, enterovirus-echovirus 11 infection, hepatitis E, measles, and poliomyelitis. Data-driven approaches (e.g., modeling, neural networks, and open digital data) were successfully employed for outbreak control and analysis. For example, data-driven analysis, neural networking, and deep learning were used to control the monkeypox outbreak (Edinger et al., 2023), while open data, Google Trends, and neural networks assisted in time series forecasting (Thakur et al., 2023).

Big data refers to the collection of raw data that are high in volume with a high velocity of data inflow and a variety of data types. Raw data per se constitute the base for modeling, but have minimum value in the modeling chain. The utilization of big data models has essentially established a paradigm shift from data-driven approaches to a more significant value-driven strategy. The former emphasizes qualitative data sampling, data collection, and preparation, while the latter is aimed at using all possible data, overcoming data uncertainty, multidimensionality, and disconnection with sophisticated data analytical tools to obtain complex logical conclusions.

Schmarzo described the analytical “line of sight” from data to value as a Data Lake with two types of data: raw data and curated data (Schmarzo, 2022a; Schmarzo, 2022b). The aim of the Data Lake is the creation of value that can constitute the decision factory. A data management engine can be integrated to redirect data flows or processes accordingly and on time. The Data Lake concept was proposed in 2010 by Dixon (Dixon, 2010), and it is defined as a common framework enabling the development of big data architecture, i.e., from gathering data to computing processes, including high-performance computing or quantum computing (Figure 1). The Application Programming Interface (API) platform for users is based on the Infrastructure as a Service approach that involves cloud orchestration technology with the opportunity for data discovery and predictive analytics. API platforms are currently considered ubiquitous, facilitating the sharing of various web services globally. APIs can follow the representational state transfer (REST) architecture, the Simple Object Access Protocol (SOAP), or a query language. Different APIs can be used for epidemiological research: Google Trends API, Delphi’s Epidata API, WHO Athena API, Outbreak APIs, Data Catalog Platform “Data.world” built on a Knowledge Graph Architecture with artificial intelligence (AI), etc. The Data Lake framework restructures data presentation and utilizes unstructured and non-relational data for fundamentally different types of analysis. In addition, the framework can rapidly transform the data structure, i.e., from raw data into curated data that are commonly presented as clusters, principal component vectors, graphs, fractals, logical connections, etc. Similarly, different approaches can correlate data presentation to the processes underlying models, and this would determine the relevance of the obtained values and subsequent decisions made.

Despite the prospects of quantum technology, it should be noted that the research works presented in this article are limited by the present developmental phase of quantum technologies, which are currently at an early stage of the new quantum era.

2. Methods

2.1. Data source

Data on DNA viruses were collected and analyzed from databases of the International Committee on Taxonomy of Viruses (https://ictv.global/taxonomy) and the National Center for Biotechnology Information (NCBI) Taxonomy. The detailed ontology with symptoms and diseases was
Volume 2 Issue 1 (2024) 2 https://doi.org/10.36922/ghes.2148

