terminologies represent the same knowledge domain but are structured and coded differently, either as classifications, such as the International Classification of Diseases (ICD), or designed according to ontological principles to ensure each code is mutually exclusive of another, such as SNOMED CT (Clinical Terms), a global language representing clinical terms. Not only is mapping time consuming and costly to maintain given frequent terminology updates, but mapping errors also introduce risks for patient safety. Few data maps are independently quality-assured, nor are those undertaking a mapping activity contracted to achieve a specified quality level. These data map quality issues have led to the development of an ISO standard designed to address them.24
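To illustrate the kind of quality assurance such a standard calls for, the minimal Python sketch below (the map entries are hypothetical, not drawn from any published crosswalk) screens a source-to-target code map and flags unmapped or ambiguous entries for human review rather than resolving them silently:

```python
# Minimal sketch of terminology-map quality screening.
# The map entries below are hypothetical, for illustration only;
# real ICD/SNOMED CT maps are versioned, licensed artifacts.

from collections import defaultdict

# source code -> list of candidate target codes (empty = unmapped)
icd_to_snomed = {
    "I10": ["38341003"],                 # one-to-one: unambiguous
    "R07.9": ["29857009", "102588006"],  # one-to-many: needs human review
    "Z99.9": [],                         # unmapped: must not be dropped silently
}

def screen_map(mapping):
    """Partition map entries by the kind of review they need."""
    report = defaultdict(list)
    for source, targets in mapping.items():
        if not targets:
            report["unmapped"].append(source)
        elif len(targets) > 1:
            report["ambiguous"].append(source)
        else:
            report["clean"].append(source)
    return dict(report)

print(screen_map(icd_to_snomed))
# {'clean': ['I10'], 'ambiguous': ['R07.9'], 'unmapped': ['Z99.9']}
```

Reporting rather than auto-resolving reflects the point above: unverified or silently forced mappings are exactly what introduces patient-safety risk.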
The shortcomings of using evolving messaging standards to represent clinical information have long been recognized. The continuing use of multiple versions of messaging standards, which focus on syntactic interoperability, has resulted in methods and standards going beyond the data level, such as the openEHR archetypes (data models),25 HL7 CDA (Clinical Document Architecture),26 and HL7 FHIR (Fast Health-care Interoperability Resources).27 Archetypes bring together relevant data items and clinical or health-care context to define composite clinical concepts, such as blood pressure, laboratory results, medication lists, and prescriptions, in a manner to suit any possible use case. These models may also contain terminology bindings, where some of the data elements are linked with corresponding clinical terminology, such as SNOMED CT, ICD, or Logical Observation Identifiers Names and Codes (LOINC).28 The application of conceptual modeling plus attribute binding to standard terminologies ensures that context and meaning are retained, guaranteeing a high degree of semantic interoperability within and between EHRs and significantly improving data quality. However, adoption of these standards by vendors has been slow due to a lack of effective regulatory or commercial mandates and incentives.
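To make terminology binding concrete, here is a minimal sketch of a blood-pressure composite concept whose elements carry bindings to standard codes. The structure is illustrative Python, not actual openEHR Archetype Definition Language; the LOINC and SNOMED CT codes shown are the commonly used systolic/diastolic observation codes, but they should be verified against current terminology releases before any reuse:

```python
# Illustrative sketch of an archetype-like composite concept with
# terminology bindings; not actual openEHR ADL syntax.

from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    units: str
    bindings: dict = field(default_factory=dict)  # terminology -> code

@dataclass
class CompositeConcept:
    name: str
    elements: list

blood_pressure = CompositeConcept(
    name="Blood pressure",
    elements=[
        # LOINC 8480-6/8462-4 and SNOMED CT 271649006/271650006 are the
        # widely used systolic/diastolic codes (verify against releases).
        Element("systolic", "mm[Hg]", {"LOINC": "8480-6", "SNOMED CT": "271649006"}),
        Element("diastolic", "mm[Hg]", {"LOINC": "8462-4", "SNOMED CT": "271650006"}),
    ],
)

# Because each element is bound to standard codes, two systems exchanging
# this structure agree on meaning (semantics), not merely on syntax.
for e in blood_pressure.elements:
    print(e.name, e.units, e.bindings)
```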
2.2. Data quality and interoperability

The value of common data models (CDMs) was identified during the early 1990s, when the adoption of a CDM empowered collaborative research across competing organizations.29 This finding led to the establishment of the Clinical Data Interchange Standards Consortium (CDISC).22 International collaborative research has demonstrated that semantic interoperability can be achieved by creating a CDM shared by all data contributors, as these CDMs define central concepts, their attributes, constraints, and relations. CDM adoption allows for the pooling of information so that meaningful comparisons can be made.

Every EHR/EMR system is a potential data contributor, yet each continues to use its own data reference model to structure its data repositories. The lack of widespread adoption of CDMs by EHR/EMR systems, together with weak enforcement (governance), continues to be a major limitation. In the health-care domain, a CDM usually refers to a clinical data model. openEHR has adopted a two-level modeling approach that separates its universal archetypes from applications,30,31 as a means of optimizing semantic interoperability.
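As a sketch of why a shared CDM permits pooling, the example below (schemas, field names, and mappings are invented for illustration) maps two contributors' differently structured exports of the same kind of clinical fact into one common record shape, after which the pooled records are directly comparable:

```python
# Minimal sketch of pooling data from two contributors via a common
# data model (CDM). Schemas and field names are invented for illustration.

# Contributor A's local export
source_a = [{"pt": "A-001", "dx_code": "I10", "dx_system": "ICD-10"}]
# Contributor B's local export: the same kind of fact, different schema
source_b = [{"patient_id": "B-042", "condition": "38341003", "vocab": "SNOMED CT"}]

def from_a(row):
    """Map contributor A's schema into the shared CDM record."""
    return {"person_id": row["pt"], "code": row["dx_code"], "system": row["dx_system"]}

def from_b(row):
    """Map contributor B's schema into the shared CDM record."""
    return {"person_id": row["patient_id"], "code": row["condition"], "system": row["vocab"]}

# Once every contributor maps into the same central concepts and
# attributes, the pooled table supports meaningful comparisons.
pooled = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]
for rec in pooled:
    print(rec)
```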
The lack of effective collaboration over time between ecosystem-wide stakeholders, including citizens, clients, patients, funders, health-care providers, researchers, and other institutions (data users), has resulted in poor data access and data quality. Data use is limited to built-in system functionality, including reporting functionalities, as many multi-modal systems have difficulty interfacing with external systems.32 Consequently, meaningful data aggregation, whether to create large accessible databases or to ensure that all health data pertaining to one individual are accessible through one record, is limited. These represent major barriers to AI development.

As a consequence of poor-quality and incomplete datasets, substantial research time, money, and effort are spent on "data cleansing" activities designed to improve data quality.33 Data cleansing undertaken for medical AI systems can itself degrade data quality if not performed carefully, with potentially dramatic and harmful implications.34 Stöger et al.34 listed and described the following quality problems associated with the use of original data, which data cleansing activities are meant to mitigate (a screening sketch follows this list):
• Absence of data – blank fields
• Dummy/default values – may be difficult to detect
• Noise (also known as the butterfly effect)
• Wrong data
• Inconsistent data
• Cryptic data
• Duplicate primary keys
• Non-unique identifiers
• Multipurpose fields
• Violation of (business) rules

Data processing sometimes requires the conversion of numerical data to strings to represent a concept in words; this represents another potential risk, as it can lead to later issues.
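As a minimal illustration of how such problems might be detected, the following sketch (field names, sentinel values, and the plausibility rule are all hypothetical) screens records for several of the listed issues, including numeric concepts stored as words, and reports them for human review rather than silently altering the data, consistent with the caution above:

```python
# Minimal sketch: screen records for some of the quality problems listed
# above and report them instead of silently "fixing" them. Field names,
# sentinel values, and the plausibility rule are hypothetical.

from collections import Counter

DUMMY_VALUES = {"999", "N/A", "UNKNOWN", "1900-01-01"}  # assumed sentinels

records = [
    {"id": "1", "age": "34",  "weight_kg": "seventy"},  # numeric stored as words
    {"id": "1", "age": "",    "weight_kg": "999"},      # duplicate key, blank, dummy
    {"id": "2", "age": "230", "weight_kg": "81"},       # violates plausibility rule
]

def screen(rows):
    problems = []
    ids = Counter(r["id"] for r in rows)
    for n, r in enumerate(rows):
        if ids[r["id"]] > 1:
            problems.append((n, "duplicate primary key"))
        for fld, value in r.items():
            if value == "":
                problems.append((n, f"absence of data: {fld}"))
            elif value in DUMMY_VALUES:
                problems.append((n, f"dummy/default value: {fld}"))
        if r["age"].isdigit() and int(r["age"]) > 120:
            problems.append((n, "rule violation: age > 120"))
        if not r["weight_kg"].replace(".", "", 1).isdigit():
            problems.append((n, "numeric concept stored as words: weight_kg"))
    return problems

for row, issue in screen(records):
    print(f"record {row}: {issue}")
```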
2.3. Data sharing

This existing knowledge gap regarding data sharing and the need for quality data needs to be acknowledged and addressed by policy makers and research funders, as well as by those developing AI applications.35 The availability of a public library of terminology value sets enables clinical

