
Artificial Intelligence in Health                                           Optimizing EHRs to support AI



terminologies represent the same knowledge domain but are structured and coded differently: either as classifications, such as the International Classification of Diseases (ICD), or according to ontological principles that ensure each code is mutually exclusive of every other, such as SNOMED CT (Clinical Terms), a global language for representing clinical terms. Mapping is time-consuming and costly to maintain given frequent terminology updates, and mapping errors introduce risks to patient safety. Few data maps are independently quality-assured, nor are those undertaking a mapping activity contracted to achieve a specified quality level. This problem has led to the development of an ISO standard [24] designed to address data map quality.

The shortcomings of using evolving messaging standards to represent clinical information have long been recognized. The continuing use of multiple versions of messaging standards, which focus on syntactic interoperability, has resulted in methods and standards that go beyond the data level, such as the openEHR archetypes (data models) [25], HL7 CDA (Clinical Document Architecture) [26], and HL7 FHIR (Fast Healthcare Interoperability Resources) [27]. Archetypes bring together relevant data items and clinical or health-care context to define composite clinical concepts, such as blood pressure, laboratory results, medication lists, and prescriptions, in a manner that suits any possible use case. These models may also contain terminology bindings, where some of the data elements are linked to a corresponding clinical terminology, such as SNOMED CT, ICD, or Logical Observation Identifiers Names and Codes (LOINC) [28]. Applying conceptual modeling together with attribute binding to standard terminologies ensures that context and meaning are retained, which supports a high degree of semantic interoperability within and between EHRs and significantly improves data quality. However, adoption of these standards by vendors has been slow due to a lack of effective regulatory or commercial mandates and incentives.

2.2. Data quality and interoperability

The value of common data models (CDM) was identified during the early 1990s. The adoption of a CDM empowered collaborative research across competing organizations [29]. This finding led to the establishment of the CDISC [22]. International collaborative research has demonstrated that semantic interoperability can be achieved by creating a CDM shared by all data contributors, as these CDMs define central concepts, their attributes, constraints, and relations. CDM adoption allows for the pooling of information so that meaningful comparisons can be made.

Every EHR/EMR system is a potential data contributor and continues to make use of its own data reference model to structure its data repositories. The lack of widespread adoption of CDMs by EHR/EMR systems and issues with enforcement (governance) continue to be major limitations. In the health-care domain, a CDM usually refers to a Clinical Data Model. openEHR has adopted a two-level modeling approach that separates its universal archetypes from applications [30,31] as a means of optimizing semantic interoperability.

The lack of effective collaboration over time between ecosystem-wide stakeholders, including citizens, clients, patients, funders, health-care providers, researchers, and other institutions (data users), has resulted in poor data access and data quality. Data use is limited to built-in system functionality, including reporting functionalities, as many multi-modal systems have difficulty interfacing with external systems [32]. Consequently, meaningful data aggregation, whether to create large accessible databases or to ensure that all health data pertaining to one individual is accessible through one record, is limited. These represent major barriers for AI development.

As a consequence of poor-quality and incomplete datasets, substantial research time, money, and effort are spent on "data cleansing" activities designed to improve data quality [33]. Data cleansing undertaken for medical AI systems can have negative effects on data quality if not performed carefully. Data cleansing can have dramatic harmful implications [34]. Stöger et al. [34] listed and described the following quality problems associated with the use of original data, which data cleansing activities are meant to mitigate:
•   Absence of data – blank fields
•   Dummy/default values – may be difficult to detect
•   Noise (also known as the butterfly effect)
•   Wrong data
•   Inconsistent data
•   Cryptic data
•   Duplicate primary keys
•   Non-unique identifiers
•   Multipurpose fields
•   Violation of (business) rules

Data processing sometimes requires converting numerical data to strings to represent a concept in words, which is another potential risk, as it can lead to later issues.

2.3. Data sharing

The existing knowledge gap regarding data sharing and the need for quality data must be acknowledged and addressed by policy makers and research funders [35], as well as by those developing AI applications. The availability of a public library of terminology value sets enables clinical


            Volume 1 Issue 3 (2024)                         13                               doi: 10.36922/aih.3056