
Artificial Intelligence in Health                                                        NLP in EHR




            Table 2. (Continued)
            Article | Focus | NLP level | Limitations | Future scope
            [72] | Study of genomic-related treatment changes | Lexical and semantic | Progress reports lack accurate labels | Use of pathology reports and progress notes
            [73] | Extraction of gynecological surgical history | Semantic | Manually extracted gold standard | NA
            Abbreviations: CT: Computed tomography; EHR: Electronic health records; EMR: Electronic medical records; HIV: Human immunodeficiency virus;
            ICD: International classification of disease; ICU: Intensive care unit; KD-NLP: Kidney disease natural language processing; LSTM: Long short-term
            memory; NER: Named entity recognition; NLP: Natural language processing; PAD: Peripheral artery disease; PSA: Prostate-specific antigen;
            UMLS: Unified Medical Language System; NA: Not available.

exclusively for English and was confined to a limited number of physicians. The works of He et al. [33] and Becker et al. [34], both limited in scope to specific departments, could potentially be expanded to assess the interoperability of the system. He et al. [33] concentrated on the Chinese language, while Becker et al. [34] worked with the German language. The limitation posed by the availability of German translation in these two studies arises from the fact that the Unified Medical Language System (UMLS) primarily supports English. In addition, the detection of negation within statements posed a challenge in the studies. Afzal et al. [35] successfully identified peripheral arterial disease from clinical notes, although with a focus on clinical visits. They posited that further studies could include clinical notes from multiple institutes to broaden the scope of the investigation. In a related study, Afzal et al. [36] also identified peripheral arterial disease, utilizing data retrieved exclusively from a single institute. The automatic detection of peripheral arterial disease within EHR was acknowledged as a time-consuming process.

Hassanpour et al. [37] extracted information from radiology reports, using a combination of rule-based and machine-learning approaches to implement lexical and semantic levels. The study, however, faces challenges arising from biases in the annotation process, small training and test sets, and a focus limited to chest computed tomography (CT) images. Diversifying the imaging sources beyond chest CT could provide avenues for future research. Dipaola et al. [28] focused on the abstraction of pathological data related to cancer. The variations in the nature and structure of pathology reports underscore the need for a unique format applicable across multiple institutes. Chase et al. [38] conducted a study aimed at identifying early symptoms in patients with multiple sclerosis. Notably, the study’s limitations include restricted information sources, maintenance of notes for a limited number of patients, and the classification being built and tested exclusively on female patients within correct international classification of disease (ICD) codes for multiple sclerosis. The applicability of the same approach to a broader range of diseases is suggested. Zeng et al. [39] used a semantic level with a rule-based approach, but the study faces limitations such as replication without a positive set, confusion in the analysis of pathology reports, and a scarcity of semantic relations. Expanding the study by incorporating graph-based representation to capture relations is recommended. Jones et al. [40] identified clinical assertions of pneumonia from emergency department notes. Limitations, including uncertainty in the pneumonia diagnosis and the number of false positives and false negatives, were acknowledged in the study. Addressing these limitations could contribute to the development of a more effective decision-making support system. Goff and Loehfelm [41] extracted radiology information in English using a rule-based approach. The study used a small set of annotations, and challenges included false positives and ambiguities in NLP, which made negation detection more difficult. The subjectivity of annotators also posed difficulties. Expanding the study to encompass all diagnostic images is suggested. Bai et al. [42] investigated both semantic and discourse levels of NLP. The hierarchical tree-like structure of medical codes was identified as a limiting factor. Future avenues for study include cohort identification, automatic code assignment, and the integration of deep neural networks in prediction models. Chapman et al. [43] conducted a study to detect infection from free-text reports in the English language, using only radiology reports, with a higher number of false positives. Generalizing the study is deemed possible by incorporating various report types and utilizing both structured and unstructured data for analysis. Zhang et al. [44] explored word embeddings in psychiatric notes in English, utilizing a neural network-based approach. Aside from the performance variance of word embedding vectors, the study could potentially be expanded by integrating a combination of a conditional random field (CRF)-based system and a deep learning-based system. Afshar et al. [45] performed a semantic-level study to identify respiratory failure, incorporating data from radiology reports and clinical notes from three intensive care units. The major limitation acknowledged is the single-site design. Percha et al. [46] delved into the study of lexicons from radiology reports, revealing a weak correlation between term frequencies and


            Volume 1 Issue 1 (2024)                         24                        https://doi.org/10.36922/aih.2147