Page 30 - AIH-1-1

Artificial Intelligence in Health NLP in EHR

Table 2. (Continued)
Article | Focus | NLP level | Limitations | Future scope
[72] | Study of genomic-related treatment changes | Lexical and semantic | Progress reports lack accurate labels | Use of pathology reports and progress notes
[73] | Extraction of gynecological surgical history | Semantic | Manually extracted gold standard | NA
Abbreviations: CT: Computed tomography; EHR: Electronic health records; EMR: Electronic medical records; HIV: Human immunodeficiency virus;
ICD: International classification of disease; ICU: Intensive care unit; KD-NLP: Kidney disease natural language processing; LSTM: Long short-term
memory; NER: Named entity recognition; NLP: Natural language processing; PAD: Peripheral artery disease; PSA: Prostate-specific antigen;
UMLS: Unified Medical Language System; NA: Not available.

exclusively for English and was confined to a limited number of physicians. He et al. [33] and Becker et al. [34], both of whose works have limited scope within specific departments, could potentially be expanded to assess the interoperability of the system. He et al. [33] concentrated on the Chinese language, while Becker et al. [34] worked with the German language. The limitation posed by the availability of German translation in these two studies arises from the fact that the Unified Medical Language System (UMLS) primarily supports English. In addition, the detection of negation within statements posed a challenge in the studies. Afzal et al. [35] successfully identified peripheral arterial disease from clinical notes, although with a focus on clinical visits. They posited that further studies could explore the inclusion of clinical notes from multiple institutes to broaden the scope of the investigation. In a related study, Afzal et al. [36] also identified peripheral arterial disease, utilizing data retrieved exclusively from a single institute. The automatic detection of peripheral arterial disease within EHR was acknowledged as a time-consuming process.

Hassanpour et al. [37] undertook the extraction of information from radiology reports, using a combination of rule-based and machine-learning approaches to implement lexical and semantic levels. The study, however, faces challenges arising from biases in the annotation process, small training and test sets, and a focus limited to chest computed tomography (CT) images. Suggestively, diversifying the imaging sources beyond chest CT could provide avenues for future research. Dipaola et al. [28] focused on the abstraction of pathological data related to cancer. The variations in the nature and structure of pathology reports underscore the need for a unique format applicable across multiple institutes. Chase et al. [38] conducted a study aimed at identifying early symptoms in patients with multiple sclerosis. Notably, the study’s limitations include restricted information sources, maintenance of notes for a limited number of patients, and the classification being built and tested exclusively on female patients within correct international classification of disease (ICD) codes for multiple sclerosis. The applicability of the same approach to a broader range of diseases is suggested. Zeng et al. [39] used a semantic level with a rule-based approach, but the study faces limitations such as replication without a positive set, confusion in the analysis of pathology reports, and a scarcity of semantic relations. Expanding the study by incorporating graph-based representation to capture relations is recommended. Jones et al. [40] identified clinical assertions of pneumonia from emergency department notes. Limitations, including uncertainty in the pneumonia diagnosis, the number of false positives, and false negatives, were acknowledged in the study. Addressing these limitations could contribute to the development of a more effective decision-making support system. Goff and Loehfelm [41] extracted radiology information in English using a rule-based approach. The study used a small set of annotations, and challenges included false positives and ambiguities in NLP, making negation detection more difficult. The subjectivity of annotators also posed difficulties. Expanding the study to encompass all diagnostic images is suggested. Bai et al. [42] investigated both semantic and discourse levels of NLP. The hierarchical tree-like structure of medical codes was identified as a limiting factor. Future avenues for study include cohort identification, automatic code assignment, and the integration of the deep neural network in prediction models. Chapman et al. [43] conducted a study to detect infection from free-text reports in the English language, using only radiology reports with a higher number of false positives. Generalizing the study is deemed possible by incorporating various report types, utilizing both structured and unstructured data for analysis. Zhang et al. [44] explored word embeddings in psychiatric notes in English, utilizing a neural network-based approach. Excluding the performance variance of word embedding vectors, the study could potentially be expanded by integrating a combination of a case report form (CRF)-based system and a deep learning-based system. Afshar et al. [45] performed a semantic-level study to identify respiratory failure, incorporating data from radiology reports and clinical notes from three intensive care units. The major limitation acknowledged is the single-site design. Percha et al. [46] delved into the study of lexicons from radiology reports, revealing a weak correlation between term frequencies and

Volume 1 Issue 1 (2024) 24 https://doi.org/10.36922/aih.2147
