Artificial Intelligence in Health NLP in EHR

their synonyms. Expanding the methods to other domains is deemed essential.

Koza et al. [47] studied lexical and semantic levels in radiology reports written in Spanish. The study is limited by typing and spelling errors in the radiological findings. Future research directions include handling negated findings, testing the methodology on a more diverse corpus, and creating a more complex dictionary. Lee et al. [48] extracted polyp information from colonoscopy reports; the manual data extraction affected the degree of accuracy. The study’s findings suggest the potential for replication in other healthcare settings. Shen et al. [49] used a combination of lexical and semantic levels, focusing on surgical site infection in English clinical notes related to colorectal surgery. The study reported a relatively low F1 score, indicating room for improvement. Future studies could explore various machine learning algorithms and sub-language supporting techniques. Topaz et al. [50] used a combination of rule-based and machine learning approaches to study the semantic level, processing clinical notes in English within a limited domain. There is potential for future development by incorporating different sources. Topaz et al. [50] investigated neuropsychiatric symptoms in free-text clinical notes in English. Because the information used was inadequate, the symptom categories could be expanded. In addition, the findings were not validated, which leaves validation as a direction for future research.

Shi et al. [51] performed surveillance of surgical site infection using clinical notes in English. The study is susceptible to mention-level and document-level errors, and removing these errors is an avenue for future research. Annotation errors also contribute to the study’s limitations. Senders et al. [52] focused on diagnosing brain metastasis based on free-text radiology reports, employing a machine learning approach. The human classification in the study introduces ambiguities, and the reliance on data from a single institute limits interoperability. External validation of findings, extraction of higher-level concepts, use of unsupervised machine learning approaches, and automated medical text analysis are key directions for further studies. Misra-Hebert et al. [53] aimed to detect hypoglycemia using information extracted from clinical progress notes in English. However, the study is confined to one EMR system and lacks information on the duration of diabetes. Wulff et al. [15] sought to convert unstructured information from the pediatric ICU of Hannover Medical School into a structured format, using free text in English as the information source. The study’s limitation lies in the absence of a retrospective aspect, as it did not consider the patient’s history from the past month, the past week, or even the previous day. In addition, the study is constrained by a small sample size. König et al. [54] extracted clinical events from discharge letters, employing a combination of lexical, semantic, and morphological levels. The study, based on German-language data, has the potential for application to other languages.

Oliveira et al. [55] conducted surveillance of cervical, anal, and pre-cancer conditions using a combination of rule-based and machine learning approaches. Notably, classification in the study was performed at the document level rather than at the patient level. The reports considered for the study originated from a single healthcare system, suggesting the possibility of expanding the study to multiple institutes to address interoperability challenges among them. Wang et al. [56] conducted a study to recognize named entities in Chinese, using both phonetic and semantic levels. The study focused solely on character-based named entity recognition (NER), leaving room for exploration of word-based NER in future research.

3.1.6. Discourse Level

Tou et al. [57] focused on the automatic detection of infections before hospitalization using Chinese-language records from a surgical emergency department. Limitations of the study include the lack of Chinese resources and feature extraction that relied on manually prepared wordlists. Future studies could explore a reinforcement-based approach.

Bozkurt et al. [58] centered their research on lesion summarization and cancer response from mammography reports, employing a rule-based approach. The study used small datasets from a single institute. Exploring larger datasets, enhancing generalizability, and incorporating convolutional networks are potential avenues for future work.

3.1.7. Pragmatic level

Doan et al. [59] identified Kawasaki disease from emergency department notes. The study encountered limitations such as a limited variety of syntax, spelling errors, and hypothetical clauses, for which the tool needs additional training. To enhance the study, future efforts could involve incorporating data from several visits and conducting timestamp analysis.

Most of the articles reviewed emphasized the semantic level (67%), with a relatively smaller proportion related to phonetic and pragmatic levels. Annotation was carried out by annotators whose expertise varied, which introduced personal biases into the annotation process. Proficiency in language, grammar, and vocabulary is essential when conducting analyses at the semantic, syntactic, or lexical levels. It is worth noting that tools like UMLS, while effective in English,

Volume 1 Issue 1 (2024) 25 https://doi.org/10.36922/aih.2147
