Artificial Intelligence in Health NLP in EHR
as the absence of improvement in performance with the use of empirical methods, the unigram model's need to account for unigram, negation, and consideration, information loss due to the use of template notes, and the lack of interoperability in the external validation of the model. Further, generalization of the model is conceivable.

Table 1. Levels of NLP with their focus

NLP level | Focus
Phonetic or phonological | Pronunciation
Morphological | The smallest parts of words that carry meaning, suffixes, and prefixes
Lexical | Lexical meaning of words and parts of speech analyses
Syntactic | Grammar and structure of sentences
Semantic | Meaning of words and sentences
Discourse | Structure of different kinds of text using document structure
Pragmatic | The knowledge that stems from the outside world

Abbreviation: NLP: Natural language processing.

Workman et al.[21] identified correct and misspelled terms within emergency department notes using small corpora of surgical pathology and emergency department documents. The methods employed in the study[21] could potentially be extended to other domains.

Hanauer et al.[22] performed a systematic review investigating the utilization of electronic medical records (EMR) in cancer-related research. Although only a small dataset was used in the study, there is potential for application to a larger dataset. Baxter et al.[23] detected fungal ocular involvement from critical care records. However, the study lacked a relative assessment of sensitivity and specificity. Challenges, including the de-identification of records and issues with queries and regular expressions, still need to be addressed. Notably, the study was performed on data from the critical care unit of a single institute, underscoring the importance of the inclusion of positive cases in future research.

3.1.3. Lexical level

Tao et al.[24] automated information extraction from prescriptions, employing unstructured discharge summaries as the primary data source. Limitations of the study include the de-identification of documents and the delineation of relationships among entities. Liu et al.[25] focused their study on named entity recognition from clinical text in Chinese. The use of fuzzy feature engineering affects long short-term memory, which, in turn, can be applied to other specific domains. Li et al.[25] conducted research on automatic detection using pediatric EMR. Notably, the study did not consider images for feature extraction, but it holds potential for prediction.

Cai et al.[26] extracted numerical information using a rule-based approach. The abbreviations used in the study underwent manual review. The study faces challenges related to overfitting due to additions, decision-making regarding variable boundaries, and differences between the formats of clinical notes across different hospitals. The addition of more keywords could facilitate the expansion of the study to multiple hospitals. In a separate study, Cai et al.[26] worked on named entity recognition in Chinese, conducting the study on data from two hospitals.

Thirukumaran et al.[27] identified surgical site infections from orthopedics notes, employing a rule-based approach on data from a single institute. It is noteworthy that the study did not classify the infections.

Dipaola et al.[28] recognized syncope patients from free-text emergency department notes written in Italian. Their research was replicated in other languages, including the identification of patients with rheumatoid arthritis from free text in German[29]. The findings underscore the need for performance optimization, testing with computational language experts, and the implementation of encryption processes for clinical notes to enhance protection against security breaches.

3.1.4. Syntactic level

Gregg et al.[30] focused on risk stratification in prostate cancer care. The algorithm demonstrated efficacy within a single institute but was restricted to prostatectomy, which limits its applicability to the broader health system or health groups. Clinical staging forms and electronic laboratory results constituted the data sources for the study, facilitating the identification of incidental pulmonary nodules from radiology reports through a rule-based approach. The study's limitations encompass the need for a small amount of manual review and the use of non-specific, ambiguous terminology. To enhance performance, the application of a machine learning approach holds promise.

3.1.5. Semantic level

Nath et al.[31] performed the extraction of information from echocardiography reports. However, the study's inclusion/exclusion criteria introduced certain limitations, compounded by the use of data sourced solely from a single institute. The extracted data elements held qualitative value but required manual review. To enhance the algorithm's applicability, consideration should be given to its implementation in a larger domain. Goldstein and Shahar[32], in their work on CliniText, focused on structured data while excluding images. The system's testing was conducted
Volume 1 Issue 1 (2024) 20 https://doi.org/10.36922/aih.2147

