accessible resources that can be utilized by the wider research community to advance clinical NLP applications. The models showed robust performance across various tasks, although the authors noted limitations in de-identification tasks due to differences in data characteristics between the training and task datasets.

The T5 (Text-To-Text Transfer Transformer) model, created by Raffel et al.,8 frames all NLP tasks as text-to-text problems. This approach allows for a unified framework in which both inputs and outputs are treated as text strings, simplifying the architecture and training process. T5 is pre-trained on a large dataset (C4) and fine-tuned on various downstream tasks, achieving state-of-the-art results across a wide range of benchmarks. The study highlights the versatility and efficiency of the text-to-text framework, demonstrating its applicability to tasks such as translation, summarization, and question answering. By using a consistent model structure for different tasks, T5 reduces the complexity of developing task-specific models. The success of T5 underscores the potential of transfer learning and model unification in advancing the capabilities of NLP systems.
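To make the text-to-text formulation concrete, the short sketch below applies a publicly available T5 checkpoint to a toy report through the Hugging Face transformers library; the t5-base checkpoint, the "summarize:" prefix, and the example text are illustrative and are not the model or data used in this work.

# Minimal sketch of T5's text-to-text interface: the task is expressed as
# plain input text and the summary is produced as plain output text.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "t5-base"  # public checkpoint, for illustration only
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

report = ("summarize: The cardiomediastinal silhouette is within normal limits. "
          "No focal consolidation, pleural effusion, or pneumothorax is identified.")
inputs = tokenizer(report, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_new_tokens=40, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))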
Li et al.9 investigated the adaptation of long-sequence transformer models, such as Longformer and BigBird, to clinical NLP tasks. These models address the limitations of traditional transformers such as BERT, which are constrained by a maximum input sequence length of 512 tokens. By employing sparse attention mechanisms, Clinical-Longformer and Clinical-BigBird can handle sequences up to 4096 tokens, making them suitable for the lengthy documents common in clinical contexts. Their study involved pre-training these models on large-scale clinical corpora and evaluating them on a variety of NLP tasks, including NER, question answering, and document classification. The results demonstrated that both Clinical-Longformer and Clinical-BigBird significantly outperformed ClinicalBERT and other short-sequence transformers across all tasks. This work underscores the potential of long-sequence models to improve the processing and analysis of extensive clinical texts, paving the way for more effective NLP tools in health care.
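As a rough illustration of this difference in input capacity, the sketch below compares the encoded length of a long document under a standard BERT tokenizer (capped at 512 tokens) and the public allenai/longformer-base-4096 checkpoint (capped at 4096 tokens); the clinical variants evaluated by Li et al. are separate checkpoints and are not loaded here.

# Sketch: long-sequence models accept inputs far beyond BERT's 512-token limit.
from transformers import AutoTokenizer

bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
long_tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")

# Toy stand-in for a lengthy clinical document.
long_document = " ".join(["The lungs are clear without focal consolidation."] * 400)

bert_ids = bert_tokenizer(long_document, truncation=True, max_length=512)["input_ids"]
long_ids = long_tokenizer(long_document, truncation=True, max_length=4096)["input_ids"]
print(len(bert_ids), len(long_ids))  # the Longformer encoding retains much more of the text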
The application of transformer-based models, such as BERT, GPT-3, and T5, to medical text summarization has been explored by Yalunin et al.10 They found that fine-tuning these models on medical datasets significantly improves their performance. Compared to their findings, our Biomedical-BERT2BERT model demonstrates superior performance due to our novel data augmentation techniques. Kraljevic et al.11 proposed a multimodal approach combining text and image data for summarizing medical documents. While their method shows promise, our current work focuses on text-only data. Future work could explore incorporating multimodal data to enhance our model further.

Separately, a comprehensive review by Zhang et al.12 of recent advancements in NLP for medical text processing highlights the latest trends and future directions, contextualizing our work within the broader landscape of NLP advancements in medical text processing. Our contributions align with and extend these current trends, offering novel solutions for radiology report summarization.
2.1. Tokenizers

BERT-based models are trained with word-split (subword) tokenizers built over several corpora, mainly wiki-data and literature datasets, in a process called tokenization. Tokenization breaks raw text into small chunks: words and subword units known as tokens. These tokens are the units from which the model builds its understanding of context, and analyzing the sequence of tokens helps the model interpret the meaning of the text.
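As a simple illustration, the sketch below runs a WordPiece tokenizer of the kind used by BERT over a short sentence; the bert-base-uncased checkpoint and the example sentence are used purely for illustration.

# Sketch: subword (WordPiece) tokenization of the kind used by BERT-based models.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "No evidence of pneumothorax or focal consolidation."
tokens = tokenizer.tokenize(text)
print(tokens)
# Common words stay whole, while rarer clinical terms are split into
# subword pieces marked with "##" (the exact splits depend on the vocabulary).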
2.2. Pre-trained language models (PLMs)

PLMs are large neural networks that are used in a wide variety of NLP tasks. They operate under a pre-train-fine-tune paradigm: models are first pre-trained on a large text corpus and then fine-tuned on a downstream task using additional datasets. The most common architectures, such as BERT6 and T5,8 have not been pre-trained on specialized medical corpora. We fine-tuned our model on the MIMIC-CXR dataset, a large, publicly available dataset of chest radiographs, free-text radiology reports, and structured labels.
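A minimal sketch of this pre-train-fine-tune paradigm for report summarization is given below. The checkpoint, the toy findings/impression pairs, the column names, and the hyperparameters are illustrative assumptions and do not reflect the actual configuration or data used in this work.

# Sketch: fine-tuning a pre-trained seq2seq PLM on findings -> impression pairs.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "t5-base"  # illustrative; any seq2seq PLM checkpoint could be substituted
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Toy stand-in for a corpus of paired report sections ("findings" and
# "impression" are assumed column names, not the schema used in this work).
train_dataset = Dataset.from_dict({
    "findings": ["The lungs are clear. No pleural effusion or pneumothorax."],
    "impression": ["No acute cardiopulmonary process."],
})

def preprocess(batch):
    model_inputs = tokenizer(batch["findings"], truncation=True, max_length=512)
    labels = tokenizer(text_target=batch["impression"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_train = train_dataset.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="report-summarizer",
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()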
2.3. Evaluation metrics

We evaluated summary generation performance with Recall-Oriented Understudy for Gisting Evaluation (ROUGE)7 F1 metrics. Historically, ROUGE has shown a good correlation with human-evaluated summaries and is a canonical metric for summarization evaluation. We focused on a variant of ROUGE, ROUGE-L, which measures the longest common subsequence (LCS) overlap between the predicted and reference summaries to evaluate the informativeness of the summary.
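A minimal sketch of how ROUGE-L F1 can be computed from the LCS is given below; it operates on whitespace-separated tokens and omits the stemming and tokenization details of standard ROUGE implementations.

# Sketch: ROUGE-L F1 from the longest common subsequence (LCS) of tokens.
def lcs_length(a, b):
    # Classic dynamic-programming LCS over two token sequences.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(prediction, reference):
    pred_tokens, ref_tokens = prediction.split(), reference.split()
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)  # fraction of predicted tokens in the LCS
    recall = lcs / len(ref_tokens)      # fraction of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("no acute cardiopulmonary process", "no acute cardiopulmonary abnormality"))  # 0.75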
3. Approaches

3.1. Text summarization

Our task, text summarization for biomedical documents, can be approached by either extractive or abstractive methods. Extractive summaries are snippets taken directly

