Artificial Intelligence in Health Transformer-based radiology report summaries
language models and provide interpretable information for users.

5.5. Practical implications and integration into clinical workflows

The practical implications of our findings are significant for clinical workflows. Our Biomedical-BERT2BERT model can be integrated into radiology departments to automate the summarization of radiology reports, thereby reducing the workload on radiologists and allowing them to focus on more critical tasks. This semi-automation can enhance the efficiency of patient handoffs and triage processes by providing clear and concise summaries of radiology findings.

Integrating our model into clinical workflows involves several steps. First, the model can be integrated into existing radiology information system platforms to automatically generate summaries as radiologists input their findings. Second, a user interface is needed that allows radiologists to review and edit the generated summaries before finalizing them. This ensures accuracy and allows radiologists to add or edit any additional context or information. Finally, it is crucial to provide training for radiologists and clinical staff on using the new system, along with ongoing support to address any issues or questions that arise during implementation.

By improving the efficiency and accuracy of radiology report summarization, our model has the potential to significantly impact patient care. Concise and clear summaries can help other healthcare providers quickly understand the radiologist’s findings and make informed decisions about patient management. In addition, enhanced communication can lead to better patient outcomes by reducing the likelihood of misinterpretation and ensuring timely interventions.

6. Conclusion

In this work, we built a biomedical BERT2BERT text summarization model by performing fine-tuning with an end-to-end deep learning approach in a data-centric fashion. The model takes the COMPARISON, FINDINGS, IMPRESSION, INDICATION, and TECHNIQUE fields as input and produces IMPRESSION predictions as output through an abstractive summarization method. We believe that it will help to reduce human labor resources in medical settings. We used ROUGE scores as an evaluation metric to capture exact word matching in a patient findings interpretation task that entails a high level of sensitivity. We believe that slight changes in this exact word matching with reference summaries may mislead patients and medical professionals. Our model generates state-of-the-art abstractive summarization by achieving a ROUGE-L score of 58.75/100.

After thorough experimentation, we found that a data-centric approach significantly improves the output quality of the radiology report summarization task. Our fine-tuned model may serve as a good checkpoint for other NLP endeavors in the medical space dealing with examination reports, such as summary prediction or even auto-labeling.

Future work for this line of research includes leveraging our data-centric approach combined with upsampling of the minority classes and downsampling of the majority class to create a more balanced training set, and harnessing more data-centric approaches for improving the model’s performance by addressing class imbalance with a higher proportion of “No Finding” impressions. In addition, due to computation constraints, we did not experiment with large language models such as GPT-4 and Llama 3 in the current work. Although such models are decoder-only architectures, they learn during pre-training to understand context deeply, which helps them perform tasks such as summarization and translation more effectively. Finally, incorporating human evaluation, more specifically from radiologists, may provide deeper insights into the quality of the summaries generated by the developed model. Another direction for future work is to conduct simple baseline measurements to gain a sense of how the model learns and then conduct a deep investigation to thoroughly understand the knowledge generated by the model.

Acknowledgments

The authors extend their appreciation to the Stanford University Computer Science Department for offering all the related conceptual guidance and structure in the cloud with computing resources, as well as to all the staff of the CS224N (NLP with Deep Learning) course.

Funding

None.

Conflict of interest

The authors declare that they have no competing interests.

Author contributions

Conceptualization: All authors
Investigation: All authors
Methodology: All authors
Volume 1 Issue 4 (2024) 93 doi: 10.36922/aih.3846

