
Artificial Intelligence in Health                               Transformer-based radiology report summaries



language models and provide interpretable information for users.

5.5. Practical implications and integration into clinical workflows

The practical implications of our findings are significant for clinical workflows. Our Biomedical-BERT2BERT model can be integrated into radiology departments to automate the summarization of radiology reports, thereby reducing the workload on radiologists and allowing them to focus on more critical tasks. This semi-automation can enhance the efficiency of patient handoffs and triage processes by providing clear and concise summaries of radiology findings.

Integrating our model into clinical workflows involves several steps. First, the model can be integrated into existing radiology information system platforms to automatically generate summaries as radiologists input their findings. Second, there is a need to develop a user interface that allows radiologists to review and edit the generated summaries before finalizing them. This ensures accuracy and allows radiologists to add or edit any additional context or information. Finally, it is crucial to provide training for radiologists and clinical staff on using the new system, along with ongoing support to address any issues or questions that arise during implementation.

By improving the efficiency and accuracy of radiology report summarization, our model has the potential to significantly impact patient care. Concise and clear summaries can help other healthcare providers quickly understand the radiologist’s findings and make informed decisions about patient management. In addition, enhanced communication can lead to better patient outcomes by reducing the likelihood of misinterpretation and ensuring timely interventions.

6. Conclusion

In this work, we built a biomedical BERT2BERT text summarization model by performing fine-tuning with an end-to-end deep learning approach in a data-centric fashion. The input fields for the model are the COMPARISON, FINDINGS, IMPRESSION, INDICATION, and TECHNIQUE fields, producing IMPRESSION predictions as outputs through an abstractive summarization method. We believe that it will help to reduce human labor resources in medical settings. We used ROUGE scores as an evaluation metric to capture exact word matching in a patient findings interpretation task that entails a high level of sensitivity. We believe that slight changes in this exact word matching with reference summaries may mislead patients and medical professionals. Our model generates state-of-the-art abstractive summarization by achieving a ROUGE-L score of 58.75/100.

After thorough experimentation, we found that a data-centric approach significantly improves the output quality of the radiology report summarization task. Our fine-tuned model may serve as a good checkpoint for other NLP endeavors in the medical space dealing with examination reports, such as summary predictions or even auto labeling.

Future work for this line of research includes leveraging our data-centric approach combined with upsampling of the minority classes and downsampling of the majority class to create a more balanced training set, and harnessing more data-centric approaches for improving the model’s performance by addressing class imbalance with a higher proportion of “No Finding” impressions. In addition, due to computation constraints, we did not consider experimenting with large language models such as GPT-4 and Llama 3 in the current work. Although such models are decoder-only architectures, they learned during pre-training to understand the context deeply, a process that makes them perform tasks such as summarization and translation more effectively. Finally, incorporating human evaluation, more specifically from radiologists, may provide deeper insights into the quality of the summaries generated by the developed model. Another direction for future work is to conduct simple baseline measurements to gain a sense of how the model learns and then conduct a deep investigation to thoroughly understand the knowledge generated by the model.

Acknowledgments

The authors extend their appreciation to the Stanford University Computer Science Department for offering all the related conceptual guidance and structure in the cloud with computing resources, as well as all the staff of the CS224N (NLP with Deep Learning) course.

Funding

None.

Conflict of interest

The authors declare that they have no competing interests.

Author contributions

Conceptualization: All authors
Investigation: All authors
Methodology: All authors


            Volume 1 Issue 4 (2024)                         93                               doi: 10.36922/aih.3846