Domain adaptation remains a crucial approach for tailoring mainstream LLMs to practical use in clinical environments, even after the surge of ChatGPT (https://chat.openai.com/), a powerful LLM service that has revolutionized the way we interact with text and language through its astonishing ability to generate sentences. While these general-purpose models are powerful in zero-shot inference on unseen tasks, fine-tuned models may have the potential to outperform them in domain-specific tasks. Several works on domain adaptation within the medical field exist for powerful English-centric LLMs as well,¹⁻⁴ but research in this direction is largely lacking in Japanese, highlighting the need to pioneer studies in non-English contexts. The drive to develop large-scale medical LLMs in one's native language is not only prevalent in Japan but is also starting to mainstream in other non-English-speaking countries. In Japan, the sole precedent in the area of Japanese medical language models is the work of Sugimoto et al.,⁵ who developed a Japanese medical language model named JMedRoBERTa based on RoBERTa, a BERT⁶-based model. This study is the first exploration along this line using large-scale GPT models with a focus on text generation.

Moreover, ChatGPT utilization is impeded in clinical practice due to concerns related to data privacy and security. The potential risks associated with data breaches or misuse of confidential patient information underscore the need for robust security measures and ethical considerations, further complicating its seamless integration into clinical settings. Hence, we need to consider domain adaptation using other LLMs for incorporating medical knowledge.

Recently, several parameter-efficient fine-tuning methods have been proposed, including low-rank adaptation (LoRA) and its quantized version (QLoRA),⁷,⁸ in which only a limited set of parameters is chosen as the target of fine-tuning. Performed along with instruction-tuning, LoRA has demonstrated some success in acquiring conversational abilities and improving domain-specific performance on tasks such as financial question-answering.⁹,¹⁰ That being said, the abilities and limitations of LoRA-based instruction-tuning in domain adaptation have not been clarified. The recently proposed "Superficial Alignment Hypothesis" conjectures that fine-tuning does not contribute significantly to the acquisition of knowledge, but this topic remains controversial.¹¹ Therefore, we aim to investigate whether LoRA-based instruction tuning can be effective in acquiring domain-specific knowledge, especially medical knowledge.
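To make the setup concrete, the following is a minimal sketch of LoRA-based instruction-tuning with the Hugging Face PEFT library: the pretrained weights are frozen and only a low-rank update (the product of two small matrices of rank r) is learned for selected modules. The base model name, target modules, hyperparameters, and the toy instruction-response pair are illustrative assumptions, not the configuration or data used in this study.

# Minimal sketch of LoRA-based instruction-tuning (Hugging Face PEFT).
# Model name, hyperparameters, and the toy example below are placeholders.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "rinna/japanese-gpt-neox-3.6b"  # placeholder Japanese-centric base LLM
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=False)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA freezes the pretrained weight W0 and learns a rank-r update BA for the
# targeted modules, so only a small fraction of parameters is optimized.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # module names depend on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Toy instruction-response pair (Instruction: list typical symptoms of diabetes;
# Response: thirst, polydipsia, polyuria, weight loss). Not from the study's dataset.
examples = [{"text": "### 指示:\n糖尿病の代表的な症状を挙げてください。\n"
                     "### 応答:\n口渇、多飲、多尿、体重減少などがみられます。"}]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-med-ja", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

With a configuration of this kind, typically well under 1% of the model's parameters are trainable, which is what makes instruction-tuning of multi-billion-parameter models practical on modest hardware.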
The primary research questions guiding our study are as follows:
i. How and how much can domain knowledge be incorporated into LLMs by LoRA-based fine-tuning?
ii. Do larger English-centric LLMs outperform smaller Japanese-centric LLMs?
iii. Does the amount of fine-tuning hold significance?
To answer these questions, we conducted a comprehensive comparison between different LLMs fine-tuned with our own Japanese medical dataset, evaluating each model through a medical question-answering approach. This enables us to clarify the strengths and limitations of incorporating domain-specific knowledge by LoRA, setting the stage for constructing enhanced versions of various domain-specific Japanese LLMs.
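As an illustration of the evaluation side, the following is a minimal sketch of scoring a model on multiple-choice medical questions by comparing the log-likelihood the model assigns to each candidate answer. The model path, prompt format, and the toy question are placeholders rather than the benchmark or protocol used in this study.

# Minimal sketch of multiple-choice QA evaluation by log-likelihood scoring.
# The model path and the toy question are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "lora-med-ja-merged"  # hypothetical path to a fine-tuned, merged model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).eval()

def choice_log_likelihood(question: str, choice: str) -> float:
    """Mean log-likelihood of the choice tokens, conditioned on the question."""
    prompt_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(question + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # predictions for tokens 1..T-1
    targets = full_ids[0, 1:]                               # tokens actually observed
    scores = log_probs[torch.arange(targets.shape[0]), targets]
    # Keep only positions belonging to the choice (a simplification: assumes the
    # question/choice boundary coincides with a token boundary).
    return scores[prompt_len - 1:].mean().item()

# Toy record: "Which is the correct diagnostic criterion for adult hypertension?"
questions = [{
    "question": "質問: 成人の高血圧の診断基準として正しいのはどれか。答え: ",
    "choices": ["140/90 mmHg以上", "160/100 mmHg以上", "120/70 mmHg以上"],
    "answer": 0,
}]

correct = 0
for q in questions:
    scores = [choice_log_likelihood(q["question"], c) for c in q["choices"]]
    correct += int(max(range(len(scores)), key=scores.__getitem__) == q["answer"])
print(f"accuracy = {correct / len(questions):.2f}")

Accuracy over such questions then serves as a proxy for how much medical knowledge a fine-tuned model has actually acquired.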
2. Related works

In recent years, there has been active research in constructing pretrained language models specialized for the medical domain. Before the emergence of GPT-3¹² in 2020 and ChatGPT in 2022, the prevailing trend in research involved building BERT⁶-based language models and evaluating them on classification tasks. In English-speaking regions, models such as BioBERT,¹³ Med-BERT,¹⁴ ClinicalBERT,¹⁵ and PubMedBERT¹⁶ have been proposed, leveraging medical literature databases such as PubMed and clinical record databases such as MIMIC-III.¹⁷ In Japan as well, UTH-BERT¹⁸ and JMedRoBERTa⁵ have become available online. UTH-BERT¹⁸ is the first medical pretrained language model in Japanese, pretrained on approximately 120 million lines of clinical text. On the other hand, JMedRoBERTa⁵ utilizes 11 million lines of medical journal articles, with the goal of accumulating information across a diverse range of content, ranging from basic research to case studies.

In the wake of the emergence of GPT-3¹² and ChatGPT, the focus of research shifted toward LLMs built on the Transformer,¹⁹ accompanied by a steady increase in model parameter size. The primary tasks of interest also transitioned from classification to medical text generation and medical question-answering. Among English-centric models, BioMedLM (formerly known as PubMedGPT),²⁰ BioGPT,²¹ and BioMedGPT²² have been proposed, harnessing the strength of the latest general-purpose LLMs. However, the currently available models have limited sizes: BioMedLM²⁰ has 2.7 billion parameters, BioGPT²¹ is based on the GPT-2²³ architecture with 1.3 billion parameters, and BioMedGPT²² comprises 10 billion parameters. On the other hand, Google has pursued its own path in developing medical models, including Med-PaLM¹ and Med-PaLM 2,² with 540 billion and 340 billion parameters, respectively; nonetheless, these models are not accessible to the public. To the best of our