



knowledge, there has been no research conducted to deepen the medical specialization of Japanese-centric models.

3. Data and methods

We conducted a comprehensive comparison between different LLMs fine-tuned with a Japanese medical dataset, including those we have created ourselves. To determine whether one should start from a smaller Japanese model or a larger English model, we prepared OpenCALM-7B and Llama2-70B as base models. In addition, to observe the effectiveness of pretraining, we introduced a model additionally trained on medical documents. Subsequently, we applied medical instruction-tuning (LoRA, QLoRA) to each of them and evaluated performance based on the accuracy of medical question-answering tasks. The entire procedure is outlined in Figure 1. The models trained and used in our experiments are available at https://huggingface.co/AIgroup-CVM-utokyohospital.

3.1. Base model preparation

To create a Japanese-centric model, we utilized OpenCALM-7B (https://huggingface.co/cyberagent/open-calm-7b), an open-source Japanese foundation LLM with 6.5 billion parameters developed by CyberAgent, Inc. In addition, we trained a new base model, MedCALM, which is based on OpenCALM-7B and continually pretrained on our own medical text dataset. Here, the training dataset consists of 2420 examples, and the evaluation dataset has 50 examples. The maximum token count is set to 768, and the batch size is set to 63. The model was trained for 2000 steps.
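A minimal sketch of this continual pretraining step, assuming the Hugging Face Trainer API and hypothetical corpus files (the actual training script is not reproduced here; device layout and any settings not listed above are placeholders), is:

# Hedged sketch of the MedCALM continual pretraining described above.
# File names and unspecified settings are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "cyberagent/open-calm-7b"
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padded batches
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical medical corpus: 2420 training and 50 evaluation examples.
raw = load_dataset("json", data_files={"train": "medical_train.jsonl",
                                       "eval": "medical_eval.jsonl"})

def tokenize(batch):
    # Maximum token count of 768, as stated above.
    return tokenizer(batch["text"], truncation=True, max_length=768)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="medcalm",
    max_steps=2000,                  # trained for 2000 steps
    per_device_train_batch_size=63,  # reported batch size; device split assumed
    evaluation_strategy="steps",
    eval_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["eval"],
    # Causal (next-token) objective, i.e., continued pretraining.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()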
On the other hand, we further used Llama2-70B-chat-hf (https://huggingface.co/meta-Llama/Llama-2-70b-chat-hf), a powerful English-centric LLM released by Meta Inc.24 Hereinafter, it is referred to as Llama2-70B. The use of this model is governed by the Meta license (https://ai.meta.com/resources/models-and-libraries/llama-downloads/).

3.2. Medical instruction-tuning

Instruction-tuning refers to the process of fine-tuning or optimizing the behavior and output of the model by providing explicit instructions or guidance as a prompt during the generation of text.25 We employed LoRA, one of the popular parameter-efficient fine-tuning methods provided in the PEFT library,7,26 since full fine-tuning, which retrains all model parameters, is unfeasible in our environment. LoRA freezes the pretrained model weights and inserts trainable rank decomposition matrices into each layer of the target model to reduce the number of trainable parameters for downstream tasks. Specifically, instead of directly updating the d × k parameter matrix of a linear layer in the LLM from W₀ to W₀ + ΔW, LoRA updates a d × r matrix B and an r × k matrix A, where BA is a low-rank decomposition of ΔW, that is, r ≪ min(d, k).
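As a concrete illustration of this update rule (illustration only; in our experiments the adapters are injected by the PEFT library rather than hand-written), a single LoRA-adapted linear layer can be sketched as:

# Minimal sketch of the LoRA update y = (W0 + (alpha/r) * B @ A) x for one layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d: int, k: int, r: int = 8, alpha: int = 32):
        super().__init__()
        assert r < min(d, k), "LoRA assumes a low rank: r << min(d, k)"
        self.base = nn.Linear(k, d, bias=False)          # frozen W0, shape d x k
        self.base.weight.requires_grad = False
        self.B = nn.Parameter(torch.zeros(d, r))         # d x r, zero-initialized
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)  # r x k, small random init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only A and B receive gradients; W0 stays fixed.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

For example, for a 4096 × 4096 projection with r = 8, B and A together hold 4096 × 8 + 8 × 4096 = 65,536 trainable weights, compared with roughly 16.8 million in W₀; these dimensions are illustrative, not those of OpenCALM-7B or Llama2-70B.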
Given our computational constraints, particularly the limited GPU memory, LoRA for OpenCALM-7B is feasible, but not for Llama2-70B. Instead, we opted for the quantized version, named QLoRA,8 which is intended to trade off a slight performance drop for a significant reduction in model size, making the experiment using Llama2-70B feasible. Consequently, we applied LoRA to OpenCALM-7B and QLoRA to Llama2-70B, respectively. The hyperparameters of LoRA/QLoRA are listed in Table 1, which follow the default settings specified in the PEFT library and the QLoRA library, respectively.8,26
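How the adapters are attached can be sketched roughly as follows, assuming the PEFT and bitsandbytes libraries; the LoraConfig values shown are common defaults used as placeholders, not a reproduction of Table 1:

# Rough sketch: LoRA on OpenCALM-7B and QLoRA (4-bit base model) on Llama2-70B.
# Hyperparameter values are illustrative placeholders; see Table 1 for ours.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

lora_cfg = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.05,
                      task_type="CAUSAL_LM")

# (i) LoRA: the 16-bit OpenCALM-7B fits in GPU memory, so no quantization.
calm = AutoModelForCausalLM.from_pretrained("cyberagent/open-calm-7b",
                                            torch_dtype=torch.float16)
calm = get_peft_model(calm, lora_cfg)
calm.print_trainable_parameters()

# (ii) QLoRA: load Llama2-70B with 4-bit quantized weights, then add adapters.
bnb_cfg = BitsAndBytesConfig(load_in_4bit=True,
                             bnb_4bit_quant_type="nf4",
                             bnb_4bit_compute_dtype=torch.bfloat16)
llama = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-chat-hf",
                                             quantization_config=bnb_cfg,
                                             device_map="auto")
llama = prepare_model_for_kbit_training(llama)
llama = get_peft_model(llama, lora_cfg)
llama.print_trainable_parameters()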
To perform medical instruction-tuning, we constructed a medical question-answer dataset containing 77422 records in instruction format. Initially, we reviewed two medical articles, one from the official journal of The Japanese Circulation Society (containing 3569 lines) and another from the Journal of the Japanese Society of Internal Medicine (JJSIM, containing 6120 lines), for input retrieval. Then, these texts were used as inputs for ChatGPT (gpt-3.5-turbo) to generate various question-answer pairs, resulting in 21365 records and 56057 records, respectively. Since ChatGPT is known to possess strong instruction-following ability, we utilized the following prompt template to construct an instruction dataset with overall good quality:

    ### Instructions: You are a machine designed to generate various question and answer pairs. Please create data with question (instruction) and answer (output) pairs based on the following input, considering it as prior knowledge. Format the data













Figure 1. Overview of the procedure of our medical instruction-tuning. Image created with Adobe Illustrator.

