implication suggested by the results of this experiment is that "a more powerful base model is preferable to start with," an overall performance improvement is highly expected from upgrading the base model.

6. Conclusion

In this paper, we explore the capabilities and limitations of LoRA through various comparative analyses in the medical domain. LoRA-based instruction-tuning, while avoiding an excessive number of training steps, can partially integrate domain-specific knowledge into LLMs, with larger models demonstrating more pronounced effects. We also observe a decrease in performance after additional pretraining on a scarce training dataset. Furthermore, our results underscore the potential of adapting larger English-centric models for Japanese applications in domain adaptation, while also highlighting the persisting limitations of Japanese-centric models, including the deterioration of 1-shot performance after instruction-tuning. Our findings suggest that, at present, the most promising approach to constructing a domain-specific LLM is applying QLoRA to larger English-centric base models.
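To make this recipe concrete, the following is a minimal sketch of how QLoRA-style tuning of a large English-centric base model is commonly set up with the Hugging Face transformers, peft, and bitsandbytes libraries. The base model name, adapter rank, and target modules shown here are illustrative assumptions, not the exact configuration used in our experiments.

# Minimal QLoRA setup sketch (transformers + peft + bitsandbytes).
# Model name and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "meta-llama/Llama-2-70b-hf"  # hypothetical English-centric base

# 4-bit NF4 quantization of the frozen base weights: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Trainable low-rank adapters on the attention projections; the
# quantized base weights stay frozen throughout instruction-tuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

From here, the adapter is trained on instruction-response pairs with the standard causal language-modeling loss; only the small adapter weights need to be stored, which is what makes this approach practical for very large base models on limited hardware.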
Given the current situation, the clinical translation of medical LLMs into real-life applications still falls short of our expectations. To fully harness the potential of medical LLMs in healthcare settings, addressing both the performance limitations and the associated security and privacy concerns is imperative. Further research and development efforts are needed to enhance the accuracy and reliability of these models, ensuring that they meet the rigorous standards required for clinical decision-making.

Furthermore, the integration of medical LLMs with other AI technologies, such as those used for electrocardiograms and electronic medical records, has the potential to amplify their impact significantly. By using these AI systems cohesively alongside medical LLMs, physicians can achieve a more comprehensive understanding of patient data and formulate more personalized treatment plans that improve patient outcomes.

Acknowledgments
None.

Funding
This study was supported by the Japan Agency for Medical Research and Development (Grant Number: JP23hk0102078h0003).

Conflict of interest
The authors declare they have no competing interests.

Author contributions
Conceptualization: Issey Sukeda, Satoshi Kodera
Formal analysis: Issey Sukeda
Investigation: Issey Sukeda, Satoshi Kodera
Methodology: Issey Sukeda, Masahiro Suzuki, Hiroki Sakaji
Writing – original draft: Issey Sukeda
Writing – review & editing: Issey Sukeda, Masahiro Suzuki, Hiroki Sakaji

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data
Journal articles used in the study are available online as PDFs. ChatGPT was utilized for generating and cleansing the data. IgakuQA is available online. JJSIMQA is not made publicly available.

Further disclosure
Part of the findings has been presented at Deep Generative Models for Health at NeurIPS 2023. In addition, a submission made to a NeurIPS workshop is available on arXiv (https://doi.org/10.48550/arXiv.2310.10083).

