Artificial Intelligence in Health
ORIGINAL RESEARCH ARTICLE
Development and analysis of medical
instruction-tuning for Japanese large
language models
Issey Sukeda1,*, Masahiro Suzuki2, Hiroki Sakaji3, and Satoshi Kodera1
1 Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo,
Bunkyo, Tokyo, Japan
2 Department of Systems Innovation, School of Engineering, The University of Tokyo, Bunkyo, Tokyo,
Japan
3 Faculty of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido, Japan
Abstract

In the ongoing wave of impact driven by large language models (LLMs) like ChatGPT, the adaptation of LLMs to the medical domain has emerged as a crucial research frontier. Since mainstream LLMs tend to be designed for general-purpose applications, constructing a medical LLM through domain adaptation is a major challenge. While instruction-tuning, particularly based on low-rank adaptation (LoRA), has recently become a frequently employed strategy to fine-tune LLMs, its precise role in domain adaptation remains unknown. Here, we investigated how LoRA-based instruction-tuning improves the performance of Japanese medical question-answering tasks by employing a multifaceted evaluation of multiple-choice questions, including scoring based on "Exact match" and "Gestalt distance" in addition to the conventional accuracy. Our findings suggest that LoRA-based instruction-tuning can partially incorporate domain-specific knowledge into LLMs, with larger models demonstrating more pronounced effects. Furthermore, our results underscore the potential of adapting English-centric models for Japanese applications in domain adaptation, while also highlighting the persisting limitations of Japanese-centric models. This initiative represents a pioneering effort in enabling medical institutions to fine-tune and operate models without relying on external services.

Keywords: Medical large language models; Llama2; Instruction-tuning; Domain adaptation; Low-rank adaptation; QLoRA

*Corresponding author: Issey Sukeda (sukeda-issei006@g.ecc.u-tokyo.ac.jp)

Citation: Sukeda I, Suzuki M, Sakaji H, Kodera S. Development and analysis of medical instruction-tuning for Japanese large language models. Artif Intell Health. 2024;1(2):107-116. doi: 10.36922/aih.2695

Received: January 10, 2024
Accepted: March 13, 2024
Published Online: April 8, 2024

Copyright: © 2024 Author(s). This is an Open-Access article distributed under the terms of the Creative Commons Attribution License, permitting distribution and reproduction in any medium, provided the original work is properly cited.

Publisher's Note: AccScience Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1. Introduction

The study and development of medical large language models (LLMs) like ChatGPT have the potential to revolutionize the field of medicine and healthcare in profound ways. These models, when fine-tuned and adapted to the medical domain, can assist healthcare professionals in numerous critical tasks, such as disease diagnosis, treatment planning, and patient care. Due to their vast language comprehension capabilities, LLMs may provide up-to-date information, suggest evidence-based treatment options, and even predict disease outcomes with a high degree of accuracy.
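The "Gestalt distance" scoring mentioned in the abstract refers to Ratcliff/Obershelp gestalt pattern matching, which Python's standard difflib implements. A minimal sketch of how such multiple-choice answer scoring could be computed is shown below; the example answer strings are hypothetical, not taken from the paper's evaluation data:

```python
import difflib

def exact_match(pred: str, gold: str) -> bool:
    # "Exact match": the generated answer equals the reference answer verbatim
    return pred.strip() == gold.strip()

def gestalt_similarity(pred: str, gold: str) -> float:
    # difflib.SequenceMatcher implements Ratcliff/Obershelp
    # gestalt pattern matching; ratio() returns a value in [0, 1],
    # so a "distance" can be taken as 1 - ratio()
    return difflib.SequenceMatcher(None, pred, gold).ratio()

# Hypothetical model output with a typo vs. the reference choice:
print(exact_match("atrial fibrilation", "atrial fibrillation"))          # False
print(gestalt_similarity("atrial fibrilation", "atrial fibrillation"))   # close to 1
```

Unlike accuracy alone, a similarity-based score gives partial credit when a free-form generated answer nearly matches a choice, which is why such a metric is useful alongside exact matching.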

