Page 122 - AIH-1-2
P. 122
Artificial Intelligence in Health Medical instruction-tuning for Japanese LLMs
Proceedings of the 2019 Conference on Empirical Methods Processing: System Demonstrations; 2020. p. 38-45.
in Natural Language Processing and the 9 International 34. Gao L, Tow J, Biderman S, et al. A framework for few-shot
th
Joint Conference on Natural Language Processing (EMNLP- language model evaluation. Zenodo. 2023;v0.0.1.
IJCNLP); 2019. p. 2567–2577.
doi: 10.5281/zenodo.5371629
doi: 10.18653/v1/D19-1259
35. Kurihara K, Kawahara D, Shibata T. JGLUE: Japanese General
31. Kasai J, Kasai Y, Sakaguchi K, Yamada Y, Radev D. Evaluating Language Understanding Evaluation. In: Proceedings of the
GPT-4 and ChatGPT on Japanese Medical Licensing Thirteenth Language Resources and Evaluation Conference;
Examinations. arXiv:2303.18027 [arXiv Preprint], 2023. 2022. p. 2957-2966.
doi: 10.48550/arXiv.2303.18027 36. Pezeshkpour P, Hruschka E. Large Language Models
Sensitivity to the Order of Options in Multiple-choice
32. Taori R, Gulrajani I, Zhang T, et al. Stanford Alpaca: An Questions. arXiv:2308.11483 [arXiv Preprint], 2023.
Instruction-following Llama Model; 2023. Available from:
https://github.com/tatsu-lab/stanford_alpaca [Last accessed doi: 10.48550/arXiv.2308.11483
on 2024 Apr 04]. 37. Zheng C, Zhou H, Meng F, Zhou J, Huang M. Large
33. Wolf T, Debut L, Sanh V, et al. Transformers: State-of-the- Language Models are not Robust Multiple Choice Selectors.
Art Natural Language Processing. In: Proceedings of the arXiv:2309.03882 [arXiv Preprint], 2023.
2020 Conference on Empirical Methods in Natural Language doi: 10.48550/arXiv.2309.03882
Volume 1 Issue 2 (2024) 116 doi: 10.36922/aih.2695

