Artificial Intelligence in Health LLMs-Healthcare: Application and challenges
is gaining momentum. This new direction aims to shift mental health assessments from traditional rating scales to a more natural, language-based communication. The emergence of LLMs, like those powering ChatGPT and BERT, marks a significant shift in AI, potentially revolutionizing standardized psychological assessments. This evidence points toward AI’s capacity to transform mental health evaluations into interactions that mirror natural human communication, pending comprehensive validation in specific application scenarios.³³

6.1. Challenges associated with applications of LLMs for mental health

In mental health applications, LLMs face challenges such as ensuring content sensitivity and safety to avoid generating inappropriate and harmful advice, maintaining accuracy and reliability to prevent misdiagnoses, and offering personalized, empathetic responses for adequate support. Data privacy and security are paramount due to the personal nature of discussions. There is also a need to prevent user over-reliance on LLMs, which might lead to a delay in seeking professional help. Ethical considerations include the impact of replacing human interactions with AI and avoiding biases. In addition, navigating regulatory compliance within mental health laws and guidelines is crucial for lawful operation.

7. Challenges other medical specialties

The integration of LLMs into medical specialties such as nephrology and gastroenterology remains in the early stages, as their full potential has yet to be realized. Current applications in these areas are sparse, highlighting opportunities for future exploration and implementation. This brief overview aims to shed light on the existing implementations of LLMs within these specific fields, indicating the nascent but promising role of advanced AI technologies in enhancing diagnostic and treatment methodologies in nephrology and gastroenterology.

7.1. Nephrology

Within the domain of nephrology, LLMs are being utilized to assist in diagnosing kidney diseases, providing treatment guidance, and monitoring renal function, as noted by Wu et al.³⁴ These LLMs facilitate the evaluation of crucial data such as laboratory results, clinical data, and medical history during the diagnostic phase. Various LLMs, including Orca Mini 13B, Stable Vicuna 13B, Falcon 7B, Koala 7B, Claude 2, and GPT-4, have found applications in treating and diagnosing kidney diseases. However, due to their unique zero-shot reasoning capabilities, GPT-4 and Claude 2 are particularly suitable for this intricate medical specialty. At present, these models are employed to respond to multiple-choice questions about nephrology. Wu et al.³⁴ incorporated questions regarding clinical backgrounds linked to 858 nephSAP multiple-choice queries collated between 2016 and 2023. When evaluating the proficiency of Claude 2 and GPT-4, performance was gauged based on the proportion of correctly answered nephrology-related nephSAP multiple-choice questions. GPT-4 demonstrated superior performance, garnering a score of 73.3%, in contrast to Claude 2, which achieved a score of 54.4%. When individual nephrology topics were examined, GPT-4 consistently outperformed its counterparts, including Claude 2, Vicuna, Koala, Orca-mini, and Falcon.

7.2. Gastroenterology

Lahat et al.³⁵ explored the capabilities of LLMs, specifically OpenAI’s ChatGPT, in responding to queries within the realm of gastrointestinal health. Their evaluation employed 110 real-world questions, benchmarking ChatGPT’s responses against the expert consensus of seasoned gastroenterologists. These queries spanned a spectrum of topics, from diagnostic tests and prevalent symptoms to treatments for a range of gastrointestinal issues. The source of these questions was public internet platforms. The researchers evaluated the outputs of ChatGPT on metrics such as accuracy, clarity, up-to-dateness, and efficacy, rating them on a scale from 1 to 5. These outputs were then categorized into symptoms, diagnostic tests, and treatments. ChatGPT averaged scores of 3.7 for clarity, 3.4 for accuracy, and 3.2 for efficacy in the symptom category. Diagnostic test-related queries resulted in scores of 3.7 for clarity, 3.7 for accuracy, and 3.5 for efficacy. As for treatment-related questions, the model achieved 3.9 for clarity, 3.9 for accuracy, and 3.3 for efficacy. The results indicated the substantial potential of ChatGPT in providing valuable insights within the gastrointestinal specialty.

7.3. Allergy and immunology

In allergy and immunology, LLMs, akin to their applications in dermatology, have shown promising potential. According to a study by Goktas et al.,³⁶ LLMs, specifically models like GPT-4 and Google Med-PaLM 2, significantly enhance the diagnostic process within allergy and immunology disciplines. These advanced models elevate the precision of diagnosis and can tailor treatment plans to suit individual patient needs. Beyond the clinical realm, they also play a pivotal role in fostering patient engagement, ensuring patients are actively involved and informed during the treatment process. As a result, the integration of LLMs in allergy and immunology represents a paradigm shift toward more accurate, personalized, and patient-centric medical care.
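
The two scoring schemes reported in Sections 7.1 and 7.2 (the proportion of correctly answered multiple-choice questions, and mean ratings on a 1-to-5 scale grouped by question category) can be illustrated with a minimal Python sketch. The function names `proportion_correct` and `mean_ratings_by_category`, and all of the toy data, are hypothetical and chosen only for illustration; they are not the evaluation code or data of the cited studies.

```python
from collections import defaultdict
from statistics import mean


def proportion_correct(predictions, answer_key):
    """Fraction of multiple-choice questions answered correctly."""
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have equal length")
    if not answer_key:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)


def mean_ratings_by_category(ratings):
    """Average 1-to-5 ratings for each (category, metric) pair.

    `ratings` is an iterable of (category, metric, score) tuples.
    """
    grouped = defaultdict(list)
    for category, metric, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        grouped[(category, metric)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


# Hypothetical toy data, NOT taken from either study.
answer_key = ["A", "C", "B", "D", "A"]
model_answers = ["A", "C", "D", "D", "B"]
accuracy = proportion_correct(model_answers, answer_key)
print(f"accuracy = {accuracy:.1%}")

toy_ratings = [
    ("symptoms", "clarity", 4),
    ("symptoms", "clarity", 3),
    ("treatments", "accuracy", 5),
    ("treatments", "accuracy", 4),
]
summary = mean_ratings_by_category(toy_ratings)
for (category, metric), value in sorted(summary.items()):
    print(f"{category:<12}{metric:<10}{value:.1f}")
```

Under the same definition, a score of 73.3% over 858 questions would correspond to roughly 629 correct answers, assuming every one of the 858 questions was scored for that model.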
Volume 1 Issue 2 (2024) 23 doi: 10.36922/aih.2558

