Artificial Intelligence in Health                                LLMs-Healthcare: Application and challenges



is gaining momentum. This new direction aims to shift mental health assessments from traditional rating scales to more natural, language-based communication. The emergence of LLMs, such as BERT and the models powering ChatGPT, marks a significant shift in AI and could revolutionize standardized psychological assessments. This evidence points toward AI’s capacity to transform mental health evaluations into interactions that mirror natural human communication, pending comprehensive validation in specific application scenarios.³³

6.1. Challenges associated with applications of LLMs for mental health
In mental health applications, LLMs face several challenges: ensuring content sensitivity and safety so that they do not generate inappropriate or harmful advice, maintaining accuracy and reliability to prevent misdiagnoses, and offering personalized, empathetic responses that provide adequate support. Data privacy and security are paramount because of the personal nature of these discussions. Users must also be kept from over-relying on LLMs, which could delay them from seeking professional help. Ethical considerations include the impact of replacing human interaction with AI and the need to avoid bias. In addition, compliance with mental health laws and guidelines is essential for lawful operation.

7. Challenges in other medical specialties
The integration of LLMs into medical specialties such as nephrology and gastroenterology remains at an early stage, and their full potential has yet to be realized. Current applications in these areas are sparse, which highlights opportunities for future exploration and implementation. This brief overview describes existing implementations of LLMs in these fields, indicating the nascent but promising role of advanced AI technologies in enhancing diagnostic and treatment methodologies in nephrology and gastroenterology.

7.1. Nephrology
Within nephrology, LLMs are being used to assist in diagnosing kidney diseases, provide treatment guidance, and monitor renal function, as noted by Wu et al.³⁴ During the diagnostic phase, these LLMs facilitate the evaluation of crucial data such as laboratory results, clinical data, and medical history. Various LLMs, including Orca Mini 13B, Stable Vicuna 13B, Falcon 7B, Koala 7B, Claude 2, and GPT-4, have been applied to diagnosing and treating kidney diseases. However, owing to their zero-shot reasoning capabilities, GPT-4 and Claude 2 are particularly suitable for this intricate medical specialty. At present, these models are employed to answer multiple-choice questions about nephrology. Wu et al.³⁴ used 858 nephSAP (Nephrology Self-Assessment Program) multiple-choice questions collected from 2016 to 2023, together with their accompanying clinical backgrounds. The proficiency of Claude 2 and GPT-4 was measured as the proportion of these nephrology questions answered correctly. GPT-4 demonstrated superior performance, scoring 73.3%, compared with 54.4% for Claude 2. When individual nephrology topics were examined, GPT-4 consistently outperformed the other models, including Claude 2, Vicuna, Koala, Orca-mini, and Falcon.

7.2. Gastroenterology
Lahat et al.³⁵ explored the capabilities of LLMs, specifically OpenAI’s ChatGPT, in answering queries about gastrointestinal health. Their evaluation used 110 real-world questions, sourced from public internet platforms, and benchmarked ChatGPT’s responses against the consensus of experienced gastroenterologists. The questions spanned a spectrum of topics, from diagnostic tests and common symptoms to treatments for a range of gastrointestinal conditions. The researchers rated ChatGPT’s outputs on accuracy, clarity, up-to-dateness, and efficacy, each on a scale from 1 to 5, and grouped them into three categories: symptoms, diagnostic tests, and treatments. In the symptom category, ChatGPT averaged 3.7 for clarity, 3.4 for accuracy, and 3.2 for efficacy. Diagnostic test-related queries scored 3.7 for clarity, 3.7 for accuracy, and 3.5 for efficacy. For treatment-related questions, the model achieved 3.9 for clarity, 3.9 for accuracy, and 3.3 for efficacy. The results indicated the substantial potential of ChatGPT to provide valuable insights in gastroenterology.

7.3. Allergy and immunology
In allergy and immunology, as in dermatology, LLMs have shown promising potential. According to a study by Goktas et al.,³⁶ LLMs, specifically models such as GPT-4 and Google’s Med-PaLM 2, significantly enhance the diagnostic process in allergy and immunology. These models improve diagnostic precision and can tailor treatment plans to individual patient needs. Beyond the clinical realm, they also play a pivotal role in fostering patient engagement, keeping patients actively involved and informed during treatment. As a result, the integration of LLMs into allergy and immunology represents a paradigm shift toward more accurate, personalized, and patient-centric medical care.
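The nephSAP evaluation in Section 7.1 reduces to a simple metric: the percentage of multiple-choice questions answered correctly, both overall and per nephrology topic. The following is a minimal Python sketch under stated assumptions, not Wu et al.’s actual pipeline. It assumes each model response has already been mapped to a single option letter, and the function names (`mcq_accuracy`, `accuracy_by_topic`), the topic labels, and the toy data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def _normalize(option: str) -> str:
    """Compare option letters case-insensitively, ignoring surrounding spaces."""
    return option.strip().upper()


def mcq_accuracy(predicted: Sequence[str], answer_key: Sequence[str]) -> float:
    """Percentage of questions whose predicted option matches the answer key."""
    if len(predicted) != len(answer_key):
        raise ValueError("predicted and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key is empty")
    correct = sum(_normalize(p) == _normalize(a) for p, a in zip(predicted, answer_key))
    return 100.0 * correct / len(answer_key)


def accuracy_by_topic(items: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Per-topic accuracy from (topic, predicted option, correct option) triples."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for topic, predicted, correct in items:
        totals[topic] += 1
        hits[topic] += int(_normalize(predicted) == _normalize(correct))
    return {topic: 100.0 * hits[topic] / totals[topic] for topic in totals}


if __name__ == "__main__":
    # Toy data (hypothetical): four questions split across two made-up topics.
    key = ["A", "C", "B", "D"]
    model = ["a", "C", "D", "D"]
    print(mcq_accuracy(model, key))  # prints 75.0 (3 of 4 correct)
    topics = ["AKI", "AKI", "CKD", "CKD"]
    print(accuracy_by_topic(zip(topics, model, key)))  # prints {'AKI': 100.0, 'CKD': 50.0}
```

Reporting per-topic accuracy alongside the overall score is what allows a topic-by-topic comparison of models like the one described for GPT-4.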

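Section 7.2 summarizes the ChatGPT evaluation as mean 1-to-5 ratings per question category and metric (for example, 3.7 for clarity in the symptom category). The Python sketch below shows only that aggregation step. It assumes the ratings have already been collected as (category, metric, score) records; the function name `mean_scores` and the toy ratings are hypothetical and are not taken from Lahat et al.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One rating record: (question category, metric, score on a 1-5 scale).
Rating = Tuple[str, str, int]


def mean_scores(ratings: Iterable[Rating]) -> Dict[Tuple[str, str], float]:
    """Average the 1-5 scores for every (category, metric) pair."""
    grouped: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for category, metric, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score {score} is outside the 1-5 scale")
        grouped[(category, metric)].append(score)
    return {pair: sum(scores) / len(scores) for pair, scores in grouped.items()}


if __name__ == "__main__":
    # Toy ratings (hypothetical), not data from the study.
    toy = [
        ("symptoms", "clarity", 4),
        ("symptoms", "clarity", 3),
        ("symptoms", "accuracy", 3),
        ("treatments", "clarity", 4),
    ]
    for (category, metric), avg in mean_scores(toy).items():
        print(f"{category:<10} {metric:<9} {avg:.2f}")
```

Grouping by the (category, metric) pair mirrors how the results are reported above; a single overall score per metric would also require the number of questions in each category, which the summary does not give.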

            Volume 1 Issue 2 (2024)                         23                               doi: 10.36922/aih.2558