


treatment.” The study concluded that while ChatGPT demonstrates potential utility in clinical medicine, its current version lacks the precision to offer specific therapy recommendations for primary breast cancer patients. It underscores the necessity for further refinement before it can be a reliable adjunct in multidisciplinary tumor board decisions.

Gebrael et al.⁸ assessed the utility of ChatGPT 4.0 to enhance triage efficiency and accuracy in emergency rooms for patients with metastatic prostate cancer. Between May 2022 and April 2023, clinical data of 147 patients presenting with metastatic prostate cancer were examined, of which 56 were selected based on inclusion criteria. ChatGPT demonstrated a high sensitivity of 95.7% for determining patient admissions but a low specificity of 18.2% for discharges. It agreed with physicians’ primary diagnoses in 87.5% of cases. It outperformed physicians in accurate terminology usage (42.9% vs. 21.4%) and diagnosis comprehensiveness, with a median diagnosis count of 3 compared to physicians’ 2. ChatGPT was also more concise in its responses and provided more additional treatment recommendations than physicians. These data suggest that ChatGPT could serve as a valuable tool for assisting medical professionals in emergency room settings, potentially enhancing triage efficiency and the overall quality of patient care.
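To make the reported triage metrics concrete, here is a minimal sketch of how sensitivity and specificity are derived from paired model and physician disposition decisions. The function name and the toy records are illustrative only, not data from the study.

```python
# Sensitivity and specificity of admission decisions, as reported by
# Gebrael et al. The records below are invented for illustration.

def triage_metrics(model, reference):
    """model, reference: per-patient "admit"/"discharge" decisions."""
    tp = sum(m == "admit" and r == "admit" for m, r in zip(model, reference))
    fn = sum(m == "discharge" and r == "admit" for m, r in zip(model, reference))
    tn = sum(m == "discharge" and r == "discharge" for m, r in zip(model, reference))
    fp = sum(m == "admit" and r == "discharge" for m, r in zip(model, reference))
    sensitivity = tp / (tp + fn)  # true admissions the model also admits
    specificity = tn / (tn + fp)  # true discharges the model also discharges
    return sensitivity, specificity

model = ["admit", "admit", "admit", "discharge", "admit"]
reference = ["admit", "admit", "discharge", "discharge", "admit"]
print(triage_metrics(model, reference))  # (1.0, 0.5) on this toy data
```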
A study led by Rao et al.⁹ investigated the potential of ChatGPT-3.5 and GPT-4 (OpenAI) in aiding radiologic decision-making, specifically focusing on breast cancer screening and breast pain imaging services. The researchers measured the models’ responses against the ACR Appropriateness Criteria using two prompt formats: “open-ended” (OE) and “select all that apply” (SATA). For breast cancer screening, both versions scored an average of 1.830 (out of 2) in the OE format, but GPT-4 outperformed ChatGPT-3.5 in the SATA format, achieving 98.4% accuracy compared to 88.9%. Regarding breast pain, GPT-4 again showed superiority, registering an average OE score of 1.666 and 77.7% in SATA, while ChatGPT-3.5 scored 1.125 and 58.3%, respectively. The data suggest the growing viability of LLMs like ChatGPT in enhancing radiologic decision-making processes, with potential benefits for clinical workflows and more efficient radiological services. However, further refinement and testing across broader application cases are needed for full validation.
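The two prompt formats imply two different scoring rules, sketched below. The 0-2 rubric for open-ended answers and the option-level matching for SATA are assumptions consistent with the averages reported above, not the paper’s exact protocol.

```python
# Two scoring schemes for responses judged against the ACR Appropriateness
# Criteria. Rubric details are assumed; only the scales match the text.

def oe_average(grades):
    """Open-ended (OE): each free-text answer is graded on a 0-2 scale;
    the study reports the mean grade (e.g., 1.830 out of 2)."""
    return sum(grades) / len(grades)

def sata_accuracy(model_labels, acr_labels):
    """Select-all-that-apply (SATA): fraction of candidate imaging options
    whose appropriate/inappropriate label matches the ACR reference."""
    hits = sum(m == a for m, a in zip(model_labels, acr_labels))
    return hits / len(acr_labels)

print(oe_average([2, 2, 1.5]))                                   # ~1.83
print(sata_accuracy([True, False, True], [True, False, False]))  # ~0.67
```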

Hana et al.¹⁰ conducted a retrospective study to evaluate the appropriateness of ChatGPT’s responses to common questions concerning breast cancer prevention and screening. By leveraging methodologies from prior research that assessed ChatGPT’s capacity to address cardiovascular disease-related inquiries, the team formulated 25 questions rooted in the BI-RADS Atlas and their clinical experiences within tertiary care breast imaging departments. Each question was posed to ChatGPT three times, and three fellowship-trained breast radiologists critically assessed the responses. The radiologists categorized each response as “appropriate,” “inappropriate,” or “unreliable” based on the content’s clinical relevance and consistency. Their evaluations considered two hypothetical scenarios: content for a hospital website and direct chatbot-patient interactions. The majority’s opinion dictated the final determination of appropriateness. Their results revealed that ChatGPT provided suitable answers for 88% (22 out of 25) of the questions in both contexts. However, one question, pertaining to mammography scheduling in light of COVID-19 vaccination, elicited an inappropriate response. In addition, there were inconsistencies in answers related to breast cancer prevention and screening location queries. While ChatGPT frequently referenced guidelines from the American Cancer Society in its responses, it omitted those from the American College of Radiology and the U.S. Preventive Services Task Force. These findings aligned with earlier research by Sarraju et al.,¹¹ where 84% of ChatGPT’s cardiovascular disease prevention responses were deemed appropriate. Despite considerable potential as an automated tool for patient education on breast cancer, ChatGPT exhibited certain limitations, emphasizing the essential role of physician oversight and the ongoing need for further refinement and research into LLMs in healthcare education.
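The majority-rule aggregation described above fits in a few lines. The three labels come from the study; the handling of a three-way split is an assumption, since the excerpt does not specify it.

```python
from collections import Counter

# Aggregate three radiologists' ratings of one ChatGPT response by majority
# rule, as in Hana et al. The tie-breaking fallback is an assumption.
LABELS = {"appropriate", "inappropriate", "unreliable"}

def final_rating(ratings):
    assert len(ratings) == 3 and set(ratings) <= LABELS
    label, votes = Counter(ratings).most_common(1)[0]
    return label if votes >= 2 else "unreliable"  # assumed: 1-1-1 split

print(final_rating(["appropriate", "appropriate", "unreliable"]))  # appropriate
```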
Schulte,¹² in 2023, explored the ability of ChatGPT to identify suitable treatments for advanced solid cancers. Through a structured approach, the study assessed ChatGPT’s capacity to list appropriate systemic therapies for newly diagnosed advanced solid malignancies and then compared the treatments ChatGPT suggested with those recommended by the National Comprehensive Cancer Network (NCCN) guidelines. This comparison resulted in the valid therapy quotient (VTQ) measure.
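The excerpt does not define the VTQ formula explicitly; the sketch below assumes it is the fraction of model-suggested systemic therapies for a diagnosis that also appear in the NCCN-recommended list, which is consistent with a value between 0 and 1 such as the reported average of 0.77. Drug names here are hypothetical placeholders.

```python
# Valid therapy quotient (VTQ) for one diagnosis, assumed here to be the
# share of ChatGPT-suggested therapies that the NCCN guidelines also list.

def vtq(suggested, nccn_recommended):
    suggested = {s.lower() for s in suggested}
    nccn = {t.lower() for t in nccn_recommended}
    return len(suggested & nccn) / len(suggested) if suggested else 0.0

# Hypothetical example: two of three suggestions match the NCCN list.
print(round(vtq(["pembrolizumab", "carboplatin", "drug-x"],
                ["pembrolizumab", "carboplatin", "pemetrexed"]), 2))  # 0.67
```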
The research encompassed 51 diagnoses and found that ChatGPT could identify 91 unique medications related to advanced solid tumors. On average, the VTQ was 0.77, suggesting a reasonably high agreement between ChatGPT’s suggestions and the NCCN guidelines. Furthermore, ChatGPT always mentioned at least one systemic therapy aligned with NCCN’s suggestions. However, there was a minimal correlation between the frequency of each cancer type and the VTQ. In summary, while ChatGPT displays promise in aligning with established oncological guidelines, its current role in assisting medical professionals and patients in making treatment decisions still needs to be defined. As the model evolves, we are hopeful that its accuracy in this

