


treatment.” The study concluded that while ChatGPT demonstrates potential utility in clinical medicine, its current version lacks the precision to offer specific therapy recommendations for primary breast cancer patients. It underscores the necessity for further refinement before it can be a reliable adjunct in multidisciplinary tumor board decisions.

Gebrael et al.⁸ assessed the utility of ChatGPT 4.0 to enhance triage efficiency and accuracy in emergency rooms for patients with metastatic prostate cancer. Between May 2022 and April 2023, clinical data of 147 patients presenting with metastatic prostate cancer were examined, of which 56 were selected based on inclusion criteria. ChatGPT demonstrated a high sensitivity of 95.7% for determining patient admissions but a low specificity of 18.2% for discharges. It agreed with physicians’ primary diagnoses in 87.5% of cases. It outperformed physicians in accurate terminology usage (42.9% vs. 21.4%) and diagnosis comprehensiveness, with a median diagnosis count of 3 compared to physicians’ 2. ChatGPT was also more concise in its responses and provided more additional treatment recommendations than physicians. These data suggest that ChatGPT could serve as a valuable tool for assisting medical professionals in emergency room settings, potentially enhancing triage efficiency and the overall quality of patient care.
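To make the reported triage metrics concrete, here is a minimal sketch of how sensitivity and specificity are derived from paired model and physician disposition decisions. The function name and the toy records are illustrative only, not data from the study.

```python
# Sensitivity and specificity of admission decisions, as reported by
# Gebrael et al. The records below are invented for illustration.

def triage_metrics(model, reference):
    """model, reference: per-patient "admit"/"discharge" decisions."""
    tp = sum(m == "admit" and r == "admit" for m, r in zip(model, reference))
    fn = sum(m == "discharge" and r == "admit" for m, r in zip(model, reference))
    tn = sum(m == "discharge" and r == "discharge" for m, r in zip(model, reference))
    fp = sum(m == "admit" and r == "discharge" for m, r in zip(model, reference))
    sensitivity = tp / (tp + fn)  # true admissions the model also admits
    specificity = tn / (tn + fp)  # true discharges the model also discharges
    return sensitivity, specificity

model = ["admit", "admit", "admit", "discharge", "admit"]
reference = ["admit", "admit", "discharge", "discharge", "admit"]
print(triage_metrics(model, reference))  # (1.0, 0.5) on this toy data
```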
A study led by Rao et al.⁹ investigated the potential of ChatGPT-3.5 and GPT-4 (OpenAI) in aiding radiologic decision-making, specifically focusing on breast cancer screening and breast pain imaging services. The researchers measured the models’ responses against the ACR Appropriateness Criteria using two prompt formats: “open-ended” (OE) and “select all that apply” (SATA). For breast cancer screening, both versions scored an average of 1.830 (out of 2) in the OE format, but GPT-4 outperformed ChatGPT-3.5 in the SATA format, achieving 98.4% accuracy compared to 88.9%. Regarding breast pain, GPT-4 again showed superiority, registering an average OE score of 1.666 and 77.7% in SATA, while ChatGPT-3.5 scored 1.125 and 58.3%, respectively. The data suggest the growing viability of LLMs like ChatGPT in enhancing radiologic decision-making processes, with potential benefits for clinical workflows and more efficient radiological services. However, further refinement and testing across broader application cases are needed for full validation.
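The two prompt formats imply two different scoring rules, sketched below. The 0-2 rubric for open-ended answers and the option-level matching for SATA are assumptions consistent with the averages reported above, not the paper’s exact protocol.

```python
# Two scoring schemes for responses judged against the ACR Appropriateness
# Criteria. Rubric details are assumed; only the scales match the text.

def oe_average(grades):
    """Open-ended (OE): each free-text answer is graded on a 0-2 scale;
    the study reports the mean grade (e.g., 1.830 out of 2)."""
    return sum(grades) / len(grades)

def sata_accuracy(model_labels, acr_labels):
    """Select-all-that-apply (SATA): fraction of candidate imaging options
    whose appropriate/inappropriate label matches the ACR reference."""
    hits = sum(m == a for m, a in zip(model_labels, acr_labels))
    return hits / len(acr_labels)

print(oe_average([2, 2, 1.5]))                                   # ~1.83
print(sata_accuracy([True, False, True], [True, False, False]))  # ~0.67
```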

Hana et al.¹⁰ conducted a retrospective study to evaluate the appropriateness of ChatGPT’s responses to common questions concerning breast cancer prevention and screening. By leveraging methodologies from prior research that assessed ChatGPT’s capacity to address cardiovascular disease-related inquiries, the team formulated 25 questions rooted in the BI-RADS Atlas and their clinical experiences within tertiary care breast imaging departments. Each question was posed to ChatGPT three times, and three fellowship-trained breast radiologists critically assessed the responses. The radiologists categorized each response as “appropriate,” “inappropriate,” or “unreliable” based on the content’s clinical relevance and consistency. Their evaluations considered two hypothetical scenarios: content for a hospital website and direct chatbot-patient interactions. The majority’s opinion dictated the final determination of appropriateness. Their results revealed that ChatGPT provided suitable answers for 88% (22 out of 25) of the questions in both contexts. However, one question, pertaining to mammography scheduling in light of COVID-19 vaccination, elicited an inappropriate response. In addition, there were inconsistencies in answers related to breast cancer prevention and screening location queries. While ChatGPT frequently referenced guidelines from the American Cancer Society in its responses, it omitted those from the American College of Radiology and the U.S. Preventive Services Task Force. These findings aligned with earlier research by Sarraju et al.,¹¹ where 84% of ChatGPT’s cardiovascular disease prevention responses were deemed appropriate. Despite considerable potential as an automated tool for patient education on breast cancer, ChatGPT exhibited certain limitations, emphasizing the essential role of physician oversight and the ongoing need for further refinement and research into LLMs in healthcare education.
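The majority-rule aggregation described above fits in a few lines. The three labels come from the study; the handling of a three-way split is an assumption, since the excerpt does not specify it.

```python
from collections import Counter

# Aggregate three radiologists' ratings of one ChatGPT response by majority
# rule, as in Hana et al. The tie-breaking fallback is an assumption.
LABELS = {"appropriate", "inappropriate", "unreliable"}

def final_rating(ratings):
    assert len(ratings) == 3 and set(ratings) <= LABELS
    label, votes = Counter(ratings).most_common(1)[0]
    return label if votes >= 2 else "unreliable"  # assumed: 1-1-1 split

print(final_rating(["appropriate", "appropriate", "unreliable"]))  # appropriate
```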
Schulte,¹² in 2023, explored the ability of ChatGPT to identify suitable treatments for advanced solid cancers. Through a structured approach, the study assessed ChatGPT’s capacity to list appropriate systemic therapies for newly diagnosed advanced solid malignancies and then compared the treatments ChatGPT suggested with those recommended by the National Comprehensive Cancer Network (NCCN) guidelines. This comparison resulted in the valid therapy quotient (VTQ) measure.
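The excerpt does not define the VTQ formula explicitly; the sketch below assumes it is the fraction of model-suggested systemic therapies for a diagnosis that also appear in the NCCN-recommended list, which is consistent with a value between 0 and 1 such as the reported average of 0.77. Drug names here are hypothetical placeholders.

```python
# Valid therapy quotient (VTQ) for one diagnosis, assumed here to be the
# share of ChatGPT-suggested therapies that the NCCN guidelines also list.

def vtq(suggested, nccn_recommended):
    suggested = {s.lower() for s in suggested}
    nccn = {t.lower() for t in nccn_recommended}
    return len(suggested & nccn) / len(suggested) if suggested else 0.0

# Hypothetical example: two of three suggestions match the NCCN list.
print(round(vtq(["pembrolizumab", "carboplatin", "drug-x"],
                ["pembrolizumab", "carboplatin", "pemetrexed"]), 2))  # 0.67
```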
The research encompassed 51 diagnoses and found that ChatGPT could identify 91 unique medications related to advanced solid tumors. On average, the VTQ was 0.77, suggesting a reasonably high agreement between ChatGPT’s suggestions and the NCCN guidelines. Furthermore, ChatGPT always mentioned at least one systemic therapy aligned with NCCN’s suggestions. However, there was a minimal correlation between the frequency of each cancer type and the VTQ. In summary, while ChatGPT displays promise in aligning with established oncological guidelines, its current role in assisting medical professionals and patients in making treatment decisions still needs to be defined. As the model evolves, we are hopeful that its accuracy in this

