
Artificial Intelligence in Health                                 ChatGPT in visceral leishmaniasis diagnosis




            Table 1. (Continued)
            Case                                           Description
                  PE: Weight: 69 kg; height: 1.56 m; BP: 100/60 mmHg; abdomen: hepatomegaly 3 cm below the costal margin, splenomegaly 5 cm below the
                  costal margin, no ascites, abdominal tenderness on palpation; cardiovascular: regular cardiac rhythm, heart rate 76 bpm; respiratory: clear
                  breath sounds, no wheezes or crackles; skin/mucous membranes: no jaundice, pale, no mucosal lesions.
            Abbreviations: BP: Blood pressure; HPI: History of present illness; ID: Identifying information; PE: Physical examination; PMH: Past medical history;
            SH: Social history.

The second investigator in this study (D.S.) employed a methodology similar to that of Hirosawa et al.²¹ by typing the following text into the ChatGPT (GPT 4.0, OpenAI OpCo, LLC) prompt in Brazilian Portuguese: “Please provide me with the five most likely diagnoses for the following symptoms: (copy and paste each clinical vignette).” The order of the clinical vignettes presented to ChatGPT/GPT-4 was randomized using a computer-generated order table (Case 02, 08, 04, 01, 06, 05, 03, and 07). To ensure the integrity of the data and to prevent previous interactions from influencing the AI’s responses, each clinical vignette was presented to ChatGPT/GPT-4 only once, in a new chat session.²¹

2.4. Measurements and definitions

The accuracy of the VL diagnosis was evaluated based on the inclusion of the correct diagnosis within the top five differential diagnoses generated by ChatGPT (GPT 4.0, OpenAI OpCo, LLC). This approach employed a binary scoring system, whereby the presence of the diagnosis in the list was scored as one and its absence as zero. Furthermore, the position of the VL diagnosis within each list, ranked from first to fifth, was analyzed.

2.5. Statistical analysis

The responses were entered into Excel spreadsheets (Microsoft Corporation, Redmond, WA, USA, Release 12.0.6662, 2012) and exported to the Statistical Package for the Social Sciences for Windows (SPSS Inc., Chicago, Illinois, USA, Release 16.0.2, 2008) for statistical analysis. Descriptive statistical analysis was performed on categorical variables, which were presented as absolute and relative frequencies. The accuracy of ChatGPT/GPT-4 as an AI-assisted diagnostic tool for VL was calculated using the prevalence ratio, and the uncertainty of this estimate was expressed with a 95% confidence interval (95% CI). Statistical analyses were two-tailed, and statistical significance was set at P < 0.05.

3. Results

The correct diagnosis of VL appeared among the five differential diagnoses generated by ChatGPT (GPT 4.0, OpenAI OpCo, LLC) 6 times, representing 75% of the total number of cases (95% CI: 40.1 – 93.7%). Table 2 shows the five differential diagnoses presented by ChatGPT/GPT-4 for each clinical vignette.

ChatGPT/GPT-4 did not list VL as a diagnostic possibility for the clinical vignettes of cases 03 and 04, but it did report VL as the top diagnosis for four cases (50.0%; 95% CI: 30.3 – 86.5%). Figure 1 shows the accuracy of ChatGPT/GPT-4 in presenting VL as a differential diagnosis (Figure 1A) and as the principal diagnosis (Figure 1B).

4. Discussion

The ability of ChatGPT to provide diagnostic support, especially in resource-limited settings where access to specialized medical expertise is limited, is one of its most promising contributions to healthcare. By providing reliable differential diagnoses, ChatGPT has the potential to bridge gaps in medical expertise, enabling more timely and accurate clinical decision-making in underserved areas.

This exploratory study evaluated the diagnostic accuracy of ChatGPT/GPT-4 in generating differential diagnosis lists for clinical vignettes of VL. The results showed that ChatGPT/GPT-4 correctly included VL in the top five differential diagnoses in 75% of cases. Notably, ChatGPT/GPT-4 identified VL as the top diagnosis in 50% of the cases. These results indicate that ChatGPT (GPT 4.0, OpenAI OpCo, LLC) has a high potential to aid in the diagnosis of VL, as evidenced by its considerable accuracy in generating relevant differential diagnoses.

The findings of our study are consistent with a growing body of research demonstrating the diagnostic capabilities of AI chatbots. For example, Hirosawa et al.²¹ evaluated the diagnostic accuracy of differential diagnosis lists generated by ChatGPT/GPT-3.5 on January 5, 2023, for clinical vignettes with common chief complaints. Their results showed that the correct diagnosis was included within the top ten differential diagnoses in 93.3% of cases. Similarly, a study by Mizuta et al.²² showed that ChatGPT/GPT-4 had an elevated level of agreement (95.9%) with physicians in determining whether the correct diagnosis was included in the top ten differential diagnosis lists.
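
As an illustration of the binary scoring described in Section 2.4 and the proportion-with-interval summary described in Section 2.5, the following minimal Python sketch may be helpful. It is not the authors' workflow, which used Excel and SPSS. The per-case ranks in `vl_rank` are hypothetical placeholders chosen only so that the totals match the reported counts (VL listed in six of eight cases, absent for cases 03 and 04, and ranked first in four cases). The article does not state which interval method was used, so the Wilson score interval here is an assumption and its bounds need not reproduce the reported 95% CIs.

```python
from math import sqrt
from typing import Optional, Tuple

# Hypothetical ranks of VL in each five-item list (None = VL not listed).
# Only the totals follow the article; the individual ranks are illustrative.
vl_rank = {
    "01": 1, "02": 1, "03": None, "04": None,
    "05": 2, "06": 1, "07": 3, "08": 1,
}


def binary_score(rank: Optional[int], top_k: int = 5) -> int:
    """Score 1 if VL appears within the first top_k positions, else 0."""
    return 1 if rank is not None and rank <= top_k else 0


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


n_cases = len(vl_rank)
top5_hits = sum(binary_score(r, top_k=5) for r in vl_rank.values())
top1_hits = sum(binary_score(r, top_k=1) for r in vl_rank.values())

for label, hits in (("top five", top5_hits), ("top diagnosis", top1_hits)):
    low, high = wilson_interval(hits, n_cases)
    print(f"VL as {label}: {hits}/{n_cases} = {hits / n_cases:.1%} "
          f"(95% CI {low:.1%} to {high:.1%})")
```

With only eight vignettes, any such interval is wide, which is why the point estimates should be read together with their bounds.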


            Volume 1 Issue 4 (2024)                        101                               doi: 10.36922/aih.3930