
Artificial Intelligence in Health                                       ChatGPT in writing scientific articles



               publishing journal names. These are real sources that can theoretically be used in the preparation of an article on the specified topics.
            (ii)  Semi-reliable sources ("red"): These sources have a real article title but an incorrect publication year, journal, and/or authors' names. The "red" group also included Internet sources (websites of biomedical companies, Wikipedia, and others). Such online sources were actively cited by ChatGPT 4 when it was unable to find other journal publications. It is worth noting that all the "red" sources for this version turned out to be Internet sources.
            (iii) Fictitious sources ("yellow"): These sources have a completely fictitious title, authors, and sometimes a fictitious journal name.
              A schematic representation of the source verification process and source classification is shown in Figure 1.

            3. Results
            3.1. Analysis of the semantic content of the articles
            For each field of medicine, five different ChatGPT 3.5 responses and five different ChatGPT 4 responses were generated for each of the three prompts. Excerpts from articles on the topic of cardiology created using the third prompt with ChatGPT 3.5 and ChatGPT 4 are presented in Articles S1 and S2 (in Supplementary File).
              In reviewing the texts of the articles generated with ChatGPT, it can be noted that this neural network correctly highlighted and reasoned about the importance of biotelemetry in the field of cardiology. Moreover, the texts contain no logical or semantic errors. When analyzing these texts, it is extremely difficult to determine whether the author is a human or a neural network.

            3.2. Comparison of source reliability for generated articles
            The total number of sources used for generating scientific articles on the topic of biotelemetry in cardiology was 260 (155 for ChatGPT 3.5 and 105 for ChatGPT 4). For articles generated on the topic of biotelemetry in oncology, the neural network produced 269 sources (157 for ChatGPT 3.5 and 112 for ChatGPT 4). For articles on the topic of biotelemetry in remote medical examination, 246 sources were obtained (150 for ChatGPT 3.5 and 96 for ChatGPT 4).
              Source verification was then carried out. Figure 2 shows the normalized distribution charts of the reliability of literature sources, according to the source classification described in Section 2, for each of the medical fields in which the generated research articles were analyzed.
              The numerous fictitious sources for every prompt are associated with hallucinations, a very common and critical problem in the responses of language models such as ChatGPT. 28,40,41
              For ChatGPT 3.5, the highest total number of reliable sources among the different medical fields was obtained when generating articles on "biotelemetry in cardiology." The highest total number of fictitious sources was produced for articles on "biotelemetry in oncology." For ChatGPT 4, the total numbers of reliable sources across the different medical fields turned out to be approximately equal; the highest total number of fictitious sources was likewise associated with articles on "biotelemetry in oncology."
              For the ChatGPT 4 prompts on "remote medical examination" and "biotelemetry in oncology," almost all of the semi-reliable sources were based on specific Internet
                       Figure 1. Flowchart for source verification of ChatGPT responses. Image created with CorelDRAW 2020 (v22, Canada)


            Volume 1 Issue 3 (2024)                         56                               doi: 10.36922/aih.2592