
Artificial Intelligence in Health                                    AI vs humans in clinical code conversion



The ability to convert between diagnostic coding systems has practical applications, particularly within research contexts. For instance, extracting a subset of SNOMED CT codes related to a specific diagnostic grouping (e.g., mental health) is challenging because, unlike ICD codes, SNOMED CT codes are not organized under broader categories for each condition. This complicates work with large SNOMED CT datasets when only a subset is to be analyzed. Converting diagnostic codes can be a time-consuming task, particularly when the process relies heavily on manual data input and extraction. To the authors’ knowledge, it remains unexplored whether GenAI can assist in the conversion of clinical data from one diagnostic coding language to another, such as from SNOMED CT to ICD. Such conversions require specialized knowledge of clinical coding and are labor-intensive to complete manually. Performing diagnostic code conversion tasks using AI models may enable less qualified staff to complete the work in less time, thereby reducing the cost of data processing.

Therefore, this study aims to examine whether publicly accessible GenAI tools, namely ChatGPT-4o and Claude 3.5 Sonnet, can accurately convert clinical diagnostic codes from SNOMED CT to the 10th revision of the ICD (ICD-10). This study also seeks to address the following sub-objectives:
(i) Compare the level of agreement between ChatGPT-4o and a human rater
(ii) Compare the level of agreement between Claude 3.5 Sonnet and a human rater
(iii) Compare the level of agreement between ChatGPT-4o and Claude 3.5 Sonnet
(iv) Examine the economic benefit, in terms of time and labor cost, of using GenAI to complete this task compared to a human rater.

2. Materials and methods

The SNOMED CT codes used in this study originate from a broader emergency department (ED) dataset, obtained as part of a study investigating mental health presentations to hospital EDs (ethics approval: HREC/2023/QGC/95219). This dataset consists of 19,764 unique SNOMED CT-AU (Australian Extension) numeric codes (e.g., 48694002) and SNOMED CT-AU names (e.g., “Anxiety reaction”) representing the diagnoses made in the ED over a 3-year period (August 2020 to August 2023). The current evaluation utilizes a randomly selected 10% subset of this data (n = 1,976) (Table S1).

To convert the SNOMED CT-AU[31] codes to ICD-10 Clinical Modification (ICD-10-CM),[32] a three-phase approach was employed. First, codes were manually converted by human raters. Second, the codes were converted using ChatGPT-4o (https://chatgpt.com/). Third, the same set of codes was converted using Claude 3.5 Sonnet (https://claude.ai/). Both GenAI tools required paid subscriptions at the time of analysis.

The methodology and results of this study were reported in accordance with the METRICS reporting checklist, which outlines standardized reporting metrics (such as model, evaluation, timing, transparency, range of tested topics, randomization, individual factors, query count, and prompt specificity) for GenAI-based studies in healthcare.[33] The completed reporting checklist is listed in Table S2.

2.1. Phase 1: Manual conversion of SNOMED CT-AU codes

The SNOMED CT-AU codes were manually converted by a team of three raters (AG = 800 codes; AJ = 644 codes; CH = 532 codes). Conversions were performed using the Interactive Map-Assisted Generation of ICD Codes (I-MAGIC) algorithm (https://imagic.nlm.nih.gov/imagic/code/map), an online tool that provides a mapping between the two diagnostic coding systems.[34] Codes were entered into the tool in the format “SNOMED CT-AU name (SNOMED CT-AU code)” (e.g., “Anxiety reaction [48694002]”), and the corresponding ICD-10-CM code was extracted.

In this study, the I-MAGIC tool was employed as the reference standard against which all other conversion methods were compared. However, some SNOMED CT codes could not be located within the I-MAGIC database. As the dataset utilized the Australian extension of SNOMED CT, while the mapping tool used the standard SNOMED CT list, it is likely that the missing codes were region-specific.[35] In such cases, the absence of an equivalent was noted.

2.2. Phase 2: Conversion of SNOMED CT-AU codes using ChatGPT-4o

ChatGPT-4o[21] was used to automatically convert the SNOMED CT-AU codes and names into ICD-10-CM codes (completed in August 2024). A Microsoft Excel file containing the SNOMED CT-AU codes and names was uploaded to ChatGPT-4o. The prompt used for the conversion was refined through an iterative process to improve efficiency and reduce the risk of “hallucinations” (i.e., providing false information) and data processing errors.

It was necessary to state that ChatGPT-4o could take as much time as required to complete this task; otherwise, the message would time out and cease to produce output. Additionally, a limit was observed regarding the number


            Volume 2 Issue 4 (2025)                         94                          doi: 10.36922/AIH025200045
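
For readers who wish to reproduce the data preparation described in Section 2 and Phase 1, the following is a minimal Python sketch. It illustrates drawing a 10% simple random subset from 19,764 records, splitting the subset among three raters in the reported workloads (800, 644, and 532 codes), and building lookup strings in the style of the example “Anxiety reaction [48694002]”. The function names, the random seed, the placeholder records, and the allocation of consecutive batches to raters are all assumptions made for illustration; the study does not state how the sample was drawn or how codes were assigned to raters.

```python
import random


def sample_subset(records, fraction=0.10, seed=0):
    """Draw a simple random sample without replacement.

    The seed and the rounding-down rule are assumptions for illustration.
    """
    rng = random.Random(seed)
    k = int(len(records) * fraction)  # 19,764 records -> 1,976 sampled
    return rng.sample(records, k)


def format_query(name, code):
    """Build a lookup string in the style of the paper's example."""
    return f"{name} [{code}]"


def allocate(items, workloads):
    """Split items into consecutive batches, one batch per rater.

    Consecutive assignment is hypothetical; only the batch sizes
    come from the paper.
    """
    if sum(workloads.values()) != len(items):
        raise ValueError("workloads must add up to the number of items")
    batches, start = {}, 0
    for rater, size in workloads.items():
        batches[rater] = items[start:start + size]
        start += size
    return batches


# Placeholder (name, code) pairs standing in for the real ED dataset.
records = [(f"Diagnosis {i}", 10_000_000 + i) for i in range(19_764)]

subset = sample_subset(records)
batches = allocate(subset, {"AG": 800, "AJ": 644, "CH": 532})
queries = [format_query(name, code) for name, code in batches["AG"]]
```

Fixing the seed makes the subset reproducible, which matters when the same codes must later be submitted to each GenAI tool for comparison against the reference standard.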