models against our fine-tuned Flan-T5 models, which were explicitly trained on the MIMICSQL dataset.

Comparison of our fine-tuned Flan-T5 models with the baseline TREQS, as well as other LLMs such as Defog-SQLCoder, LLaMA-2-7B, GPT-3.5, and GPT-4, highlights the significant impact of compact LLMs and illustrates the effectiveness of schema-less querying. A key finding is the ability of smaller models, such as Flan-T5, to generate SQL queries directly from natural language questions without relying on explicit database schema information. This approach challenges the traditional schema-dependent paradigm. Furthermore, as illustrated in Table 6, our study emphasizes the importance of model competence. The Flan-T5 models demonstrated proficiency in interpreting complex medical terminology, even when certain terms were absent from the model's vocabulary.

In the first example, the natural language query is accurately transformed into SQL, selecting the count of distinct patient IDs from the DEMOGRAPHIC table where the corresponding HADM_ID in the DIAGNOSES table matches the specified ICD9 code "28803". This showcases Flan-T5's ability to understand complex medical terminology, such as ICD9 codes, and incorporate it into SQL queries effectively. In the second example, the model accurately generates an SQL query that counts distinct patient IDs from the DEMOGRAPHIC table based on the presence of the specific drug code "NEOLEVO5" in the PRESCRIPTIONS table. Despite the unfamiliar term "neolevo5", Flan-T5 incorporates it into the SQL query correctly, demonstrating its ability to handle diverse medical and pharmaceutical terminology. This capability is vital in healthcare, where accuracy and domain-specific language are paramount. Our findings encourage further exploration into strategies that reduce model and data complexity, focusing on task-oriented models suitable for commodity hardware.
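
For illustration, the following is a minimal sketch of schema-less inference with a fine-tuned Flan-T5 checkpoint using the Hugging Face Transformers API. The checkpoint path is a placeholder rather than a released model, and only the raw question is passed to the model, with no schema description in the prompt.

# Minimal sketch: schema-less text-to-SQL inference with a fine-tuned Flan-T5 model.
# The checkpoint path below is a placeholder, not the authors' released model.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_DIR = "path/to/flan-t5-large-mimicsql"  # hypothetical fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_DIR)

def question_to_sql(question: str, max_new_tokens: int = 256) -> str:
    """Generate an SQL query from the natural language question alone;
    no database schema is included in the prompt."""
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(question_to_sql("How many patients have diagnoses icd9 code 28803?"))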
Table 5. Comparison with current state-of-the-art models in the test set

Model                    Parameter size    LFA
TREQS                    2.8 M             0.48
TREQS+Recover            2.8 M             0.55
GPT-3.5-Turbo            20 B              0.60
LLaMA-2-7B               7 B               0.60
Defog-SQLCoder           15 B              0.65
GPT-4                    -                 0.70
Flan-T5 Base (Ours)      220 M             0.56
Flan-T5 Large (Ours)     770 M             0.85

Notes: Flan-T5 Large outperforms all other models; boldface indicates the highest LFA score. The parameter size of GPT-4 has not been disclosed by OpenAI.
Abbreviations: M: Million; B: Billion.
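
To make the metric in Table 5 concrete, the sketch below computes an exact-match accuracy over lightly normalized SQL strings, in the style of logic form accuracy as used in the TREQS evaluation. It is a simplified stand-in and does not reproduce the exact normalization rules used in our evaluation.

# Sketch of a logic-form-accuracy-style metric: exact match of normalized SQL.
# The normalization here (case, whitespace, trailing semicolons) is simplified
# and illustrative; the evaluation in the paper may normalize differently.
import re

def normalize_sql(sql: str) -> str:
    sql = sql.strip().rstrip(";")
    sql = re.sub(r"\s+", " ", sql)  # collapse whitespace
    return sql.lower()

def logic_form_accuracy(predictions, references):
    matches = sum(
        normalize_sql(p) == normalize_sql(r)
        for p, r in zip(predictions, references)
    )
    return matches / len(references) if references else 0.0

# Toy usage with a single (prediction, reference) pair:
gold = 'SELECT COUNT (DISTINCT DEMOGRAPHIC."SUBJECT_ID") FROM DEMOGRAPHIC'
pred = 'select count (distinct demographic."subject_id") from demographic;'
print(logic_form_accuracy([pred], [gold]))  # 1.0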
Table 6. Examples demonstrating proficiency in interpreting complex medical terminology using our fine-tuned Flan-T5 Large model

Input question                                SQL query generated by Flan-T5 Large model
How many patients have diagnoses             SELECT COUNT (DISTINCT DEMOGRAPHIC."SUBJECT_ID")
icd9 code 28803?                             FROM DEMOGRAPHIC
                                             INNER JOIN DIAGNOSES ON DEMOGRAPHIC.HADM_ID=DIAGNOSES.HADM_ID
                                             WHERE DIAGNOSES."ICD9_CODE" = '28803';
Which patients have a neolevo5               SELECT COUNT (DISTINCT DEMOGRAPHIC."SUBJECT_ID")
drug code?                                   FROM DEMOGRAPHIC
                                             INNER JOIN PRESCRIPTIONS ON DEMOGRAPHIC.HADM_ID=PRESCRIPTIONS.HADM_ID
                                             WHERE PRESCRIPTIONS."FORMULARY_DRUG_CD" = 'NEOLEVO5';
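
To show how a generated query from Table 6 behaves when executed, the sketch below runs the first query against a toy in-memory SQLite database. The DEMOGRAPHIC and DIAGNOSES tables and their rows are simplified illustrative stand-ins, not the actual MIMICSQL schema or data.

# Sketch: executing the first generated query from Table 6 on a toy SQLite database.
# Table definitions and rows are illustrative stand-ins, not the MIMICSQL schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEMOGRAPHIC (SUBJECT_ID TEXT, HADM_ID TEXT);
    CREATE TABLE DIAGNOSES (HADM_ID TEXT, ICD9_CODE TEXT);
    INSERT INTO DEMOGRAPHIC VALUES ('p1', 'h1'), ('p2', 'h2');
    INSERT INTO DIAGNOSES VALUES ('h1', '28803'), ('h2', '4019');
""")

generated_sql = """
    SELECT COUNT (DISTINCT DEMOGRAPHIC."SUBJECT_ID")
    FROM DEMOGRAPHIC
    INNER JOIN DIAGNOSES ON DEMOGRAPHIC.HADM_ID=DIAGNOSES.HADM_ID
    WHERE DIAGNOSES."ICD9_CODE" = '28803';
"""
print(conn.execute(generated_sql).fetchone()[0])  # 1 for this toy data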
5. Conclusion
This study investigates the efficacy of smaller, task-specific language models compared to more complex LLMs in the Text-to-SQL task, with a focus on the healthcare domain using the MIMICSQL dataset. Our findings reveal the remarkable performance of the fine-tuned Flan-T5 models, particularly Flan-T5 Large, which achieved an LFA score of 0.85. This score surpasses the current state-of-the-art model, Defog-SQLCoder, as well as other advanced LLMs such as LLaMA-2-7B, GPT-3.5-Turbo, and GPT-4. Our approach, which removes schema definitions from input prompts and trains a separate model for each distinct schema, has proven effective while requiring fewer hardware resources and less training data. These findings underscore the potential of tailored compact language models for domain-specific applications, opening avenues for more efficient and effective natural language understanding in specialized contexts.

Acknowledgments
None.

Funding
None.

Conflict of interest
The authors declare no competing interests.

Author contributions
Conceptualization: Youssef Mellah, Veysel Kocaman
Investigation: All authors

