
Artificial Intelligence in Health                                 Schema-less text2sql conversion with LLMs



4.3. Results and discussion

The evaluation of various text-to-SQL models on the MIMICSQL test set provided significant insights. The baseline TREQS model recorded an LFA of 0.48, which increased marginally to 0.55 with the incorporation of a recovery technique (TREQS + Recover). The current state-of-the-art model, Defog-SQLCoder, achieved an LFA of 0.65. In comparison, the LLMs GPT-3.5-Turbo and GPT-4 demonstrated robust performance with LFA scores of 0.60 and 0.70, respectively, highlighting their applicability. In addition, the LLaMA-2-7B model, which was fine-tuned for text-to-SQL tasks, attained an LFA of 0.60. Remarkably, our custom fine-tuned model, Flan-T5 Large, surpassed all of these models with an LFA of 0.85.

Figure 4 illustrates a sample natural language query, the ground truth SQL query that accurately answers it, and the SQL queries generated by the LLMs used in our experiments, namely, LLaMA-2-7B, GPT-3.5-Turbo, GPT-4, and DeFog-SQLCoder, along with our Flan-T5 models. This comparison highlights the differences in the query generation capabilities of each model, offering a tangible demonstration of their respective performance in the text-to-SQL context.

This outcome indicates that while existing models such as GPT-3.5-Turbo (20B parameters), LLaMA-2-7B (7B parameters), and Defog-SQLCoder (15B parameters) show commendable proficiency, our schema-less text-to-SQL approach with Flan-T5 Large, which has only 780M parameters, notably outperforms them. This demonstrates not only superior performance but also remarkable efficiency, offering transformative potential in both specific domains and broader applications. The detailed results are tabulated in Table 5.

The results of our comprehensive evaluation shed light on the text-to-SQL domain, underscoring the significance of large language models (LLMs) and the promising potential of schema-less approaches in healthcare. It is crucial to note that the LLMs under scrutiny, specifically LLaMA-2-7B and DeFog-SQLCoder, were fine-tuned on the text-to-SQL task using datasets such as MIMICSQL, thereby directly incorporating knowledge pertinent to this domain. In contrast, the GPT models (GPT-3.5-Turbo and GPT-4) are renowned for their versatility across a variety of NLP tasks, including text-to-SQL, due to their extensive pre-training on diverse corpora. Although these models were not specifically fine-tuned on the MIMICSQL dataset, their broad exposure during pre-training to a wide array of textual and structured data may have contributed to their performance on the MIMICSQL test set. This factor is important to consider when interpreting the comparative performance of these

Figure 4. Sample SQL query generation. This figure illustrates a sample natural language query alongside the corresponding ground truth SQL query and the SQL queries generated by the evaluated LLMs (LLaMA-2-7B, GPT-3.5-Turbo, GPT-4, and DeFog-SQLCoder) and our Flan-T5 models. In addition, an augmented version of the ground truth query is presented, serving as an example of how we enriched the training data during the fine-tuning of our Flan-T5 models. It is important to note that this augmentation was exclusively for training purposes; no data in the test set were altered or augmented in any manner.
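The LFA scores reported above reflect exact-match-style comparisons between generated and ground-truth SQL. As a minimal illustrative sketch only, an LFA-style metric can be computed as below; the normalization rules in `normalize_sql` and the sample queries are assumptions for illustration, not the paper's actual implementation or data:

```python
import re

# Illustrative sketch of an exact-match, logic-form-accuracy (LFA) style
# metric for text-to-SQL evaluation. Normalization rules are assumptions.

def normalize_sql(query: str) -> str:
    """Lowercase, drop a trailing semicolon, and collapse whitespace."""
    query = query.strip()
    if query.endswith(";"):
        query = query[:-1]
    return re.sub(r"\s+", " ", query).strip().lower()

def logic_form_accuracy(predictions, references):
    """Fraction of predictions whose normalized SQL matches the reference."""
    matches = sum(
        normalize_sql(p) == normalize_sql(r)
        for p, r in zip(predictions, references)
    )
    return matches / len(references)

# Hypothetical example: one logical match, one mismatch -> LFA = 0.5
preds = [
    "SELECT COUNT(*) FROM demographic WHERE gender = 'F';",
    "SELECT name FROM prescriptions",
]
golds = [
    "select count(*)  from demographic where gender = 'f'",
    "SELECT drug_name FROM prescriptions",
]
print(logic_form_accuracy(preds, golds))  # 0.5
```

A string-level exact match of this kind is strict: semantically equivalent queries that differ in column order or aliasing would be penalized, which is one reason evaluation protocols sometimes pair it with execution-based accuracy.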


            Volume 1 Issue 2 (2024)                        103                               doi: 10.36922/aih.2661