
Journal of Clinical and Translational Research
AI and LLMs in iPSC cardiac research




Table 4. (Continued)

| Name | Domain specificity | Primary input modality | Cardiovascular application | Key advantage |
|---|---|---|---|---|
| RoseTTAFold (Baker Lab) | Protein folding tool | Protein sequence and distance embedding | Structural prediction for variant evaluation in cell modeling platforms | Efficient three-track network for accurate fold prediction |
| scFoundation | Single-cell foundation model | Single-cell transcriptomics | Embeddings used for drug response prediction and lineage inference in iPSC-CMs | Outperforming baseline models in cell population mapping |
| scGPT | Single-cell generative transformer | Multi-omic cell atlas modeling | Predictive modeling of cell fate trajectories and disease phenotypes in cardiomyocyte differentiation | Scalable multi-scale modeling across millions of single cells |
| TensorFlow | Deep learning framework | Neural network model construction | Used in image, sequence, and time-series modeling within cardiac AI | Widely supported, with high interoperability across platforms |
Abbreviations: AI: Artificial intelligence; CMs: Cardiomyocytes; CVDs: Cardiovascular diseases; EHRs: Electronic health records; iPSC-CM: Induced pluripotent stem cell-derived cardiomyocytes; LLM: Large language model; ML: Machine learning; MYH7: Myosin heavy chain 7; NLP: Natural language processing; QA: Quality assurance; TTN: Titin; VUS: Variant of uncertain significance.

CRFM) have been specifically optimized for biomedical corpora, demonstrating superior performance in gene-disease association tasks and biomolecular annotation, particularly in cardiomyopathy and arrhythmia literature mining.[59,63,159] A comparative summary of key models and their cardiovascular applications is presented in Table 4 to illustrate these performance distinctions across architectural and functional axes.

DeepSeekMed, a bilingual biomedical LLM trained on Chinese and English datasets, has outperformed ChatGPT and BioBERT in cross-lingual phenotype extraction and EHR-based cardiovascular risk scoring, making it especially promising for low- and middle-income countries (LMICs) and multilingual health systems.[160-162] Similarly, CardioGenAI, a machine learning framework developed by BGI Genomics, has demonstrated high predictive accuracy in linking genetic variants with phenotype severity in monogenic CVDs, offering a focused advantage in genotype-phenotype interpretation relevant to iPSC-CM disease modeling.[163,164]

In structural biology and regenerative cardiology, AlphaFold2, RoseTTAFold, and ESMFold represent next-generation protein structure prediction tools that have outpaced prior algorithms in predicting conformational changes in sarcomeric and ion channel proteins, such as MYH7, TTN, and SCN5A, which are key targets in inherited cardiomyopathies. These models have been successfully integrated into iPSC-CM variant modeling platforms to anticipate functional disruptions before in vitro phenotyping.[165-167]

Several frameworks, such as BioGPT, BioMedLM, and scGPT, are particularly optimized for biomedical corpora and high-dimensional omics data, enabling improved performance in gene-disease association mining, transcriptomic trajectory modeling, and drug response prediction. Meanwhile, foundational platforms, such as JAX, TensorFlow, and PyTorch, remain essential for training custom cardiac models from raw datasets, offering flexibility in integrating imaging, text, and biosignal data streams.

Despite these advances, there remains no unified evaluation framework to assess LLM performance across core regenerative tasks, such as differentiation protocol optimization, cardiotoxicity modeling, or graft-host interaction simulation. Each model tends to be benchmarked independently, often using proprietary metrics, narrow data types, or single-institution validation sets, which limits clinical transferability. The lack of harmonized performance indicators, common cardiac data benchmarks, and population-representative validation pipelines, especially in iPSC-CM contexts, continues to hinder reproducibility and deployment scalability in clinical-grade environments.

Therefore, this review serves to highlight not only the current strengths and domain-specific niches of various LLMs in regenerative cardiology but also the urgent need for cross-institutional benchmarking consortia. Such efforts should incorporate:
(i) Standardized cardiac datasets (e.g., iPSC-CM electrophysiology, omics, and CRISPR perturbations),
(ii) Multimodal integration tasks (e.g., combining text, imaging, and gene expression), and
(iii) Regionally calibrated validation protocols (especially across LMICs and genetically diverse populations).

The current review functions not only as a descriptive summary but also as a framework proposal: a scaffold on which future empirical benchmarking protocols and clinical-grade LLM applications in cardiovascular regenerative medicine can be systematically developed.

Volume 11 Issue 5 (2025), doi: 10.36922/JCTR025230026