Journal of Clinical and Translational Research
AI and LLMs in iPSC cardiac research
Table 4. (Continued)
| Name | Domain specificity | Primary input modality | Cardiovascular application | Key advantage |
|---|---|---|---|---|
| RoseTTAFold (Baker Lab) | Protein folding tool | Protein sequence and distance embedding | Structural prediction for variant evaluation in cell modeling platforms | Efficient three-track network for accurate fold prediction |
| scFoundation | Single-cell foundation model | Single-cell transcriptomics | Embeddings used for drug response prediction and lineage inference in iPSC-CMs | Outperforms baseline models in cell population mapping |
| scGPT | Single-cell generative transformer | Multi-omic cell atlas modeling | Predictive modeling of cell fate trajectories and disease phenotypes in cardiomyocyte differentiation | Scalable multi-scale modeling across millions of single cells |
| TensorFlow | Deep learning framework | Neural network model construction | Used in image, sequence, and time-series modeling within cardiac AI | Widely supported, with high interoperability across platforms |
Abbreviations: AI: Artificial intelligence; CMs: Cardiomyocytes; CVDs: Cardiovascular diseases; EHRs: Electronic health records; iPSC-CM: Induced pluripotent stem cell-derived cardiomyocytes; LLM: Large language model; ML: Machine learning; MYH7: Myosin heavy chain 7; NLP: Natural language processing; QA: Quality assurance; TTN: Titin; VUS: Variant of uncertain significance.
CRFM) have been specifically optimized for biomedical corpora, demonstrating superior performance in gene-disease association tasks and biomolecular annotation, particularly in cardiomyopathy and arrhythmia literature mining. 59,63,159 A comparative summary of key models and their cardiovascular applications is presented in Table 4 to illustrate these performance distinctions across architectural and functional axes.

DeepSeekMed, a bilingual biomedical LLM trained on Chinese and English datasets, has outperformed ChatGPT and BioBERT in cross-lingual phenotype extraction and EHR-based cardiovascular risk scoring, making it especially promising for multilingual health systems and for those in low- and middle-income countries (LMICs). 160-162 Similarly, CardioGenAI, a machine learning framework developed by BGI Genomics, has demonstrated high predictive accuracy in linking genetic variants with phenotype severity in monogenic CVDs, offering a focused advantage in genotype-phenotype interpretation relevant to iPSC-CM disease modeling. 163,164

In structural biology and regenerative cardiology, AlphaFold2, RoseTTAFold, and ESMFold represent next-generation protein structure prediction tools that have outpaced prior algorithms in modeling the structures of disease-relevant cardiac proteins, including possible variant-induced conformational changes. Key examples are the sarcomeric proteins encoded by MYH7 and TTN, central to inherited cardiomyopathies, and the cardiac sodium channel encoded by SCN5A, central to inherited arrhythmia syndromes. These models have been successfully integrated into iPSC-CM variant modeling platforms to anticipate functional disruptions before in vitro phenotyping. 165-167

Several models are particularly optimized for biomedical corpora (BioGPT, BioMedLM) or high-dimensional omics data (scGPT), enabling improved performance in gene-disease association mining, transcriptomic trajectory modeling, and drug response prediction. Meanwhile, general-purpose deep learning frameworks such as JAX, TensorFlow, and PyTorch remain essential for training custom cardiac models from raw datasets, offering flexibility in integrating imaging, text, and biosignal data streams.

Despite these advances, there is still no unified evaluation framework to assess LLM performance across core regenerative tasks, such as differentiation protocol optimization, cardiotoxicity modeling, or graft-host interaction simulation. Each model tends to be benchmarked independently, often with proprietary metrics, narrow data types, or single-institution validation sets, which limits clinical transferability. The lack of harmonized performance indicators, common cardiac data benchmarks, and population-representative validation pipelines, especially in iPSC-CM contexts, continues to hinder reproducibility and scalable deployment in clinical-grade environments.

Accordingly, this review highlights not only the current strengths and domain-specific niches of various LLMs in regenerative cardiology but also the urgent need for cross-institutional benchmarking consortia. Such efforts should incorporate:
(i) Standardized cardiac datasets (e.g., iPSC-CM electrophysiology, omics, and CRISPR perturbations);
(ii) Multimodal integration tasks (e.g., combining text, imaging, and gene expression); and
(iii) Regionally calibrated validation protocols (especially across LMICs and diverse genetic populations).

The current review thus functions not only as a descriptive summary but also as a framework proposal: a scaffold on which future empirical benchmarking protocols and clinical-grade LLM applications in cardiovascular regenerative medicine can be systematically developed.
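Point (iii) of the proposed benchmarking framework calls for validation calibrated across regions and genetic populations. The following is a minimal sketch of what such reporting could look like, under these assumptions: the task is a binary prediction, the metric is accuracy, and each test sample carries a population or site label. It reports accuracy for each stratum next to the pooled figure. The `Prediction` record, the stratum labels, and both functions are hypothetical illustrations. They are not part of any existing benchmark, and they are not part of the models listed in Table 4.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical benchmark record: one model prediction, its ground-truth
    label, and the population stratum of the sample (e.g., a regional cohort)."""
    stratum: str
    predicted: int
    actual: int


def stratified_accuracy(preds: list[Prediction]) -> tuple[float, dict[str, float]]:
    """Return pooled accuracy and per-stratum accuracy.

    Reporting each stratum separately keeps a high pooled score from
    masking poor performance in an under-represented population."""
    if not preds:
        raise ValueError("no predictions to score")
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for p in preds:
        total[p.stratum] += 1
        correct[p.stratum] += int(p.predicted == p.actual)
    per_stratum = {s: correct[s] / total[s] for s in total}
    overall = sum(correct.values()) / len(preds)
    return overall, per_stratum


def worst_stratum_gap(overall: float, per_stratum: dict[str, float]) -> float:
    """Largest shortfall of any single stratum relative to pooled accuracy."""
    return max(overall - acc for acc in per_stratum.values())


if __name__ == "__main__":
    # Toy data for illustration only (not real results): cohort_A is well
    # represented, while cohort_B is small and the model does worse on it.
    preds = (
        [Prediction("cohort_A", 1, 1)] * 9
        + [Prediction("cohort_A", 0, 1)]
        + [Prediction("cohort_B", 1, 1), Prediction("cohort_B", 0, 1)]
    )
    overall, per_stratum = stratified_accuracy(preds)
    print(f"pooled accuracy: {overall:.3f}")
    for stratum, acc in sorted(per_stratum.items()):
        print(f"  {stratum}: {acc:.3f}")
    print(f"worst-stratum gap: {worst_stratum_gap(overall, per_stratum):.3f}")
```

The worst-stratum gap is only one possible summary that a consortium might choose to track. Other metrics, such as sensitivity or calibration error, could be stratified in the same way.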
Volume 11 Issue 5 (2025) 17 doi: 10.36922/JCTR025230026

