Journal of Clinical and Translational Research
AI and LLMs in iPSC cardiac research
Table 3. Large language model functions across multi‑omics integration, CRISPR insight, and diagnostic support
| LLM function | Input data type | Models | Output/application |
| --- | --- | --- | --- |
| Gene editing target prioritization | CRISPR perturbation and scRNA-seq | BioBERT, scGPT | AI identified TBX5, MEF2C, and NKX2-5 as core cardiac regulators, impacting the fate of iPSC-CM |
| Enhancer-promoter interaction mapping | Sequence, and epigenomic | Roformer, GAT (Graph Attention) | Predicted bifurcation nodes in Wnt/Notch pathways |
| Transcriptional co-factor discovery | Biomedical abstracts and protocol | BioMedLM | Revealed the influence of GATA4, HAND2, and SIRT1 on subtype transitions |
| Lineage trajectory reconstruction | Chromatin maps, scRNA-seq, and ECG | Deep generative models | Modeled mesoderm-to-cardiomyocyte stages and stratified arrhythmia risk |
| Triage and diagnosis (biomarker inference) | ECT, CT, and telemetry | BiomedLM, LLaMA, and scGPT | Generated arrhythmia and cardiomyopathy risk profiles, predicted early fibrosis signal in cardiomyopathy |
| Variant interpretation | Multi-omics and phenotype | CardioGenAI | Linked gene variants to severity in inherited cardiac diseases |

Abbreviations: AI: Artificial intelligence; CRISPR: Clustered regularly interspaced short palindromic repeats; CT: Computed tomography; ECG: Electrocardiogram; ECT: Electroconvulsive therapy; iPSC-CM: Induced pluripotent stem cell-derived cardiomyocytes; LLMs: Large language models; scRNA: Single-cell RNA.
integrated applications across mechanisms, models, and outputs.

3.6. Translational gaps and ethical risks

While the integration of LLMs into cardiovascular regenerative frameworks shows great promise, several systemic and technical limitations remain underexamined. These include generalizability across underrepresented populations, reproducibility of predictions in noisy or unstandardized datasets, and the interpretability of high-stakes clinical outputs, such as transplant decisions or differentiation outcomes.

A key concern is cross-population model validity. Most LLMs in current use have been trained on data derived from high-income countries (HICs), particularly the United States, as well as European electronic health records (EHRs), biomedical literature, and clinical guidelines. As a result, model outputs may fail to generalize across populations with different genomic architectures, environmental stressors, and healthcare access patterns. For instance, LLMs trained exclusively on Western cardiac data have shown diminished sensitivity in detecting ischemic heart disease in Southeast Asian and rural African populations. 147-149 This bias not only impairs diagnostic accuracy but can also perpetuate disparities in regenerative therapy candidacy and outcome prediction.

Beyond data imbalance, biological noise and institutional heterogeneity also challenge reproducibility. iPSC-CM modeling involves variation across laboratory protocols, epigenetic memory effects, and differentiation batch variability. 150,151 These inconsistencies introduce latent confounders that can mislead LLM outputs, especially when working with small or institution-specific datasets. Furthermore, longitudinal datasets from low-resource regions remain scarce, limiting model calibration for predicting long-term outcomes, such as graft-host integration, ventricular remodeling, or sudden cardiac death. 152 Without multi-center validation pipelines and regionally calibrated metrics, LLMs risk producing brittle or misleading outputs under real-world biological and clinical complexity.

Ethical challenges compound these technical issues. LLMs trained on patient data raise privacy risks and call for enhanced frameworks for informed consent, particularly in iPSC-CM contexts where patient-derived cells are used for training predictive models. 156-158 In regenerative therapy, where interventions may be life-altering or irreversible, opacity of model logic is especially concerning. Clinicians must be able to interpret why a model recommends or predicts a given outcome; otherwise, reliance on black-box predictions in high-stakes decisions (e.g., transplant eligibility and cell graft rejection likelihood) could undermine patient safety and trust.

Finally, algorithmic bias remains a pressing concern. Models trained on skewed data distributions can unintentionally reinforce disparities in access to regenerative interventions, gender bias in diagnosis (e.g., underdiagnosis of women with microvascular disease), or triaging influenced by insurance status. These risks are magnified in low- and middle-income country (LMIC) settings, where infrastructural gaps may be masked by generalized LLM outputs that do not account for resource constraints.

Moving forward, responsible deployment of LLMs in cardiovascular regenerative medicine demands global data equity, transparent architecture, and regulatory harmonization. Cross-continental consortia should be established to develop standardized, open-access cardiac datasets that incorporate genomic, imaging, and clinical
Volume 11 Issue 5 (2025) 14 doi: 10.36922/JCTR025230026

