
Journal of Clinical and
            Translational Research                                                AI and LLMs in iPSC cardiac research




Table 3. Large language model functions across multi-omics integration, CRISPR insight, and diagnostic support

| LLM function | Input data type | Models | Output/application |
|---|---|---|---|
| Gene editing target prioritization | CRISPR perturbation and scRNA-seq | BioBERT, scGPT | AI identified TBX5, MEF2C, and NKX2-5 as core cardiac regulators, impacting the fate of iPSC-CM |
| Enhancer-promoter interaction mapping | Sequence and epigenomic | Roformer, GAT (Graph Attention) | Predicted bifurcation nodes in Wnt/Notch pathways |
| Transcriptional co-factor discovery | Biomedical abstracts and protocol | BioMedLM | Revealed the influence of GATA4, HAND2, and SIRT1 on subtype transitions |
| Lineage trajectory reconstruction | Chromatin maps, scRNA-seq, and ECG | Deep generative models | Modeled mesoderm-to-cardiomyocyte stages and stratified arrhythmia risk |
| Triage and diagnosis (biomarker inference) | ECT, CT, and telemetry | BioMedLM, LLaMA, and scGPT | Generated arrhythmia and cardiomyopathy risk profiles, predicted early fibrosis signal in cardiomyopathy |
| Variant interpretation | Multi-omics and phenotype | CardioGenAI | Linked gene variants to severity in inherited cardiac diseases |

Abbreviations: AI: Artificial intelligence; CRISPR: Clustered regularly interspaced short palindromic repeats; CT: Computed tomography; ECG: Electrocardiogram; ECT: Electroconvulsive therapy; iPSC-CM: Induced pluripotent stem cell-derived cardiomyocytes; LLMs: Large language models; scRNA: Single-cell RNA.

integrated applications across mechanisms, models, and outputs.

3.6. Translational gaps and ethical risks

While the integration of LLMs into cardiovascular regenerative frameworks shows great promise, several systemic and technical limitations remain underexamined. These include generalizability across underrepresented populations, reproducibility of predictions in noisy or unstandardized datasets, and the interpretability of high-stakes clinical outputs, such as transplant decisions or differentiation outcomes.

A key concern is cross-population model validity. Most LLMs in current use have been trained on data derived from high-income countries (HICs), particularly US and European electronic health records (EHRs), biomedical literature, and clinical guidelines. As a result, model outputs may fail to generalize across populations with different genomic architectures, environmental stressors, and healthcare access patterns. For instance, LLMs trained exclusively on Western cardiac data have shown diminished sensitivity in detecting ischemic heart disease in Southeast Asian and rural African populations.[147-149] This bias not only impairs diagnostic accuracy but can also perpetuate disparities in regenerative therapy candidacy and outcome prediction.

Beyond data imbalance, biological noise and institutional heterogeneity also challenge reproducibility. iPSC-CM modeling involves variation across laboratory protocols, epigenetic memory effects, and differentiation batch variability.[150,151] These inconsistencies introduce latent confounders that can mislead LLM outputs, especially when working with small or institution-specific datasets. Furthermore, longitudinal datasets from low-resource regions remain scarce, limiting model calibration for predicting long-term outcomes, such as graft-host integration, ventricular remodeling, or sudden cardiac death.[152] Without multi-center validation pipelines and regionally calibrated metrics, LLMs risk producing brittle or misleading outputs under real-world biological and clinical complexity.

Ethical challenges compound these technical issues. LLMs trained on patient data raise privacy risks and call for enhanced frameworks for informed consent, particularly in iPSC-CM contexts where patient-derived cells are used for training predictive models.[156-158] In regenerative therapy, where interventions may be life-altering or irreversible, opacity of model logic is especially concerning. Clinicians must be able to interpret the reasons that a model recommends or predicts a given outcome; otherwise, reliance on black-box predictions in high-stakes decisions (e.g., transplant eligibility and cell graft rejection likelihood) could undermine patient safety and trust.

Finally, algorithmic bias remains a pressing concern. Models trained on skewed data distributions can unintentionally reinforce disparities in access to regenerative interventions, gender bias in diagnosis (e.g., underdiagnosis of women with microvascular disease), or triaging influenced by insurance status. These risks are magnified in low- and middle-income country (LMIC) settings, where infrastructural gaps may be masked by generalized LLM outputs that do not account for resource constraints.

Moving forward, responsible deployment of LLMs in cardiovascular regenerative medicine demands global data equity, transparent architecture, and regulatory harmonization. Cross-continental consortia should be established to develop standardized, open-access cardiac datasets that incorporate genomic, imaging, and clinical


            Volume 11 Issue 5 (2025)                        14                         doi: 10.36922/JCTR025230026