Journal of Clinical and Translational Research
AI and LLMs in iPSC cardiac research
machine learning to enable the automated design and synthesis of complex molecules. It allows chemists to program and control the synthesis of specific compounds, improving efficiency. The ultimate goal of Chemputer is to accelerate the discovery and development of new compounds for applications in drug discovery, materials science, and beyond [61].
(ix) ClinVar: A public database maintained by the National Institutes of Health (NIH) that archives reports of the relationships between human genetic variants and their clinical significance. It collects submissions from research labs, clinics, and researchers, helping to classify whether specific variants are benign, pathogenic, or of uncertain significance [62].
(x) DeepChem: DL framework for in silico drug modeling and structure-activity prediction; integrated into iPSC-CM cardiotoxicity screening pipelines using LLM-generated compound profiling [63,64].
(xi) DeepSeek-Med: Chinese biomedical LLM initiative; mentioned as a potential future collaborator in international AI consortia for regenerative platforms and clinical translation [19,58].
(xii) Ensembl Genome Browser: A genomic data platform that provides integrated, annotated reference genomes for a wide range of species, including humans. It enables researchers to explore genes, variants, regulatory regions, and comparative genomic data. In cardiac research and clinical translation, Ensembl plays a crucial role in identifying genetic mutations and regulatory elements linked to heart diseases [65,66].
(xiii) ESMFold: DL model developed by Meta AI that predicts protein 3D structures directly from amino acid sequences, similar to AlphaFold but optimized for speed and scalability. It applies a protein language model trained on millions of protein sequences to capture protein folding patterns without relying on multiple sequence alignments. In cardiac research and clinical translation, ESMFold can help predict how mutations in cardiac-related proteins, such as ion channels or sarcomeric proteins, alter their structure and function. This is crucial for understanding diseases such as cardiomyopathies or channelopathies [69].
(xiv) GEO (Gene Expression Omnibus): NIH-maintained public repository of high-throughput gene expression and sequencing data, such as RNA-seq and microarray results. Researchers submit datasets from various tissues, cell types, and experimental conditions, including cardiac cells and disease models. GEO enables scientists to explore gene regulation in heart disease, development, or drug response. It is used for discovering biomarkers, understanding disease mechanisms, and validating experimental findings in iPSC-derived cardiomyocytes [69].
(xv) Grok: LLM developed by xAI; referenced in the context of AI landscape expansion but not yet applied in iPSC-CM pipelines [19].
(xvi) Hugging Face Transformers: Library and model hub used for implementing transformer-based models, including BioBERT and REALM; provides the backbone for LLM fine-tuning in multi-modal omics data pipelines [43].
(xvii) JAX: High-performance ML framework used in cardiac LLMs for omics modeling and optimization tasks, including protocol efficiency simulations in iPSC-CM studies [41,42].
(xviii) PyTorch: DL library used to build and train LLMs for cardiac modeling, including time-series prediction and transformer network construction [40,70,71].
(xix) REALM: Retrieval-augmented language model combining LLMs with document retrieval; used in EHR mining and real-time diagnostic applications, including arrhythmia detection [72].
(xx) RoseTTAFold (Baker Lab): A DL tool that predicts protein structures from amino acid sequences with high accuracy. It utilizes a three-track neural network to integrate sequence, distance, and coordinate data, enabling rapid modeling of protein structures. In cardiac research, RoseTTAFold aids in understanding the structural implications of genetic mutations associated with heart diseases. By predicting how specific mutations affect protein folding and function, researchers can identify potential targets for therapeutic intervention. This is particularly valuable when experimental structures are unavailable, allowing for exploration of disease mechanisms at the molecular level [56,73-75].
(xxi) scFoundation: LLM-based foundation model for single-cell data integration; supports high-resolution subtype prediction and cardiac developmental mapping in iPSC-CM pipelines [47].
(xxii) scGPT: Generative LLM tailored for single-cell omics; used in predicting cell fate trajectories, cardiac subtype classification, and transcriptomic modeling [46].
(xxiii) TensorFlow: DL library used for implementing models, including convolutional neural networks and recurrent neural networks, for cardiac imaging, time-series EHR data, or iPSC-CM signal traces [70,71,76].
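
The retrieval-augmented pattern named in the REALM entry (xix) can be illustrated with a minimal sketch. This is not REALM's actual method, which relies on a learned dense retriever trained jointly with the language model; here a plain bag-of-words cosine similarity is swapped in so the example runs with the Python standard library alone. The function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and the note snippets are hypothetical and invented purely for illustration; the resulting prompt would then be passed to whichever language model a pipeline uses.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (tok.strip(".,;:()").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved text to the question before calling a model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical note snippets, invented for illustration only.
NOTES = [
    "ECG shows irregular rhythm with absent P waves.",
    "Echocardiogram reports preserved ejection fraction.",
    "Patient started on statin therapy for elevated cholesterol.",
]

prompt = build_prompt("irregular rhythm on ECG", NOTES)
print(prompt)
```

The design point is the separation of steps: retrieval selects supporting text first, and the model only sees the question together with that text, which is what distinguishes a retrieval-augmented setup from prompting a model on the question alone.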
Volume 11 Issue 5 (2025) 10 doi: 10.36922/JCTR025230026

