Tumor Discovery Drug repurposing for pancreatic cancer via AI
2. Materials and methods

2.1. Overview of PDAC and healthy control genome-wide genetic and epigenetic networks using a systems biology approach

In this study, we aim to establish the GWGENs of PDAC and non-PDAC core genomes. Microarray data for PDAC and non-PDAC were obtained from the National Center for Biotechnology Information (NCBI) under accession number GSE183795. Four processes were then conducted to identify the core signaling pathways of candidate GWGENs, as illustrated in Figure 1 and detailed below.

i. Construction of candidate GWGENs: We utilized a data mining approach to construct Boolean matrices representing candidate protein-protein interaction networks (PPINs) and candidate gene regulatory networks (GRNs), which include interactions among proteins and regulation among genes, microRNAs (miRNAs), and long non-coding RNAs (lncRNAs). Specifically, if an interaction or regulation exists between two nodes, it is denoted as 1; if not, it is denoted as 0.

ii. Identification of real GWGENs: We employed PDAC and non-PDAC (control) microarray data to construct real GWGENs, identifying parameters for protein-protein interaction (PPI) models and GRN regulatory models by solving constrained linear least squares parameter estimation problems. To address potential false positive interactions in the candidate GWGENs, we pruned these false positives using the Akaike Information Criterion (AIC) system order identification method,²⁸ obtaining real GWGENs for PDAC and non-PDAC.

iii. Extraction of core GWGENs: We applied the PNP method to extract core GWGENs from the real GWGENs. The PNP method calculates the projection value of each node in the real GWGEN to capture 85% of the network's energy and sorts the projection values of all nodes from highest to lowest. Given the maximum allowable annotated node count of 6,000 (as per KEGG pathways), we selected the top 6,000 significant nodes to form the core GWGEN.

iv. Construction and comparison of core signaling pathways: We annotated the core GWGENs of PDAC and non-PDAC with KEGG pathways based on relevant literature, completing the construction of core signaling pathways for each. We then compared the upstream microenvironmental factors, core signaling pathways, and corresponding downstream aberrant cellular functions between PDAC and non-PDAC to explore the oncogenic molecular mechanisms of PDAC and identify potential biomarkers as drug targets for PDAC therapeutics.

2.2. Constructing candidate genome-wide genetic and epigenetic networks for PDAC and healthy controls based on big data mining

In this research, we obtained microarray data from the NCBI under accession number GSE183795. The dataset was divided into two groups: the disease group, comprising 139 PDAC samples, and the healthy control group, consisting of 105 non-PDAC samples.

The candidate GWGENs include candidate PPINs and candidate GRNs. We represented each candidate GWGEN using a binary Boolean matrix, in which an entry is 1 if an interaction or regulation exists between two nodes and 0 if it does not. To construct the candidate PPINs, we consulted various databases, including the Database of Interacting Proteins (DIP),¹⁸ IntAct,¹⁹ the Biological General Repository for Interaction Datasets (BioGRID),²⁰ and the Molecular INTeraction Database (MINT).²¹ For the candidate GRNs, we utilized multiple resources such as the Human Transcriptional Regulation Interaction Database (HTRIdb),²² the Integrated Transcription Factor Platform (ITFP),²³ TRANSFAC,²⁴ CircuitDB,²⁵ TargetScanHuman,²⁶ and StarBase.²⁷

2.3. Establishing a system model for identifying real genome-wide genetic and epigenetic networks for PDAC and healthy controls based on candidate genome-wide genetic and epigenetic networks

To investigate the oncogenic molecular mechanisms of PDAC, we referenced relevant databases and utilized PDAC microarray data to construct candidate GWGENs. Following the establishment of these candidate GWGENs, we employed PDAC microarray data to discern the real GWGENs for PDAC and non-PDAC samples. This process required the development of a stochastic system model to enable candidate GWGENs to capture stochastic interactions and regulations, such as protein-protein interactions, as well as the regulation of transcription factors (TFs), miRNAs, and lncRNAs. Additionally, the stochastic model should account for residuals from the initial model establishment and stochastic noise resulting from experimental measurements. Furthermore, the main protein interaction model in Equation I and the miRNA regulation models in Equations II–IV were designed as bilinear interaction models based on the product of the concentrations of the interacting proteins in Equation I or the regulations of miRNAs on their target mRNAs, miRNAs, or lncRNAs in Equations II–IV. However, for simplification, we presented the interaction and regulation coefficients as linear in PPINs and GRNs.²⁸

First, we established a system model of the interactions involving the w-th protein and other proteins in the candidate PPINs, presented as follows:
Volume 4 Issue 1 (2025) 49 doi: 10.36922/td.4709

