
Microbes & Immunity Big data and DNN-based DTI model in CHP

models. Three design specifications were used to select adequate molecular drugs as potential multi-molecular drugs for CHP, with an emphasis on inhibiting and regulating disease symptoms. Lung fibrosis cells from CHP patients and normal lung tissue cells from healthy control groups (specifically, the A549 cell line) were sampled to construct a GWGEN for identifying the core signaling pathways involved in the pathogenesis of CHP. Candidate GWGENs, including candidate PPINs and GRNs encompassing the regulatory networks of fibrosis genes and of genes from healthy controls, were constructed through big data mining and preprocessing of gene/microRNA (miRNA)/long non-coding RNA (lncRNA) expression data of fibrosis cells from online databases.

By leveraging big data mining and systems modeling techniques, we aimed to investigate and compare the molecular mechanisms of CHP and non-CHP. This involved constructing a GWGEN, a core GWGEN, and core signaling pathways for both CHP and non-CHP. The steps involved are depicted in Figure 1. First, big data mining was performed on the corresponding databases, and the gene, miRNA, and lncRNA expression data were preprocessed. Second, candidate PPI networks and GRNs were employed to construct candidate GWGENs. Next, whole-genome microarray data from different samples of CHP and non-CHP (fibrosis cells and healthy control cells) were used to identify the pathogenetic mechanism of CHP and compare it with that of the healthy control group. Subsequently, the real GWGENs of CHP and non-CHP were constructed, and the PNP method was employed to extract core proteins, genes, miRNAs, and lncRNAs from the real GWGENs to construct the core GWGENs of CHP and non-CHP. Core signaling pathways were then constructed by annotating the KEGG pathways of CHP and healthy controls from their core GWGENs. Finally, the core signaling pathways of CHP and healthy controls, together with their downstream cellular dysfunctions, were compared to investigate the pathogenetic mechanisms.

By comparing the core GWGEN of fibrosis cells from CHP patients with that of normal lung cells from the healthy control group at different stages of the cell repair cycle (damage, repair, and fibrosis), we extracted the differential core signaling pathways between CHP patient fibrosis cells and normal lung cells. This helped to unravel the genetic and epigenetic mechanisms underlying fibrosis in CHP patients. In addition, based on our findings, we selected significant biomarkers of the pathogenetic mechanisms as drug targets for the treatment of CHP. Finally, utilizing DTI databases, we trained a DNN-based DTI model to predict candidate molecular drugs. From these candidates, potential molecular drugs were selected as multi-molecular drugs for the treatment of CHP based on three design specifications.

2.2. Data preprocessing of CHP and healthy control microarray data

In this study, microarray data (GSE86618) from the National Center for Biotechnology Information (NCBI) were analyzed. The dataset is divided into a fibrosis lung cell group and a healthy lung tissue cell control group, and it consists of 325 CHP samples and 215 non-CHP samples. Gene expression and PPIs vary significantly under different biological conditions. Considering only significant genes or proteins therefore risks overlooking individual differences. To uncover the cellular mechanisms of CHP, the corresponding core signaling pathways must be identified from the GWGEN. The candidate GWGEN, constructed using experimental data and computational predictions from multiple databases, includes both the PPIN and the GRN. A binary Boolean matrix was used to represent the candidate GWGEN, with “1” indicating interaction or regulation between nodes and “0” indicating no interaction or regulation.

To establish the candidate PPIN, we referenced various databases, including the Transcription Factor Database (TRANSFAC) [7], the Biological General Repository for Interaction Datasets (BioGRID) [8], the Database of Interacting Proteins (DIP) [9], the Molecular INTeraction database (MINT) [10], and IntAct [11]. Similarly, we utilized TargetScanHuman [12] and the Human Transcriptional Regulation Interaction Database (HTRIdb) [13] to identify candidate regulatory interactions between human transcription factors and their targets. These interactions were based on the Integrated Transcription Factor Platform (ITFP) [14], CircuitDB [15], and StarBase v2.0 [16] databases.

We integrated the following data sources:
- candidate PPIs;
- candidate regulatory interactions between transcription factors (TFs)/miRNAs/lncRNAs and their target genes/miRNAs/lncRNAs;
- whole-genome microarray gene/miRNA/lncRNA expression data;
- DNA methylation profiles.

From these sources, we obtained the candidate GWGEN using MATLAB text files. String manipulation tools for large-scale text mining were used to standardize the gene names according to the standard gene naming rules of the NCBI Gene database. This provided an automatic method for extracting key biomarkers.

2.3. Stochastic regression models construction of candidate PPIN and gene regulation network of candidate genome-wide and EINs

In the previous section, based on text mining and comprehensive big data mining of the GSE86618 microarray gene data, we constructed a candidate GWGEN. However, the candidate GWGEN built from

Volume 2 Issue 2 (2025) 79 doi: 10.36922/mi.4620