Microbes & Immunity Big data and DNN-based DTI model in CHP
numerous databases and experimental datasets contains many false positives, including some seemingly reasonable but incorrect information. These false positives should therefore be removed to obtain the real GWGEN of CHP and non-CHP using systems biology methods.¹⁷ Accordingly, we needed to trim these false positives from the candidate GWGEN based on gene/miRNA/lncRNA expression data of CHP and non-CHP.

We first constructed a stochastic regression interaction/regulation model of human cells to characterize the interactions and regulations in the candidate GWGEN, including PPI, gene regulation, miRNA regulation, lncRNA regulation, and epigenetic regulation. To identify the real GWGEN for each lung tissue cell condition, we applied system identification and system order detection methods to the interaction/regulation models of the candidate GWGEN using gene/miRNA/lncRNA expression data and epigenetic profiles for each lung tissue cell condition. Interactions and regulations beyond the system order are considered false positives in the candidate GWGEN, which were trimmed to obtain the real GWGEN for CHP fibrosis cells and healthy lung tissue cells. The stochastic regression protein interaction model for the candidate PPIN in the candidate GWGEN is represented in Equation I for the protein interaction of the k-th protein in lung cells of the sample i:

$$S_k[i] = \sum_{\substack{w=1 \\ w \neq k}}^{W_k} \alpha_{kw} S_k[i] S_w[i] + \beta_k + \gamma_k[i], \quad \text{for } k = 1, 2, \ldots, K \text{ and } i = 1, 2, \ldots, I \tag{I}$$

where $S_k[i]$ and $S_w[i]$ denote the expression level of the k-th protein and the w-th protein for the i-th sample, respectively. $\alpha_{kw}$ is the interaction ability between the k-th protein and the w-th protein, which is an interactive protein of the k-th protein. $W_k$ represents the number of proteins interacting with the k-th protein, and $K$ is the number of proteins in the candidate PPIN. $I$ denotes the number of data samples (pneumonitis lung cells and non-pneumonitis lung cells). $\beta_k$ represents the basal level of protein k due to some unknown effects, such as phosphorylation, methylation, and ubiquitination. $\gamma_k[i]$ is the stochastic noise of the k-th protein for the sample i due to model uncertainty and measurement noise. The protein interaction model in Equation I can be interpreted as follows: the expression level of the k-th protein is related to its interactions with the $W_k$ other proteins in the candidate PPIN.

Subsequently, we created a regulatory system model that includes interactions between genes and their regulators, such as TFs, miRNAs, and lncRNAs. The gene regulatory model of the candidate GRN, describing the transcriptional regulation of the l-th gene of lung slice cells for sample i, is given in Equation II:¹⁸

$$t_l[i] = \sum_{x=1}^{X_l} \delta_{lx} e_x[i] P_l[i] + \sum_{y=1}^{Y_l} \zeta_{ly} f_y[i] P_l[i] - \sum_{z=1}^{Z_l} \omega_{lz} g_z[i] P_l[i] + \nu_l P_l[i] + u_l[i], \quad \text{for } l = 1, 2, \ldots, L \text{ and } i = 1, 2, \ldots, I \tag{II}$$

where $t_l[i]$, $e_x[i]$, $f_y[i]$, and $g_z[i]$ denote the expression level of the l-th target gene, the x-th TF, the y-th lncRNA, and the z-th miRNA for the i-th sample, respectively. $\delta_{lx}$ and $\zeta_{ly}$ are the transcription regulatory abilities of the x-th TF and the y-th lncRNA on their corresponding binding target gene l. $\omega_{lz}$ indicates the post-transcriptional regulatory ability of the z-th miRNA to inhibit the l-th target gene ($-\omega_{lz} \le 0$). $X_l$, $Y_l$, and $Z_l$ represent the number of TFs, lncRNAs, and miRNAs binding to the l-th target gene. $L$ and $I$ denote the number of genes in the candidate GRN and the number of data samples. $\nu_l$ represents the basal level of the target gene l. $u_l[i]$ is the stochastic noise of the l-th target gene for the sample i due to model uncertainty and data noise. $P_l[i]$ denotes the methylation regulation of the l-th gene through its effect on the binding affinities of TFs, miRNAs, lncRNAs, and RNA polymerase on the target gene. The terms $\delta_{lx} e_x[i] P_l[i]$, $\zeta_{ly} f_y[i] P_l[i]$, $\omega_{lz} g_z[i] P_l[i]$, and $\nu_l P_l[i]$ denote the effect of methylation, phosphorylation, or ubiquitination on the binding affinity of TFs, lncRNAs, miRNAs, and RNA polymerase to the l-th target gene, respectively. Furthermore, to evaluate the direction of methylation regulation of the l-th target gene $t_l[i]$ from the DNA methylation profile, $P_l[i]$ is defined as in Equation III:¹⁹

$$P_l[i] = \frac{1}{1 + \left( \dfrac{p_l[i]}{0.5} \right)^2} \tag{III}$$

where $p_l[i]$ indicates the DNA methylation profile of the l-th gene for the sample i. In Equation III, the effect of DNA methylation on the l-th target gene, $P_l[i]$, ranges from 1 down to 0.2, while the DNA methylation profile, $p_l[i]$, ranges from 0 to 1. From the biological system aspect, Equation III suggests that the higher the DNA methylation level, the weaker the binding between TFs, miRNAs, lncRNAs, and RNA polymerases and their target genes. In contrast, the lower the DNA methylation level, the stronger the binding between TFs, miRNAs, lncRNAs, and RNA polymerases and their target genes. The methylation regulation $P_l[i]$ in Equation III has a regulation value ($P_l[i] = 0.2$), which corresponds to the DNA methylation
Volume 2 Issue 2 (2025) 80 doi: 10.36922/mi.4620
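As a numerical check on Equations II and III, the following minimal Python sketch evaluates the methylation regulation factor and the noise-free part of the gene regulatory model for a single target gene. It is an illustration under stated assumptions, not the authors' implementation: the function names `methylation_regulation` and `predict_target_expression`, the array layout, and the toy parameter values are all hypothetical, and the noise term u_l[i] is omitted.

```python
import numpy as np


def methylation_regulation(p):
    """Equation III: P_l[i] = 1 / (1 + (p_l[i] / 0.5)**2), with p_l[i] in [0, 1]."""
    p = np.asarray(p, dtype=float)
    return 1.0 / (1.0 + (p / 0.5) ** 2)


def predict_target_expression(delta, e, zeta, f, omega, g, nu, p):
    """Noise-free part of Equation II for one target gene l over I samples.

    Assumed (hypothetical) layout:
      delta: (X_l,) TF regulatory abilities,     e: (X_l, I) TF expression
      zeta:  (Y_l,) lncRNA regulatory abilities, f: (Y_l, I) lncRNA expression
      omega: (Z_l,) miRNA repression abilities,  g: (Z_l, I) miRNA expression
      nu:    scalar basal level,                 p: (I,) methylation profile
    """
    P = methylation_regulation(p)  # shape (I,)
    # Every regulatory term and the basal term is scaled by P_l[i];
    # the miRNA term enters with a minus sign, as in Equation II.
    return (delta @ e + zeta @ f - omega @ g + nu) * P


# Toy values (assumed, for illustration only): two samples, one regulator of each type.
p = np.array([0.0, 1.0])            # unmethylated vs. fully methylated
delta, e = np.array([1.0]), np.array([[2.0, 2.0]])
zeta, f = np.array([0.5]), np.array([[4.0, 4.0]])
omega, g = np.array([1.0]), np.array([[1.0, 1.0]])
nu = 1.0

print(methylation_regulation([0.0, 0.5, 1.0]))
print(predict_target_expression(delta, e, zeta, f, omega, g, nu, p))
```

With these toy inputs, the methylation factor falls from 1 at p = 0 to 0.2 at p = 1, so the same regulator expression yields a fivefold lower predicted target expression in the fully methylated sample, consistent with the interpretation of Equation III.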

