
Microbes & Immunity                                               Big data and DNN-based DTI model in CHP



numerous databases and experimental datasets contains many false positives, including some seemingly reasonable but incorrect information. Therefore, these false positives should be removed to obtain the real GWGEN of CHP and non-CHP using systems biology methods.¹⁷ Accordingly, we trimmed these false positives from the candidate GWGEN based on gene/miRNA/lncRNA expression data of CHP and non-CHP.

We first constructed a stochastic regression interaction/regulation model of human cells to characterize the interactions and regulations in the candidate GWGEN, including PPI, gene regulation, miRNA regulation, lncRNA regulation, and epigenetic regulation. To identify the real GWGEN for each lung tissue cell condition, we applied system identification and system order detection methods to the interaction/regulation models of the candidate GWGEN using gene/miRNA/lncRNA expression data and epigenetic profiles for each lung tissue cell condition. Significant interactions and regulations beyond the system order are considered false positives in the candidate GWGEN, which were trimmed to obtain the real GWGEN for CHP fibrosis cells and healthy lung tissue cells. The stochastic regression protein interaction model for candidate PPIN in the candidate GWGEN is represented in Equation I for the protein interaction of the k-th protein in lung cells of the sample i:

$$S_k[i] = \sum_{\substack{w=1 \\ w \neq k}}^{W_k} \alpha_{kw} S_k[i] S_w[i] + \beta_k + \gamma_k[i], \quad \text{for } k = 1,2,\ldots,K \text{ and } i = 1,2,\ldots,I \tag{I}$$

where $S_k[i]$ and $S_w[i]$ denote the expression level of the k-th protein and the w-th protein for the i-th sample, respectively. $\alpha_{kw}$ is the interaction ability between the k-th protein and the w-th protein, which is an interactive protein of the k-th protein. $W_k$ represents the number of proteins interacting with the k-th protein and $K$ is the number of proteins with candidate PPIN. $I$ denotes the number of data samples (pneumonitis lung cell and non-pneumonitis lung cell). $\beta_k$ represents the basal level of protein k due to some unknown effects, such as phosphorylation, methylation, and ubiquitination. $\gamma_k[i]$ is the stochastic noise of the k-th protein for the sample i due to model uncertainty and measurement noise. The protein interaction model in Equation I can be interpreted as follows: the expression level of the k-th protein is related to its interactions with the $W_k$ other proteins that interact with it in the candidate PPIN.

Subsequently, we created a regulatory system model that includes interactions between genes and their regulators, such as TFs, miRNAs, and lncRNAs. The gene regulatory model of candidate GRN, describing the transcriptional regulation of the l-th gene of lung slice cells for sample i, is given in Equation II:¹⁸

$$t_l[i] = \sum_{x=1}^{X_l} \delta_{lx} e_x[i] P_l[i] + \sum_{y=1}^{Y_l} \zeta_{ly} f_y[i] P_l[i] - \sum_{z=1}^{Z_l} \omega_{lz} g_z[i] P_l[i] + \nu_l P_l[i] + u_l[i], \quad \text{for } l = 1,2,\ldots,L \text{ and } i = 1,2,\ldots,I \tag{II}$$

where $t_l[i]$, $e_x[i]$, $f_y[i]$, and $g_z[i]$ denote the expression level of the l-th target gene, the x-th TF, the y-th lncRNA, and the z-th miRNA for the i-th sample, respectively. $\delta_{lx}$ and $\zeta_{ly}$ are the transcription regulatory ability of the x-th TF and the y-th lncRNA on their corresponding binding target gene l. $\omega_{lz}$ indicates the post-transcriptional regulatory ability of the z-th miRNA to inhibit the l-th target gene ($-\omega_{lz} \leq 0$). $X_l$, $Y_l$, and $Z_l$ represent the number of TFs, lncRNAs, and miRNAs binding to the l-th target gene. $L$ and $I$ denote the number of genes with candidate GRN and the number of data samples. $\nu_l$ represents the basal level of the target gene l. $u_l[i]$ is the stochastic noise of the l-th target gene for the sample i due to model uncertainty and data noise. $P_l[i]$ denotes the methylation regulation of the l-th gene through its effect on the binding affinities of TFs, miRNAs, lncRNAs, and RNA polymerase on the target gene. The terms $\delta_{lx} e_x[i] P_l[i]$, $\zeta_{ly} f_y[i] P_l[i]$, $\omega_{lz} g_z[i] P_l[i]$, and $\nu_l P_l[i]$ denote the effect of methylation, phosphorylation, or ubiquitination on the binding affinity of TFs, lncRNAs, miRNAs, and RNA polymerase to the l-th target gene, respectively. Furthermore, to evaluate the direction of methylation regulation of the l-th target gene $t_l[i]$ using the DNA methylation profile, $P_l[i]$ can be defined as follows in Equation III.¹⁹

$$P_l[i] = \frac{1}{1 + \left( \dfrac{p_l[i]}{0.5} \right)^2} \tag{III}$$

where $p_l[i]$ indicates the DNA methylation profile of the l-th gene for the sample i. In the equation above, the effect of DNA methylation on the l-th target gene, $P_l[i]$, ranges from 1 to 0.2, while the DNA methylation profile, $p_l[i]$, ranges from 0 to 1. From the biological system aspect, the equation above suggests that the higher the DNA methylation level, the weaker the binding between TFs, miRNAs, lncRNAs, and RNA polymerases and their target genes. In contrast, the lower the DNA methylation level, the stronger the binding between TFs, miRNAs, lncRNAs, and RNA polymerases and their target genes. The methylation regulation $P_l[i]$ in Equation III has a regulation value ($P_l[i] = 0.2$), which corresponds to the DNA methylation
            Volume 2 Issue 2 (2025)                         80                               doi: 10.36922/mi.4620
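As an illustration of how the methylation regulation enters the gene regulatory model, the following minimal Python sketch evaluates Equation III, taken here as P = 1 / (1 + (p / 0.5)^2), and then the deterministic part of Equation II for one gene and one sample. The function names, argument names, and numerical values are hypothetical and chosen only for this example; the stochastic noise term u_l[i] is omitted, and the sketch does not cover the system identification step used to estimate the parameters.

```python
def methylation_regulation(p: float) -> float:
    """Equation III: map a methylation profile p in [0, 1] to a regulation value P."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("methylation profile must lie in [0, 1]")
    # p = 0 gives P = 1 (no attenuation); p = 1 gives P = 0.2 (strongest attenuation).
    return 1.0 / (1.0 + (p / 0.5) ** 2)


def predict_gene_expression(delta, e, zeta, f, omega, g, basal, p):
    """Deterministic part of Equation II for one target gene and one sample.

    delta, e : TF regulatory abilities and TF expression levels
    zeta, f  : lncRNA regulatory abilities and lncRNA expression levels
    omega, g : miRNA inhibitory abilities (non-negative) and miRNA expression levels
    basal    : basal level of the target gene
    p        : DNA methylation profile of the target gene, in [0, 1]
    """
    P = methylation_regulation(p)
    tf_term = sum(d * x for d, x in zip(delta, e))
    lncrna_term = sum(z * y for z, y in zip(zeta, f))
    mirna_term = sum(w * m for w, m in zip(omega, g))
    # Every term of Equation II except the noise is scaled by the same P.
    return (tf_term + lncrna_term - mirna_term + basal) * P


if __name__ == "__main__":
    # Hypothetical values: one TF, one lncRNA, one miRNA, half-methylated gene.
    t = predict_gene_expression(
        delta=[0.8], e=[2.0], zeta=[0.5], f=[1.0],
        omega=[0.3], g=[1.0], basal=0.1, p=0.5,
    )
    print(f"predicted expression: {t:.3f}")
```

Because all regulatory terms share the factor P, a fully methylated gene (p = 1) has its predicted expression scaled to one fifth of the unmethylated value, which is the attenuation described in the text.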