
Microbes & Immunity                                               Big data and DNN-based DTI model in CHP



$\Theta_k^2$, $\Theta_l^2$, $\Theta_m^2$, and $\Theta_n^2$ are the estimated residual errors obtained from the least-squares identification method in Equations XVIII-XXI of the k-th PPIN, the l-th gene, the m-th miRNA, and the n-th lncRNA, respectively. $\Theta_k$, $\Theta_l$, $\Theta_m$, and $\Theta_n$ represent the residual parameter estimation of protein, gene, miRNA, and lncRNA, respectively, as given in Equations XXVI-XXIX.

$$\Theta_k^2 = \frac{(\sigma_k \beta_k^* - S_k)^T (\sigma_k \beta_k^* - S_k)}{I} \tag{XXVI}$$

$$\Theta_l^2 = \frac{(\sigma_l \beta_l^* - T_l)^T (\sigma_l \beta_l^* - T_l)}{I} \tag{XXVII}$$

$$\Theta_m^2 = \frac{(\sigma_m \beta_m^* - D_m)^T (\sigma_m \beta_m^* - D_m)}{I} \tag{XXVIII}$$

$$\Theta_n^2 = \frac{(\sigma_n \beta_n^* - Q_n)^T (\sigma_n \beta_n^* - Q_n)}{I} \tag{XXIX}$$

Based on the system order detection theory of system identification,$^{18,19}$ the true numbers of parameters $W_k^*$, $(X_l^* + Y_l^* + Z_l^*)$, $(X_m^* + Y_m^* + Z_m^*)$, and $(X_n^* + Y_n^* + Z_n^*)$ minimize $AIC(W_k)$, $AIC(X_l, Y_l, Z_l)$, $AIC(X_m, Y_m, Z_m)$, and $AIC(X_n, Y_n, Z_n)$ in Equations XXII-XXV, respectively, and thereby give the true system order of the real GWGEN. In other words, the minimum values $AIC(W_k^*)$, $AIC(X_l^*, Y_l^*, Z_l^*)$, $AIC(X_m^*, Y_m^*, Z_m^*)$, and $AIC(X_n^*, Y_n^*, Z_n^*)$ are achieved by the true numbers of parameters (system order) $W_k^*$, $(X_l^*, Y_l^*, Z_l^*)$, $(X_m^*, Y_m^*, Z_m^*)$, and $(X_n^*, Y_n^*, Z_n^*)$, respectively. Each minimum reflects the tradeoff between the residual error of Equations XXVI-XXIX, which enters the first term of Equations XXII-XXV, and the parameter association number in the second term of Equations XXII-XXV, as expressed in Equations XXX-XXXIII.

$$W_k^* = \arg\min_{W_k} AIC(W_k) \tag{XXX}$$

$$X_l^*, Y_l^*, Z_l^* = \arg\min_{X_l, Y_l, Z_l} AIC(X_l, Y_l, Z_l) \tag{XXXI}$$

$$X_m^*, Y_m^*, Z_m^* = \arg\min_{X_m, Y_m, Z_m} AIC(X_m, Y_m, Z_m) \tag{XXXII}$$

$$X_n^*, Y_n^*, Z_n^* = \arg\min_{X_n, Y_n, Z_n} AIC(X_n, Y_n, Z_n) \tag{XXXIII}$$

Therefore, we can prune the false positives of protein interaction abilities, transcriptional regulation abilities, and post-transcriptional regulations in candidate GWGENs to obtain real GWGENs for CHP and non-CHP cells. This is done by removing the insignificant interactions and regulations that fall outside the true system order in Equations XXX-XXXIII, processing one protein, one gene, one miRNA, and one lncRNA at a time. The real GWGENs for CHP and non-CHP are pruned from the candidate GWGENs using the proposed AIC method, as shown in Figure 2A and B.

2.5. Core genome-wide and EINs extraction using the PNP method

The real GWGENs remain large and complex, making it challenging to extract significant information and gain insights into the pathogenetic mechanisms of CHP progression. This complexity hinders comparison and investigation of lung slice cells from CHP patients and healthy controls. Recently, KEGG pathway annotation has become a powerful tool for analyzing signaling pathways.$^{19}$ However, KEGG pathways currently support annotations for only 6,000 molecules. Therefore, the real GWGENs will be truncated to core GWGENs with 6,000 molecules for KEGG pathway annotation. To achieve this, we applied the PNP method to extract 6,000 significant proteins, genes, miRNAs, and lncRNAs as the core GWGENs from the real GWGENs in CHP and non-CHP. The system models describing the PPIs, gene regulations, miRNA

Figure 2. (A) The real GWGEN of CHP and (B) the real GWGEN of non-CHP are identified using system identification and order detection methods via the corresponding microarray data. The numbers indicate the node numbers of proteins, TFs, receptors, lncRNAs, and miRNAs. The green lines represent the protein-protein interactions, and the orange lines represent the gene regulations.
Abbreviations: CHP: Chronic hypersensitivity pneumonitis; GWGEN: Genome-wide and epigenetic interaction networks; lncRNA: Long non-coding RNA; TF: Transcription factor.
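
As a rough illustration of the order-detection step in Equations XXX-XXXIII, the following Python sketch computes a residual variance of the form used in Equations XXVI-XXIX and picks the parameter count that minimizes an AIC score. The AIC form used here, log(residual variance) + 2 x (number of parameters) / I, is an assumption standing in for Equations XXII-XXV, which are not reproduced on this page. The function names, the toy variances, and the pre-ranked interaction list are all hypothetical, so this is a sketch under stated assumptions rather than the authors' implementation.

```python
import math


def residual_variance(observed, fitted):
    """Mean squared residual: (fitted - observed)^T (fitted - observed) / I."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("observed and fitted must be non-empty and equal in length")
    return sum((f - o) ** 2 for f, o in zip(fitted, observed)) / len(observed)


def aic(variance, n_params, n_samples):
    """ASSUMED AIC form: log(residual variance) + 2 * n_params / n_samples."""
    if variance <= 0:
        raise ValueError("residual variance must be positive")
    return math.log(variance) + 2.0 * n_params / n_samples


def select_order(variance_by_order, n_samples):
    """Return the parameter count with the smallest AIC (the arg min step)."""
    return min(
        variance_by_order,
        key=lambda order: aic(variance_by_order[order], order, n_samples),
    )


def prune(ranked_interactions, order):
    """Keep only the interactions that fall inside the selected system order.

    Assumes the list is already sorted from most to least significant.
    """
    return ranked_interactions[:order]


# Toy example (hypothetical numbers): residual variance after fitting
# 1..5 candidate interactions for one protein, with I = 20 samples.
toy_variances = {1: 1.0, 2: 0.5, 3: 0.25, 4: 0.24, 5: 0.239}
best_order = select_order(toy_variances, n_samples=20)
kept = prune(["a", "b", "c", "d", "e"], best_order)
print(best_order, kept)
```

In this toy case the variance barely improves beyond three interactions, so the penalty term dominates and the fourth and fifth candidates are pruned as likely false positives. The same routine would be applied once per protein, gene, miRNA, and lncRNA.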
            Volume 2 Issue 2 (2025)                         84                               doi: 10.36922/mi.4620