
Microbes & Immunity                                               Big data and DNN-based DTI model in CHP



lncRNAs in real GWGENs. A projection value $D_R(b)$ close to zero indicates that the corresponding node is almost independent of the top I right-singular vectors. In contrast, if a node of the real GWGEN has a higher projection value, the node plays a more significant role in the principal network structure of the GWGEN from an energy perspective. Since this study aims to compare the core signaling pathways of CHP and non-CHP lung slice cells to investigate the pathogenetic mechanisms of CHP, we can identify the top 6,000 proteins, TFs, genes, miRNAs, and lncRNAs of the real GWGENs of both CHP and non-CHP. These core molecules can then be used for pathway annotation through KEGG pathways. The identified proteins, TFs, genes, miRNAs, and lncRNAs form the core signaling pathways of CHP and non-CHP, as depicted in Figure 3, which will be essential for further investigations into the pathogenetic mechanisms of CHP.

Figure 3. (A) The core GWGEN of CHP and (B) the core GWGEN of non-CHP. The core GWGENs were extracted from the real GWGENs using the principal network projection method to simplify the annotation of core signaling pathways using the Kyoto Encyclopedia of Genes and Genomes pathways to investigate the pathogenetic mechanism of CHP. The numbers indicate the node numbers of proteins, TFs, receptors, lncRNAs, and miRNAs. The green lines represent the protein-protein interactions, and the orange lines represent the gene regulations.
Abbreviations: CHP: Chronic hypersensitivity pneumonitis; GWGEN: Genome-wide and epigenetic interaction networks; lncRNA: Long non-coding RNA; TF: Transcription factor.

2.6. Candidate drug predictions to alleviate CHP symptoms using a DNN-based DTI model

Based on the core signaling pathways and downstream cellular dysfunctions of CHP and non-CHP, as illustrated in Figure 4, significant biomarkers of pathogenesis were identified as drug targets for the treatment of CHP patients. Using DTI data from DTI databases such as KEGG, UniProt,²¹ PRISM, DrugBank,²² PubChem,²³ and ChEMBL,²⁴ a DNN-based DTI model was trained to predict candidate molecular drugs for these important biomarkers. The flowchart of the DNN-based DTI model is shown in Figure 5. After identifying candidate molecular drugs for targeting CHP, we considered their regulatory ability, sensitivity, and toxicity as design considerations and selection criteria for potential multi-molecule drugs of CHP, and then proceeded with drug rediscovery and synthetic design.

Before training the DNN-based DTI model, we preprocessed the DTI data. We collected relevant data from DTI databases, including KEGG, UniProt,²¹ PRISM, DrugBank,²² PubChem,²³ and ChEMBL.²⁴ To input the data into the DNN-based DTI model, we used the PROFEAT website and the PyBioMed tool in the Python 3.7 environment to convert the DTI data into feature vectors of drug-target pairs. For a drug-target pair, the feature vector can be represented as in Equation XLVII:¹⁹

$$P_{\text{drug-test}} = [D, T] = [d_1, d_2, \ldots, d_M, t_1, t_2, \ldots, t_N] \tag{XLVII}$$

where $P_{\text{drug-test}}$ is the drug-target pair in feature vector form. $D$ represents the feature vector of the corresponding drug, and $T$ represents the feature vector of the drug target (biomarker). $M$ is the total number of drug features, and $N$ is the total number of drug target features. Since most of the original training data consisted of unknown interactions or negative data, the next step was data preprocessing. To address the class imbalance issue, we reduced the amount of unknown interaction data. Then, to account for variations in units across different features, we standardized each feature so that all features contribute on a comparable scale. The formulas for drug and target feature normalization (Equations XLVIII and XLIX) are as follows:¹⁹

$$d_i^{*} = \frac{d_i - \mu_i}{\sigma_i}, \quad \forall i = 1, 2, \ldots, M \tag{XLVIII}$$

$$t_j^{*} = \frac{t_j - \mu_j}{\sigma_j}, \quad \forall j = 1, 2, \ldots, N \tag{XLIX}$$

where $d_i$ denotes the i-th drug feature and $d_i^{*}$ denotes the i-th drug feature after standardization. $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of the i-th drug feature, respectively. $t_j$ represents the j-th feature of the target, and $t_j^{*}$ represents the j-th feature of the target after standardization. $\mu_j$ and $\sigma_j$ denote the mean and standard deviation of the j-th target feature, respectively. $M$ is the total number of drug features, and $N$ is the total number of target features.
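
The preprocessing steps described in this section (pair-vector construction as in Equation XLVII, reduction of unknown-interaction data, and per-feature standardization as in Equations XLVIII and XLIX) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the label convention (1 = known interaction, 0 = unknown or negative), the 1:1 sampling ratio, and the handling of constant features are all assumptions made for illustration, and the toy descriptors stand in for the PROFEAT and PyBioMed outputs.

```python
import numpy as np


def build_pair_vector(drug_features, target_features):
    """Concatenate M drug descriptors and N target descriptors into one
    pair vector of length M + N (the layout of Equation XLVII)."""
    d = np.asarray(drug_features, dtype=float)
    t = np.asarray(target_features, dtype=float)
    return np.concatenate([d, t])


def downsample_unknown(X, y, ratio=1.0, seed=0):
    """Keep every known interaction (y == 1) and a random subset of the
    unknown/negative pairs (y == 0).

    `ratio` is the number of retained negatives per positive. The value
    1.0 is a hypothetical default; the source does not state the ratio.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept_neg = rng.choice(neg, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([pos, kept_neg]))
    return X[idx], y[idx]


def standardize_features(X):
    """Column-wise z-score, (x - mean) / std, applied to every drug and
    target feature (Equations XLVIII and XLIX).

    Assumption: constant columns are mapped to zero rather than divided
    by a zero standard deviation.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)  # population standard deviation (ddof=0)
    safe_sigma = np.where(sigma > 0, sigma, 1.0)
    return (X - mu) / safe_sigma


# Toy data: 6 drug-target pairs with M = 2 drug and N = 3 target features.
rng = np.random.default_rng(42)
drugs = rng.normal(loc=100.0, scale=20.0, size=(6, 2))
targets = rng.normal(loc=0.5, scale=0.1, size=(6, 3))
X = np.stack([build_pair_vector(d, t) for d, t in zip(drugs, targets)])
y = np.array([1, 1, 0, 0, 0, 0])  # 2 known interactions, 4 unknown

X_bal, y_bal = downsample_unknown(X, y, ratio=1.0, seed=0)
Z = standardize_features(X_bal)
```

Because the pair vector places the M drug features before the N target features, standardizing each column of the pair matrix applies Equation XLVIII to the first M columns and Equation XLIX to the remaining N columns.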
            Volume 2 Issue 2 (2025)                         87                               doi: 10.36922/mi.4620