
Microbes & Immunity                                               Big data and DNN-based DTI model in CHP



and the batch size N to 100. For the data, we used 25% as test data and 75% as training data. In addition, we divided the training data into five equal parts to implement a five-fold cross-validation strategy. In each fold, four-fifths of the training data were used for DNN-based DTI model training, and one-fifth served as the validation data to monitor the model's performance, ensuring it improved over previous epochs. The results of training loss and training accuracy are shown in Figures 6 and 7, respectively. Furthermore, five-fold cross-validation can verify the stability of the data and model. To avoid model overfitting during the training process in Equation LIV, we applied an early stopping strategy to check if the test accuracy decreased while training accuracy continued to increase. Moreover, we embedded dropout layers after each hidden layer to further prevent model parameter overfitting and set the dropout rate to 0.4. After training the DNN-based DTI model in Figure 5, we evaluated it using the area under the curve (AUC) score and the receiver operating characteristic (ROC) curve shown in Figure 8. The AUC score is a valuable evaluation metric for classification problems, where a higher AUC score (indicating a larger area under the ROC curve) reflects better model accuracy in predicting true positive and true negative DTIs. The formulas for the AUC score and ROC curve are presented in Equations LV-LVII [25]:

$$\mathrm{TPR}\ (\text{True Positive Rate}) = \frac{TP}{TP + FN} \tag{LV}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{LVI}$$

$$\mathrm{FPR}\ (\text{False Positive Rate}) = \frac{FP}{FP + TN} \tag{LVII}$$

where TP represents the correct positive predictions, TN represents the correct negative predictions, FP denotes false positives, and FN denotes false negatives, so that FPR = 1 - Specificity.

The DNN-based DTI model predicted six candidate molecular drugs for CHP biomarkers (Table 1). These were evaluated based on drug design specifications, including regulatory ability, sensitivity, and toxicity, to identify potential drugs for lung fibrosis. Regulatory ability data from the L1000 Phase 5 dataset informed drug selection based on the gene expression of biomarkers: negatively correlated drugs were chosen for upregulated biomarkers and positively correlated drugs for downregulated biomarkers [26]. Sensitivity values near zero, derived from the PRISM repurposing dataset, indicated minimal cellular sensitivity to selected drugs. Toxicity was assessed using ADMETlab 2.0 [27], referencing lethal concentration 50 (LC50) values to prioritize low-toxicity compounds. Based on these criteria, three molecular drugs were proposed as combined multiple molecular drugs for CHP treatment, as shown in Table 2. This combination represents a promising therapeutic approach for CHP.

3. Results

3.1. Overview of the systems biology method and the systematic drug discovery and design for the treatment of CHP

To gain a deeper understanding of the pathogenic mechanism and identify key pathogenic biomarkers as drug targets for CHP, we employed a systems biology approach using the corresponding microarray data, a DNN-based DTI model trained on DTI databases, and drug screening by drug design specifications to find potential molecular drugs targeting these crucial biomarkers for the treatment of CHP.

The first step involves constructing candidate GWGEN for non-CHP and CHP by mining large datasets from databases such as DIP, IntAct, BioGRID, MINT, HTRIdb, ITFP, TRANSFAC, CircuitsDB, TargetScanHuman, and starBase 2.0 [7-16]. Next, using the system identification methods in Equations I-XXI and the system order detection methods in Equations XXII-XXXIII with microarray data for non-CHP and CHP, we constructed the real GWGENs for non-CHP and CHP, respectively, as shown in Figure 2, by eliminating false positives from the candidate GWGEN.

Since only up to 6,000 molecules in the real GWGENs can be annotated with KEGG pathways, the PNP method in Equations XLIV-XLVI was applied to extract the core GWGEN of CHP and non-CHP, with up to 6,000 key nodes (Figure 3). These core networks highlight the numbers of proteins, TFs, receptors, lncRNAs, and miRNAs within the core GWGEN. The core signal pathways for non-CHP and CHP are constructed by mapping the core GWGEN onto the significant KEGG pathways, as shown in Figure 4.

By analyzing downstream cellular dysfunctions, such as over-expression of apoptosis and proliferation in Figure 4, the pathogenic mechanisms of pulmonary cell fibrosis in CHP were studied, and significant biomarkers were identified as drug targets. In addition, the DNN-based DTI model, trained on DTI data from databases (Figure 5), predicts candidate molecular drugs for these biomarkers. These candidates are evaluated based on drug design specifications, such as adequate regulatory capacity (restoring biomarker expression to normal levels),
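As an illustration of the evaluation measures in Equations LV-LVII, the following minimal sketch computes TPR, specificity, and FPR from binary labels and predicted scores, then traces ROC points and integrates them with the trapezoidal rule to obtain an AUC. It is not the authors' implementation: the function names, the toy labels and scores, and the 0.5 threshold are assumptions made for this example, and FPR uses the standard definition FP / (FP + TN).

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_score: List[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling scores >= threshold positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (TPR, specificity, FPR); both classes must be present."""
    tpr = tp / (tp + fn)          # Equation LV
    specificity = tn / (tn + fp)  # Equation LVI
    fpr = fp / (fp + tn)          # Equation LVII, equal to 1 - specificity
    return tpr, specificity, fpr


def roc_points(y_true: List[int],
               y_score: List[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs, one per distinct score threshold, starting at (0, 0)."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_score), reverse=True):
        tpr, _, fpr = rates(*confusion_counts(y_true, y_score, threshold))
        points.append((fpr, tpr))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a list of ROC points ordered by increasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical, not taken from the paper): 1 = interacting pair.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

tpr, specificity, fpr = rates(*confusion_counts(y_true, y_score, 0.5))
print(f"TPR={tpr:.3f} specificity={specificity:.3f} FPR={fpr:.3f}")
print(f"AUC={auc(roc_points(y_true, y_score)):.3f}")
```

Sweeping the threshold over every distinct score is the simplest way to trace the ROC curve; for large datasets, a single sorted pass or an established library routine would be the more practical choice.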
            Volume 2 Issue 2 (2025)                         90                               doi: 10.36922/mi.4620