Microbes & Immunity Big data and DNN-based DTI model in CHP
and the batch size N to 100. We held out 25% of the data as test data and used the remaining 75% as training data. In addition, we divided the training data into five equal parts to implement a five-fold cross-validation strategy. In each fold, four-fifths of the training data were used for DNN-based DTI model training, and one-fifth served as validation data to monitor the model’s performance and confirm that it improved over previous epochs. The training loss and training accuracy are shown in Figures 6 and 7, respectively. Furthermore, five-fold cross-validation verifies the stability of the data and model. To avoid overfitting during the training process in Equation LIV, we applied an early stopping strategy that checks whether the test accuracy decreases while the training accuracy continues to increase. Moreover, we embedded a dropout layer after each hidden layer, with the dropout rate set to 0.4, to further prevent overfitting of the model parameters. After training the DNN-based DTI model in Figure 5, we evaluated it using the area under the curve (AUC) score and the receiver operating characteristic (ROC) curve shown in Figure 8. The AUC score is a valuable evaluation metric for classification problems: a higher AUC score (a larger area under the ROC curve) reflects better model accuracy in predicting true positive and true negative DTIs. The quantities underlying the AUC score and ROC curve are given in Equations LV-LVII:²⁵

$$\mathrm{TPR}\ (\text{True Positive Rate}) = \frac{TP}{TP + FN} \tag{LV}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{LVI}$$

$$\mathrm{FPR}\ (\text{False Positive Rate}) = \frac{FP}{FP + TN} \tag{LVII}$$

where TP represents the correct positive predictions, TN represents the correct negative predictions, FP denotes false positives, and FN denotes false negatives.

The DNN-based DTI model predicted six candidate molecular drugs for CHP biomarkers (Table 1). These were evaluated based on drug design specifications, including regulatory ability, sensitivity, and toxicity, to identify potential drugs for lung fibrosis. Regulatory ability data from the L1000 Phase 5 dataset informed drug selection based on the gene expression of biomarkers: negatively correlated drugs were chosen for upregulated biomarkers and positively correlated drugs for downregulated biomarkers. Sensitivity values near zero, derived from the PRISM repurposing dataset,²⁶ indicated minimal cellular sensitivity to the selected drugs. Toxicity was assessed using ADMETlab 2.0,²⁷ referencing lethal concentration 50 (LC50) values to prioritize low-toxicity compounds. Based on these criteria, three molecular drugs were proposed as a multiple-molecule drug combination for CHP treatment, as shown in Table 2. This combination represents a promising therapeutic approach for CHP.

3. Results

3.1. Overview of the systems biology method and the systematic drug discovery and design for the treatment of CHP

To gain a deeper understanding of the carcinogenic mechanism and identify key carcinogenic biomarkers as drug targets for CHP, we employed a systems biology approach using the corresponding microarray data, a DNN-based DTI model trained on DTI databases, and drug screening by drug design specifications to find potential molecular drugs targeting these crucial biomarkers for the treatment of CHP.

The first step involves constructing the candidate GWGEN for non-CHP and CHP by mining large datasets from databases such as DIP, IntAct, BioGRID, MINT, HTRIdb, ITFP, TRANSFAC, CircuitsDB, TargetScanHuman, and starBase 2.0.⁷⁻¹⁶ Next, using the system identification methods in Equations I-XXI and the system order detection methods in Equations XXII-XXXIII with microarray data for non-CHP and CHP, we constructed the real GWGENs for non-CHP and CHP, respectively, as shown in Figure 2, by eliminating false positives from the candidate GWGEN.

Since only up to 6,000 molecules in the real GWGENs can be annotated with KEGG pathways, the PNP method in Equations XLIV-XLVI was applied to extract the core GWGENs of CHP and non-CHP, each with up to 6,000 key nodes (Figure 3). These core networks highlight the numbers of proteins, TFs, receptors, lncRNAs, and miRNAs within the core GWGEN. The core signal pathways for non-CHP and CHP are constructed by mapping the core GWGEN onto the significant KEGG pathways, as shown in Figure 4.

By analyzing downstream cellular dysfunctions, such as over-expression of apoptosis and proliferation in Figure 4, the pathogenic mechanisms of pulmonary cell fibrosis in CHP were studied, and significant biomarkers were identified as drug targets. In addition, the DNN-based DTI model, trained on DTI data from databases (Figure 5), predicts candidate molecular drugs for these biomarkers. These candidates are evaluated based on drug design specifications, such as adequate regulatory capacity (restoring biomarker expression to normal levels),
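As a rough illustration of the data partitioning described in the methods on this page (25% test data, 75% training data, with the training data divided into five equal parts for cross-validation), the following is a minimal sketch in Python. The function name `split_indices`, the sample count of 1,000, and the fixed random seed are assumptions made for the example and do not come from the study.

```python
import random


def split_indices(n_samples, test_fraction=0.25, n_folds=5, seed=0):
    """Return (test_idx, folds); folds is a list of (train_idx, val_idx) pairs."""
    idx = list(range(n_samples))
    # Shuffle once; the seed is an assumption, used only for reproducibility.
    random.Random(seed).shuffle(idx)

    # Hold out the test portion first, so it never enters cross-validation.
    n_test = int(round(n_samples * test_fraction))
    test_idx, pool = idx[:n_test], idx[n_test:]

    # Deal the remaining training pool into n_folds near-equal parts.
    parts = [pool[k::n_folds] for k in range(n_folds)]

    folds = []
    for k in range(n_folds):
        val_idx = parts[k]  # one-fifth for validation
        train_idx = [i for j, part in enumerate(parts) if j != k for i in part]
        folds.append((train_idx, val_idx))  # four-fifths for training
    return test_idx, folds


# Hypothetical dataset size, chosen only so the fold sizes are round numbers.
test_idx, folds = split_indices(1000)
for k, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {k}: train={len(train_idx)} val={len(val_idx)}")
```

Each sample in the training pool serves as validation data in exactly one fold, which is what allows the five runs to be compared as a check on model stability.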
Volume 2 Issue 2 (2025) 90 doi: 10.36922/mi.4620
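The rates in Equations LV-LVII and their relation to the ROC curve and AUC score can be checked numerically. The sketch below is illustrative only: it uses the standard definition FPR = FP / (FP + TN), which equals 1 - specificity, sweeps a decision threshold over the predicted scores to trace ROC points, and integrates them with the trapezoidal rule. The function names and the six-sample toy labels and scores are hypothetical, not data from the study, and the code assumes both classes are present.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (TPR, specificity, FPR); assumes both classes are present."""
    tpr = tp / (tp + fn)
    specificity = tn / (tn + fp)
    fpr = fp / (fp + tn)  # equals 1 - specificity
    return tpr, specificity, fpr


def roc_auc(y_true, scores):
    """Trace ROC points from (0, 0) to (1, 1) and integrate by trapezoids."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tpr, _, fpr = rates(*confusion_counts(y_true, scores, t))
        points.append((fpr, tpr))
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return points, auc


# Toy labels (1 = interacting pair) and model scores, made up for illustration.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points, auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.3f}")
```

In this toy case, eight of the nine positive-negative pairs are ranked correctly, so the trapezoidal area is 8/9, consistent with reading the AUC as the probability that a true interaction is scored above a non-interaction.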

