Microbes & Immunity Big data and DNN-based DTI model in CHP
lncRNAs in real GWGENs. A projection value $D_R(b)$ close to zero indicates that the corresponding node is almost independent of the top $I$ right-singular vectors. In contrast, if a node of the real GWGEN has a higher projection value, it suggests that the node plays a more significant role in the principal network structure of the GWGEN from an energy perspective. Since this study aims to compare the core signaling pathways of CHP and non-CHP lung slice cells to investigate the pathogenetic mechanisms of CHP, we can identify the top 6,000 proteins, TFs, genes, miRNAs, and lncRNAs in the real GWGENs of both CHP and non-CHP. These core molecules can then be used for pathway annotation through KEGG pathways. The identified proteins, TFs, genes, miRNAs, and lncRNAs form the core signaling pathways of CHP and non-CHP, as depicted in Figure 3, which will be essential for further investigations into the pathogenetic mechanisms of CHP.

2.6. Candidate drug predictions to alleviate CHP symptoms using a DNN-based DTI model

Based on the core signaling pathways and downstream cellular dysfunctions of CHP and non-CHP, as illustrated in Figure 4, significant biomarkers of pathogenesis were identified as drug targets for the treatment of CHP patients. Using DTI data from DTI databases such as KEGG, UniProt, PRISM, DrugBank, PubChem, and ChEMBL,[21-24] a DNN-based DTI model was trained to predict candidate molecular drugs for these important biomarkers. The flowchart of the DNN-based DTI model is shown in Figure 5. After identifying candidate molecular drugs for targeting CHP and considering their regulatory ability, sensitivity, and toxicity as design considerations and selection criteria for potential multi-molecule drugs of CHP, we proceeded with drug rediscovery and synthetic design.

Before training the DNN-based DTI model, we preprocessed the DTI data collected from DTI databases, including KEGG, UniProt, PRISM, DrugBank, PubChem, and ChEMBL.[21-24] To input the data into the DNN-based DTI model, we used the PROFEAT website and the PyBioMed tool in the Python 3.7 environment to convert the DTI data into feature vectors of drug-target pairs. For a drug-target pair, the feature vector can be represented as in Equation XLVII:[19]

$$P_{\text{drug-test}} = [D, T] = [d_1, d_2, \ldots, d_M, t_1, t_2, \ldots, t_N] \tag{XLVII}$$

where $P_{\text{drug-test}}$ is the drug-target pair in feature vector
form. $D$ represents the feature vector of the corresponding drug, and $T$ represents the feature vector of the drug target (biomarker). $M$ is the total number of drug features, and $N$ is the total number of drug target features. Since most of the original training data had unknown interactions or negative data, the next step was data preprocessing. To address the class imbalance issue, we reduced the amount of unknown interaction data. Then, to account for variations in units across different features, we standardized each feature vector and normalized their significance. The mathematical formulas (Equations XLVIII and XLIX) for drug and target feature normalization are presented below:[19]

$$d_i^{*} = \frac{d_i - \mu_i}{\sigma_i}, \quad \forall i = 1, 2, \ldots, M \tag{XLVIII}$$

$$t_j^{*} = \frac{t_j - \mu_j}{\sigma_j}, \quad \forall j = 1, 2, \ldots, N \tag{XLIX}$$
where $d_i$ denotes the $i$-th drug feature and $d_i^{*}$ denotes the $i$-th drug feature after standardization. $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of the $i$-th drug feature, respectively. $t_j$ represents the $j$-th feature of the target, and $t_j^{*}$ represents the $j$-th feature of the target after standardization. $\mu_j$ and $\sigma_j$ denote the mean and standard deviation of the $j$-th target feature, respectively. $M$ is the total number of drug features, and $N$ is the total number of target features.

Figure 3. (A) The core GWGEN of CHP and (B) the core GWGEN of non-CHP. The core GWGENs were extracted using the principal network projection method from the real GWGEN to simplify the annotation of core signaling pathways using the Kyoto Encyclopedia of Genes and Genomes pathways to investigate the pathogenetic mechanism of CHP. The numbers indicate the node numbers of proteins, TFs, receptors, lncRNAs, and miRNAs. The green lines represent the protein-protein interactions, and the orange lines represent the gene regulations.
Abbreviations: CHP: Chronic hypersensitivity pneumonitis; GWGEN: Genome-wide and epigenetic interaction networks; lncRNA: Long non-coding RNA; TF: Transcription factor.
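
The preprocessing steps described in Section 2.6 (concatenating drug and target descriptors as in Equation XLVII, reducing the amount of unknown-interaction data, and standardizing each feature as in Equations XLVIII and XLIX) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the 1:1 ratio of unknown to known pairs, the fixed random seed, the use of the population standard deviation, and the handling of constant features are all hypothetical choices. In practice, the descriptors would come from PROFEAT or PyBioMed, whose calls are not shown here.

```python
import random
import statistics


def build_pair_vector(drug_features, target_features):
    """Concatenate D = [d_1..d_M] and T = [t_1..t_N] into one pair vector (Equation XLVII)."""
    return [float(v) for v in drug_features] + [float(v) for v in target_features]


def undersample_unknown(pairs, labels, ratio=1.0, seed=0):
    """Keep every known interaction (label 1) and a random subset of the rest.

    Assumption: `ratio` unknown/negative pairs are kept per known pair; the
    source only states that the amount of unknown data was reduced.
    """
    rng = random.Random(seed)
    known = [i for i, label in enumerate(labels) if label == 1]
    unknown = [i for i, label in enumerate(labels) if label != 1]
    n_keep = min(len(unknown), int(round(ratio * len(known))))
    kept = sorted(known + rng.sample(unknown, n_keep))
    return [pairs[i] for i in kept], [labels[i] for i in kept]


def standardize_columns(rows):
    """Apply x* = (x - mu) / sigma to each feature column (Equations XLVIII and XLIX)."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    # Assumption: constant features (sigma == 0) map to 0 instead of dividing by zero.
    safe = [s if s > 0 else 1.0 for s in stds]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, safe)] for row in rows]
    return scaled, means, stds


# Toy data: 4 drug-target pairs with M = 2 drug features and N = 2 target features.
drugs = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]]
targets = [[0.1, 7.0], [0.2, 7.0], [0.3, 7.0], [0.4, 7.0]]
labels = [1, 0, 0, 0]  # 1 = known interaction, 0 = unknown or negative

pairs = [build_pair_vector(d, t) for d, t in zip(drugs, targets)]
balanced_pairs, balanced_labels = undersample_unknown(pairs, labels)
scaled_pairs, means, stds = standardize_columns(balanced_pairs)
```

Standardizing the concatenated pair vectors column by column is equivalent to standardizing the drug and target features separately, because each column holds exactly one $d_i$ or $t_j$. The sketch computes the statistics after undersampling, following the order in which the text describes the two steps.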
Volume 2 Issue 2 (2025) 87 doi: 10.36922/mi.4620

