Microbes & Immunity Big data and DNN-based DTI model in CHP
Figure 4. This figure provides annotations of the core genome-wide and epigenetic interaction networks through the Kyoto Encyclopedia of Genes and
Genomes pathways, highlighting the specific signaling pathways associated with chronic hypersensitivity pneumonitis (CHP) (shown on the left) and
healthy controls (shown on the right). The middle area contains overlapping signaling pathways of both CHP and non-CHP. This figure shows that the
transforming growth factor β, tumor necrosis factor, and p38-mitogen-activated protein kinase signaling pathways contribute to pneumonitis fibrosis and
the chemokine (C-X-C motif) ligand 1, cell division cycle 23, chemokine (C-C motif) ligand 2, and chemokine (C-C motif) ligand 20 signaling pathways
are involved in cell apoptosis and proliferation. Based on the pathogenetic mechanisms of CHP, the significant biomarkers above are selected as drug
targets for the treatment of CHP.
Since the DNN-based DTI model is designed with only 996 inputs, the dimensions of the drug-target feature vectors in Equations XLVII-XLIX must be reduced to 996 before they can be fed into the model. To achieve this reduction, we applied principal component analysis to the drug-target pair feature vectors. After data preprocessing, we used 75% of the data as training data and 25% as testing data. Furthermore, we set aside 20% of the training data as validation data for five-fold cross-validation. The DNN-based DTI model in Figure 5 has four hidden layers containing 512, 256, 128, and 64 neurons, in that order. Binary cross-entropy was used as the loss function, with a learning rate of 0.003, and the model parameters were updated by backpropagation. We set the number of epochs to 100 and the batch size to 100. To prevent the vanishing gradient problem, each hidden layer employed a rectified linear unit activation function. Five-fold cross-validation was employed to train the model and to check the stability and prediction performance of the DNN-based DTI model. Early stopping was applied to avoid overfitting. The activation function of the output layer is a sigmoid function, which limits the output to the range between 0 and 1 so that it can be read as the probability predicted by the DNN-based DTI model. Since DTI prediction is a binary classification problem, the model loss is computed using the binary cross-entropy cost function, as shown in Equations L and LI:¹⁹

$$C_n(w, b) = -\left[ p_n \log(\hat{p}_n) + (1 - p_n) \log(1 - \hat{p}_n) \right] \quad \text{(L)}$$

$$L(w, b) = \frac{1}{N} \sum_{n=1}^{N} C_n(w, b) \quad \text{(LI)}$$

where $p_n$ denotes the truth label of positive interaction, $\hat{p}_n$ denotes the predictive probability of positive interaction, and $(1 - \hat{p}_n)$ represents the predicted probability of negative
Volume 2 Issue 2 (2025) 88 doi: 10.36922/mi.4620
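The binary cross-entropy cost in Equations L and LI can be illustrated with a minimal sketch in plain Python. The function names, the toy labels and probabilities, and the clipping constant `eps` are assumptions made for illustration only; they are not taken from the article, and the clipping is added merely to keep the logarithm finite when a predicted probability is exactly 0 or 1.

```python
import math


def bce_per_sample(p_true, p_pred, eps=1e-12):
    """Per-sample cost C_n of Equation L for one drug-target pair.

    p_true is the truth label (0 or 1); p_pred is the predicted
    probability of a positive interaction. eps is an assumed
    clipping constant, not part of the original equation.
    """
    p_pred = min(max(p_pred, eps), 1.0 - eps)
    return -(p_true * math.log(p_pred) + (1 - p_true) * math.log(1 - p_pred))


def bce_loss(labels, probs):
    """Mean cost L of Equation LI over N drug-target pairs."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equally long")
    return sum(bce_per_sample(t, p) for t, p in zip(labels, probs)) / len(labels)


# Hypothetical toy batch: truth labels and sigmoid outputs.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.5]
loss = bce_loss(labels, probs)
print(f"mean binary cross-entropy: {loss:.4f}")
```

Confident correct predictions (0.9 for a positive pair, 0.2 for a negative pair) contribute small costs, while the uninformative output of 0.5 contributes log 2, which is why minimizing this loss pushes the sigmoid output toward the truth label.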

