
Microbes & Immunity                                               Big data and DNN-based DTI model in CHP



models. Three design specifications were used to select adequate molecular drugs as potential multi-molecular drugs for CHP, with an emphasis on inhibiting and regulating disease symptoms. Lung fibrosis cells from CHP patients and normal lung tissue cells from healthy control groups, specifically the A549 cell line, were sampled to construct a GWGEN for the identification of core signaling pathways involved in the pathogenesis of CHP. Candidate GWGENs, including candidate PPINs and GRNs encompassing regulatory networks of fibrosis genes and genes from healthy controls, were constructed using big data mining and preprocessing of gene/microRNA (miRNA)/long non-coding RNA (lncRNA) expression data from fibrosis cells in online databases.

By leveraging big data mining and systems modeling techniques, we aimed to investigate and compare the molecular mechanisms of CHP and non-CHP. This involved constructing a GWGEN, a core GWGEN, and core signaling pathways for both CHP and non-CHP. The steps involved are depicted in Figure 1. First, big data mining of the corresponding databases was performed, together with preprocessing of gene, miRNA, and lncRNA expression data. Second, candidate PPI networks and GRNs were employed to construct GWGEN candidates. Next, whole-genome microarray data from different samples of CHP and non-CHP (fibrosis cells and healthy control cells) were used to identify the pathogenetic mechanism of CHP and compare it with that of the healthy control group. Subsequently, the real GWGENs of CHP and non-CHP were constructed, and the PNP method was employed to extract core proteins, genes, miRNA, and lncRNA from the real GWGENs to construct the core GWGENs of CHP and non-CHP. Then, core signaling pathways were constructed by annotating the KEGG pathways of CHP and healthy controls from their core GWGENs. Finally, the core signaling pathways of CHP and healthy controls and their downstream cellular dysfunction were compared to investigate the pathogenetic mechanisms.

By comparing the core GWGEN of fibrosis cells from CHP patients with that of normal lung cells from the healthy control group at different stages of the cell repair cycle (damage, repair, and fibrosis), we extracted the differential core signaling pathways between CHP patient fibrosis cells and normal lung cells. This helped to unravel the genetic and epigenetic mechanisms underlying fibrosis in CHP patients. In addition, based on our findings, we selected significant biomarkers of the pathogenetic mechanisms as drug targets for the treatment of CHP. Finally, utilizing DTI databases, we trained a DNN-based DTI model to predict candidate molecular drugs, from which potential molecular drugs were selected as multi-molecular drugs based on three design specifications for the treatment of CHP.

2.2. Data preprocessing of CHP and healthy controls microarray data

In this study, microarray data (GSE86618) from the National Center for Biotechnology Information (NCBI) were analyzed. The dataset, divided into the fibrosis lung cell group and the healthy lung tissue cell control group, consists of 325 CHP samples and 215 non-CHP samples. The significant variability in gene expression and PPIs under different biological conditions highlights the risk of overlooking individual differences if only significant genes or proteins are considered. To uncover the cellular mechanisms of CHP, the corresponding core signaling pathways must be identified from the GWGEN. The candidate GWGEN, constructed using experimental data and computational predictions from multiple databases, includes both the PPIN and the GRN. A binary Boolean matrix was used to represent the candidate GWGEN, with “1” indicating interaction or regulation between nodes and “0” indicating no interaction or regulation.

To establish the candidate PPIN, we referenced various databases, including the Transcription Factor Database (TRANSFAC),⁷ the Biological General Repository for Interaction Datasets (BioGRID),⁸ the Database of Interacting Proteins (DIP),⁹ the Molecular INTeraction database (MINT),¹⁰ and IntAct.¹¹ Similarly, we utilized TargetScanHuman¹² and the Human Transcriptional Regulation Interaction Database (HTRIdb)¹³ to identify candidate regulatory interactions between human transcription factors and their targets. These interactions were based on the Integrated Transcription Factor Platform (ITFP),¹⁴ CircuitDB,¹⁵ and StarBase v2.0.¹⁶ By integrating candidate PPIs, candidate regulatory interactions between transcription factors (TFs)/miRNA/lncRNA and their target genes/miRNA/lncRNA, whole-genome microarray gene/miRNA/lncRNA expression data, and DNA methylation profiles, we obtained the candidate GWGEN using MATLAB text files. String manipulation tools for large-scale text mining were used to standardize the gene names according to the standard gene name rules in the NCBI Gene Database, achieving an automatic extraction method for key biomarkers.

2.3. Stochastic regression models construction of candidate PPIN and gene regulation network of candidate genome-wide and EINs

In the previous section, based on text mining and comprehensive big data mining from GSE86618 microarray gene databases, we constructed a candidate GWGEN. However, the candidate GWGEN built from
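
As a minimal sketch of the binary Boolean representation described in Section 2.2, the Python snippet below merges interaction pairs from several sources into a single 0/1 matrix. It is an illustration under stated assumptions, not the procedure used in the study, which reports working with MATLAB text files. The source labels, the toy gene pairs, the row/column orientation, and the helper names `normalize_symbol` and `build_boolean_matrix` are all hypothetical. Trimming and uppercasing symbols is only a stand-in for standardization against the NCBI Gene Database.

```python
def normalize_symbol(name):
    """Hypothetical stand-in for gene-name standardization:
    strip surrounding whitespace and uppercase the symbol."""
    return name.strip().upper()


def build_boolean_matrix(edge_lists):
    """Merge (source, target) pairs from several databases into a 0/1 matrix.

    Returns (nodes, matrix). matrix[i][j] == 1 means nodes[i] interacts with
    or regulates nodes[j]; 0 means no candidate interaction or regulation.
    The orientation (row = source, column = target) is an assumption.
    """
    edges = set()
    for pairs in edge_lists.values():
        for source, target in pairs:
            edges.add((normalize_symbol(source), normalize_symbol(target)))

    # Fix a node order so that rows and columns are reproducible.
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}

    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return nodes, matrix


# Toy input for illustration only; not data from the study.
candidate_sources = {
    "ppi_source": [("TP53", "MDM2"), ("mdm2 ", "TP53")],
    "regulation_source": [("tp53", "CDKN1A")],
}
nodes, matrix = build_boolean_matrix(candidate_sources)

for node, row in zip(nodes, matrix):
    print(node, row)
```

Collecting edges in a set before filling the matrix means that the same pair reported by several databases, or written with different capitalization, contributes a single "1" entry.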


            Volume 2 Issue 2 (2025)                         79                               doi: 10.36922/mi.4620