
Tumor Discovery                                               Highly accurate gene panels for cancer screening



high-throughput microarrays and next-generation RNA sequencing (RNA-seq).2 These technologies enabled the development of increasingly specialized databases with a focus on biomedical applications.3 A prominent example is The Cancer Genome Atlas (TCGA), which provides potentially crucial information on cancer detection, treatment, and the fundamental biology of oncogenesis.4,5 TCGA hosts extensive genomic, epigenomic, transcriptomic, and proteomic data on tumor and normal tissue samples for 33 cancer types.6 All of these data are publicly available for mining and analysis in pursuit of specific genetic markers and targets.6 As expected, current analyses of TCGA data reflect the scale and complexity of this experimental feat.7,8 However, a definitive consensus on the most suitable set of genes for diagnosis and therapy remains elusive.

Gene discovery relevant to carcinogenesis and tumor progression is guided in part by the assessment of gene dysregulation in terms of both statistical and biological significance.9 The paradigmatic kind of gene dysregulation is differential expression,10 whereby a gene is expressed differently in tumor than in normal tissue. Conventionally, differential expression is associated with cancer only when there is a marked deviation from normal expression levels, typically defined in terms of average values across tumor and normal samples. However, as emphasized by several authors,11-14 framing gene expression dysregulation solely in terms of central tendency can hinder gene discovery in translational cancer research. Indeed, gene expression levels in tumor and normal tissue samples may differ in their variance or distribution even when mean values remain unchanged. Consequently, the detection of differential dispersion12,13 and differential distribution14 provides a broader perspective on human cancer-related genes by addressing the shortcomings of standard differential expression protocols. Despite their important contributions, these alternative techniques often rest on distributional assumptions that may not reflect the regulatory dynamics of many genes, such as those involved in circadian rhythm control.15 To the best of our knowledge, the field still lacks sufficiently flexible methods to detect diverse patterns of gene expression dysregulation beyond changes in central tendency.

In this context, we identify novel candidate genes for cancer therapy and diagnostics by applying an original non-parametric approach to gene expression profiles from the TCGA database. Rather than relying on uniform characterizations based on averages or specific distributional shapes, we explore gene-dependent definitions of normal and tumor-like expression using intervals that encompass either all normal or all tumor samples. This allowed us to identify genes that serve as classifiers without false positives or false negatives when distinguishing tumor from normal tissue within the training data. We refer to these as T-genes (differentially expressed only-tumor genes) and N-genes (non-differentially dysregulated only-normal genes). These genes are characterized by specific expression intervals that are exclusively populated by tumor and normal tissue samples, respectively. By combining N- or T-genes, we constructed compact gene panels (referred to as “perfect gene panels”) that perfectly discriminate between tumor and normal samples within the training data.

Our core procedure resembles formal concept analysis16-27 and rough set theory (RST),28-39 both of which have a growing number of applications in omics. The main aim of these techniques is to discover patterns (namely, formal concepts or rough sets) in multivariate data, where a set of attributes is made to correspond to a set of objects through a specific relation.40,41 This is precisely the framework under consideration, with the following mapping: genes take the role of attributes, clinical samples correspond to objects, and gene expression profiles define the relation between them.18 Our sets of N-genes and T-genes define both formal and attribute-oriented concepts,40,41 where the extents of these concepts correspond to either tumor or normal samples, depending on the concept type. Moreover, the perfect gene panels align with the notion of a reduct in RST,42-45 in the sense that none of their gene members can be removed without compromising the panel’s ability to perfectly classify samples.

Perfect gene panels appear in various forms, depending on the location of tumor-exclusive or normal-exclusive intervals within the gene expression space. Some of these panels have a clear interpretation within the state-of-the-art taxonomy of driver genes, pending an interventionist proof of their causal power. For instance, certain panels feature a single gene whose over-expression signals a tumor, a behavior akin to that of oncogenes. Conversely, for other panels, a single non-silenced gene indicates a tumor-free sample, which fits our current understanding of tumor suppressor genes. Other panels may include cooperative tumor suppressor genes, oncogenes, and oscillatory genes.

In this paper, we explore 12 solid tumors among the 33 cancer types in TCGA. For each tissue analyzed, we identify perfect gene panels with potential applications in diagnosis and therapy. By design, perfect panels achieve zero false positives and zero false negatives within the training data. Notably, one T-gene panel for lung adenocarcinoma (LUAD) also showed high sensitivity and specificity on an external dataset.


            Volume 4 Issue 3 (2025)                         59                           doi: 10.36922/TD025190035