Tumor Discovery Highly accurate gene panels for cancer screening
high-throughput microarrays and next-generation RNA sequencing (RNA-seq). 2 These technologies enabled the development of increasingly specialized databases with a focus on biomedical applications. 3 A prominent example is The Cancer Genome Atlas (TCGA), which provides potentially crucial information on cancer detection, treatment, and the fundamental biology of oncogenesis. 4,5 TCGA hosts extensive genomic, epigenomic, transcriptomic, and proteomic data on tumor and normal tissue samples for 33 cancer types. 6 All of these data are publicly available for mining and analysis in pursuit of discovering specific genetic markers and targets. As expected, the current analyses of TCGA data 6 reflect the scale and complexity of this experimental feat of collecting such a vast amount of data. 7,8 However, a definitive consensus on the most adequate set of genes for diagnosis and therapy remains elusive.

Gene discovery relevant to carcinogenesis and tumor progression is partially guided by the assessment of gene dysregulation based on both statistical and biological significance. 9 The paradigmatic kind of gene dysregulation is differential expression, whereby a gene is expressed differently in a tumor compared to a normal tissue. 10 Conventionally, differential expression is associated with cancer only when there is a marked deviation from normal expression levels, typically defined in terms of average values across tumor and normal samples. However, as emphasized by several authors, 11-14 framing gene expression dysregulation solely in terms of central tendency can hinder gene discovery in translational cancer research. Indeed, gene expression levels in tumor or normal tissue samples may differ in their variance or distribution, even when mean values remain unchanged. Consequently, the detection of differential dispersion 12,13 and differential distribution 14 provides a broader perspective on human cancer-related genes by addressing the shortcomings of standard differential expression protocols. Despite their important contributions, these alternative techniques often rest on distributional assumptions that may not reflect the regulatory dynamics of many genes, such as those involved in circadian rhythm control. 15 To the best of our knowledge, the field still lacks sufficiently flexible methods to detect diverse patterns of gene expression dysregulation beyond changes in central tendency.

In this context, we identify novel candidate genes for cancer therapy and diagnostics by applying an original non-parametric approach to gene expression profiles from the TCGA database. Rather than relying on uniform characterizations based on averages or specific distributional shapes, we explore gene-dependent definitions of normal and tumor-like expression using intervals that encompass either all normal or all tumor samples. This allowed us to identify genes that serve as classifiers without false positives or false negatives when distinguishing tumor and normal tissue within the training data. We refer to these as T-genes (differentially expressed only-tumor genes) and N-genes (non-differentially dysregulated only-normal genes). These genes are characterized by specific expression intervals that are exclusively populated by tumor and normal tissue samples, respectively. By combining N- or T-genes, we constructed compact gene panels – referred to as “perfect gene panels” – that perfectly discriminate between tumor and normal samples within the training data.

Our core procedure resembles formal concept analysis 16-27 and rough set theory (RST), 28-39 both with a growing number of applications in omics. The main scope of these techniques is to discover patterns (namely, formal concepts or rough sets) in multivariate data, where a set of attributes is made to correspond to a set of objects through a specific relation. 40,41 This is precisely the framework under consideration, with the following mapping: genes take the role of attributes, clinical samples correspond to objects, and gene expression profiles define the relation between them. 18 Our sets of N-genes and T-genes define both formal and attribute-oriented concepts, 40,41 where the extents of these concepts correspond to either tumor or normal samples, depending on the concept type. Moreover, the perfect gene panels align with the notion of a reduct in RST, 42-45 in the sense that none of their gene members can be removed without compromising the panel’s ability to perfectly classify samples.

Perfect gene panels appear in various forms, depending on the location of tumor-exclusive or normal-exclusive intervals within the gene expression space. Some of these panels have a clear interpretation within the state-of-the-art taxonomy of driver genes, provided an interventionist proof of their causal power. For instance, certain panels feature a single gene whose over-expression signals a tumor – a behavior akin to oncogenes. Conversely, for other panels, a single non-silenced gene is an indication of a tumor-free sample, which fits our current understanding of tumor suppressor genes. Other panels may include cooperative tumor suppressor genes, oncogenes, and oscillatory genes.

In this paper, we explore 12 solid tumors among the 33 cancer types in TCGA. For each tissue analyzed, we identify perfect gene panels with potential applications in diagnosis and therapy. By design, perfect panels achieve zero false positives or false negatives within the training data. Notably, one T-gene panel for lung adenocarcinoma (LUAD) also demonstrated high sensitivity and specificity in an external dataset.
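
As an illustration only, the interval-based idea behind T-genes and perfect panels can be sketched in a few lines of Python. The sketch rests on assumptions that are not taken from the paper: the "normal interval" of a gene is read as the [min, max] range of its expression over the normal training samples, a tumor sample is flagged when it falls outside that range for at least one panel gene, and the panel is assembled with a greedy set-cover heuristic followed by a pruning pass so that no gene is redundant (in the spirit of a reduct). All function names and the toy data are hypothetical, and the authors' actual procedure may differ.

```python
# Toy expression matrices (rows = samples, columns = genes); values are made up.
NORMAL = [[1.0, 5.0, 2.0], [2.0, 6.0, 3.0], [1.5, 5.5, 2.5]]
TUMOR = [[3.0, 5.2, 2.2], [1.2, 7.0, 2.4], [4.0, 4.0, 2.6]]


def normal_intervals(normal):
    """Per-gene (min, max) of expression over the normal samples (assumed definition)."""
    return [(min(col), max(col)) for col in zip(*normal)]


def outside(value, interval):
    """True if the value lies strictly outside the closed interval."""
    lo, hi = interval
    return value < lo or value > hi


def tumor_cover(tumor, intervals):
    """For each gene, the set of tumor-sample indices lying outside its normal interval."""
    return [
        {i for i, sample in enumerate(tumor) if outside(sample[g], intervals[g])}
        for g in range(len(intervals))
    ]


def build_panel(cover, n_tumor):
    """Greedy set cover over tumor samples, then drop genes that are not needed.

    Returns a sorted list of gene indices, or None if some tumor sample
    is inside the normal interval of every gene.
    """
    everyone = set(range(n_tumor))
    uncovered = set(everyone)
    panel = []
    while uncovered:
        best = max(range(len(cover)), key=lambda g: len(cover[g] & uncovered), default=None)
        if best is None or not cover[best] & uncovered:
            return None  # no perfect panel exists for these data
        panel.append(best)
        uncovered -= cover[best]
    # Pruning pass: remove any gene whose removal keeps all tumor samples covered.
    for g in list(panel):
        rest = [h for h in panel if h != g]
        if rest and set().union(*(cover[h] for h in rest)) >= everyone:
            panel = rest
    return sorted(panel)


def classify(sample, panel, intervals):
    """Label a sample 'tumor' if any panel gene is outside its normal interval."""
    return "tumor" if any(outside(sample[g], intervals[g]) for g in panel) else "normal"


intervals = normal_intervals(NORMAL)
panel = build_panel(tumor_cover(TUMOR, intervals), len(TUMOR))
```

On the toy data, normal samples can never be misclassified, because each lies inside every normal interval by construction; the panel only has to guarantee that every tumor sample escapes at least one interval. Perfect separation here concerns the training data only, which mirrors the caveat stated in the text.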
Volume 4 Issue 3 (2025) 59 doi: 10.36922/TD025190035

