Tumor Discovery Highly accurate gene panels for cancer screening
whether the gene’s expression interval was N-exclusive, T-exclusive, or shared by N and T. This matrix structure provided all the necessary information for constructing perfect gene panels.

2.7. Perfect panels

Differentially expressed and non-differentially dysregulated genes often form large pools containing over a thousand members, which are impractical for real-world applications. In genetics-based hereditary risk assessment, diagnostics, and therapy, smaller gene panels (comprising 5–50 genes) are often preferred. 50

Due to the low dimensionality of the gene expression data, it is possible to extract compact panels from these large gene sets. 51 In particular, panels can be designed to perfectly classify all normal and tumor samples collectively, with the additional requirement that removing any member from the panel would compromise this classification accuracy.

These panels can be identified using a concept similar to, but distinct from, reducts in RST, 42,43 which we termed a formal-concept reduct. To the best of our knowledge, this is the first presentation of formal-concept reducts, although more stringent related concepts 44 have been proposed by Zhang. 45

Our algorithm for constructing perfect panels is based on progressively maximizing sensitivity. At each step, we add the differentially expressed gene that is most often dysregulated in the tumor samples not yet identified by the current panel (i.e., those samples where the included genes show no dysregulation), until all tumor samples are discovered.

The equivalent procedure involves iteratively adding the non-differentially dysregulated gene that most frequently exhibits normal regulation in the remaining undiscovered normal samples (i.e., those in which the genes already included are dysregulated) until all normal samples are discovered. If, at any iteration, there is gene selection ambiguity, we prioritize the most redundant candidate, i.e., the gene whose dysregulation pattern overlaps maximally with existing panel members across already classified samples. Further ambiguities are resolved by arbitrarily selecting the first candidate in the list.

Panels constructed this way are minimal: no gene can be removed without compromising perfect classification. However, they are not necessarily the smallest collection of genes achieving this goal, nor are they necessarily unique. Modifying the ambiguity-resolution criteria may give rise to different and/or smaller gene sets that can achieve perfect discrimination between normal and tumor samples, while remaining irredundant. In practice, these panels comprise 1–20 genes, making them suitable for cancer diagnostics. 50

In the example considered in Section 2.4, the three-gene set constitutes a perfect panel for the only-T-above class. In its expression dysregulation matrix, normal samples show expression values of −1 or 0. Every tumor sample has at least one dysregulated gene (value 1) in the panel, as shown in the histogram of Figure 1. Thus, this panel exhibits no false negatives or false positives.

3. Results

First, we note that, in the average cancer type, only about 3% of the genes qualify as N-genes. The observation that more than one-third of the genome, and the vast majority of classifier genes, fall within the T-gene category aligns with cancer’s characterization as a high-entropy state of gene regulatory networks 52,53 and indicates the abundance of potential genetic triggers for cancer.

Perfect panels constructed according to our procedure are summarized in Table 2. When no perfect panel exists, we reported the size of the minimal gene set that classifies the largest sample subset. Notably, only-T-above and only-T-below panels may include oncogenes and tumor suppressors, respectively. As shown in Table 2, all 12 cancer types exhibit perfect panels of both T-types.

Conversely, perfect panels with only-N-above or only-N-below genes appear irregularly in some tissues (Table 2). Specifically, breast invasive carcinoma (BRCA), head and neck squamous cell carcinoma, kidney renal clear cell carcinoma, kidney renal papillary cell carcinoma, LUAD, prostate adenocarcinoma, and thyroid carcinoma contain only the only-N-above type; uterine corpus endometrial carcinoma, colon adenocarcinoma (COAD), lung squamous cell carcinoma, and stomach adenocarcinoma contain both N-types; and liver hepatocellular carcinoma contains only the only-N-below type.

An inventory of perfect gene panels for the 12 types of cancer under study is presented in the Supplementary File. Notably, some cancer types can be perfectly classified using a single gene. This is the case for COAD with SCARA5, kidney renal papillary cell carcinoma with UMOD, and uterine corpus endometrial carcinoma with either PLSCR4 or TBC1D7.

4. Discussion

4.1. Gene expression dysregulation

Dysregulation in gene expression can promote cancer. 54 Within this phenomenon, differential expression, where genes show altered expression in tumors versus normal tissues, represents the most extensively studied subset. 10
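
As an illustration of the panel construction described in Section 2.7, the following Python sketch shows one way the greedy selection for the only-T-above class could be written. It is not the authors' code. The function names, the dictionary-based matrix layout, and the toy genes and samples (G1 to G4, t1 to t5) are hypothetical, and the encoding assumes that a value of 1 marks a dysregulated gene while −1 and 0 do not. Ties in coverage gain are broken by the largest overlap with already discovered samples, and any remaining tie keeps the first candidate in the list.

```python
from typing import Dict, List, Set


def t_above_hits(matrix: Dict[str, Dict[str, int]],
                 normals: Set[str], tumors: Set[str]) -> Dict[str, Set[str]]:
    """Map each eligible gene to the tumor samples where its value is 1.

    Assumption: a gene is eligible for an only-T-above panel only if no
    normal sample shows the value 1 (normals may show -1 or 0).
    """
    hits: Dict[str, Set[str]] = {}
    for gene, row in matrix.items():
        if all(row[s] != 1 for s in normals):
            hits[gene] = {s for s in tumors if row[s] == 1}
    return hits


def greedy_panel(hits: Dict[str, Set[str]], tumors: Set[str]) -> List[str]:
    """Greedily add genes until every tumor sample is discovered."""
    panel: List[str] = []
    covered: Set[str] = set()
    while covered != tumors:
        remaining = tumors - covered
        best, best_key = None, (0, -1)
        for gene, samples in hits.items():  # dicts keep insertion order
            if gene in panel:
                continue
            gain = len(samples & remaining)   # newly discovered tumors
            overlap = len(samples & covered)  # redundancy with the panel
            key = (gain, overlap)
            # Strict '>' keeps the first candidate when both values tie.
            if gain > 0 and key > best_key:
                best, best_key = gene, key
        if best is None:
            raise ValueError("no perfect panel exists for these samples")
        panel.append(best)
        covered |= hits[best] & tumors
    return panel


def is_irredundant(panel: List[str], hits: Dict[str, Set[str]],
                   tumors: Set[str]) -> bool:
    """Return True if dropping any single gene leaves a tumor undiscovered."""
    for gene in panel:
        rest = set().union(*(hits[g] for g in panel if g != gene))
        if tumors <= rest:
            return False
    return True


# Toy example with hypothetical genes and samples.
toy_hits = {
    "G1": {"t1", "t2", "t3"},
    "G2": {"t3", "t4"},
    "G3": {"t4", "t5"},
    "G4": {"t5"},
}
toy_tumors = {"t1", "t2", "t3", "t4", "t5"}
toy_panel = greedy_panel(toy_hits, toy_tumors)
print(toy_panel)  # ['G1', 'G3']
print(is_irredundant(toy_panel, toy_hits, toy_tumors))  # True
```

The irredundancy helper is kept separate from the construction so that the minimality property stated in Section 2.7 can be checked directly on any panel. The analogous construction for normal samples would swap the roles of the two sample groups.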
Volume 4 Issue 3 (2025) 62 doi: 10.36922/TD025190035

