Tumor Discovery Highly accurate gene panels for cancer screening
of genes required for a perfect panel depend on the size of the tumor sample set?

The results, summarized in Figure S2, revealed that in the smaller external dataset, a single gene identifies 98% of the tumor samples, and the addition of a second gene completes the panel, achieving maximal sensitivity and specificity without requiring TRIM27. In contrast, for the larger TCGA dataset, the first gene alone covers only 95% of tumors, and the two-gene panel still leaves 1% of samples unclassified. In that case, TRIM27 is necessary to achieve full classification. These observations suggest that rare tumor variants emerge only in larger datasets. Their low frequency means that they are often absent in smaller cohorts, where simpler panels may suffice.

For illustration, a hypothetical cohort of 5,000 tumor samples is also considered in Figure S2. In that scenario, the 3-gene panel covers 99.7% of tumors, indicating that a fourth gene would likely be needed to achieve complete coverage. The figure also shows that saturation is reached very quickly: the number of classified tumor samples increases steeply with the addition of genes to the panel. This strongly supports our assertion that a small number of genes can effectively capture the global state of the Gene Regulatory Network, consistent with the effective reduced dimensionality of the tumor manifold.⁵¹

In summary, the expression distribution functions used to define the panels depend on the sample set size. When the sample size reaches the order of hundreds, the distribution appears “saturated,” showing only minor changes when the number of samples is further increased.

This insight allowed us to evaluate how our panels would change with an increased number of normal samples. For instance, assuming that the distribution functions are saturated in BRCA (112 normal samples and 1,094 tumor samples), we performed re-sampling to assess the performance of the six-gene only-T-above panel found for BRCA (Supplementary File) under highly imbalanced situations, such as 20 normal samples and 500 tumor samples. The results, shown in Figure S3, indicate that while the panel size tends to decrease in the reduced sets, two genes from the original panel still classify more than 95% of samples in all cases.

Thus, we expect, for example, that the single-gene only-T-above panel found for uterine corpus endometrial carcinoma (23 normal samples) may change as the normal sample size grows, but the original gene will continue to cover at least 85% of the tumor samples.

It is worth noting that Figure S3 can also be interpreted as a form of validation of the six-gene only-T-above panel for BRCA across different experimental conditions.

4.4. Cancer diagnosis, tumor taxonomy, and gene therapy

Our construction of perfect gene panels follows a data-driven approach to gene expression profiles that does not require prior domain knowledge of the biological relevance of individual genes in a given tissue. These panels have an apparent value as candidate combinatorial biomarkers for diagnosis, which could be further enhanced by incorporating information about gene ontology and function into our data mining process.

In addition, the perfect T-gene panels could be leveraged in tumor taxonomy. Typically, tumor classification and the associated therapeutic decisions are made based on the most frequently mutated genes in a given tumor (for example, Ruiz-Cordero et al.,⁶¹ for lung cancer). However, the classification is often incomplete, with a subset of tumors assigned to the so-called “wild-type” category, meaning that none of the genes in the reference panel are mutated. In our framework, any perfect T-gene panel enables a complete classification of tumors by providing the list of dysregulated genes in each tumor sample. Moreover, since multiple perfect panels may exist for a given tissue, tumors could be fully classified under different but equally valid criteria.

Consider, for example, the only-T-above panel for LUAD, examined above. Both ALDH10A1 and PYCR1 genes, related to glutamine metabolism, are known to play an important role in lung cancer.⁶²,⁶³ The taxonomy based on this panel indicates that around 98% of LUAD tumors rely on glutamine metabolism to foster cell proliferation and induce an immune-suppressive tumor microenvironment. In the remaining 2% of tumors, cell proliferation is regulated by TRIM27 through the SIX homeobox 3-β-catenin signaling pathway.⁶⁴ These statements reflect the known role of these genes and their dysregulation frequencies in the tumor subpopulation. Nevertheless, further research is needed to validate these findings and translate them into therapeutic recommendations.

Moreover, N- and T-genes included in the perfect panels may have important applications in gene therapy. Consider, for instance, a gene belonging to both N- and T-groups, such as the AGER gene in LUAD. This gene is silenced in tumors and strongly expressed in normal samples. What would happen if, through a transfection vector, its expression were shifted from the N-region to the T-region or vice versa? Such an experiment has already been conducted on cell lines,⁶⁵ and the results indicate a significant change in the proliferation rate and invasion capacity of both tumor and normal cells. These astonishing results warrant further investigation.
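
The panel-based taxonomy and the coverage percentages quoted for Figure S2 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a gene counts as dysregulated in a tumor when its expression exceeds the largest value observed in any normal sample (one plausible reading of "only-T-above"). The gene names G1 to G3, the expression values, and all function names are hypothetical.

```python
def only_t_above_thresholds(normal):
    """Per-gene threshold: largest expression seen in any normal sample.

    `normal` is a list of samples, each a list of per-gene expression values.
    (Assumed definition of the only-T-above region, for illustration only.)
    """
    return [max(column) for column in zip(*normal)]


def taxonomy(tumor, thresholds, genes):
    """For each tumor sample, the tuple of panel genes above their threshold."""
    return [
        tuple(g for g, x, t in zip(genes, sample, thresholds) if x > t)
        for sample in tumor
    ]


def cumulative_coverage(classes, genes):
    """Fraction of tumors flagged by at least one of the first k panel genes."""
    coverage = []
    for k in range(1, len(genes) + 1):
        panel = set(genes[:k])
        covered = sum(1 for c in classes if panel.intersection(c))
        coverage.append(covered / len(classes))
    return coverage


# Hypothetical toy data: rows are samples, columns are genes G1, G2, G3.
genes = ["G1", "G2", "G3"]
normal = [[1.0, 2.0, 1.5],
          [1.2, 1.8, 1.0]]
tumor = [[3.0, 1.0, 0.5],
         [2.5, 2.5, 0.2],
         [0.9, 3.0, 0.1],
         [0.5, 1.0, 4.0]]

thresholds = only_t_above_thresholds(normal)
classes = taxonomy(tumor, thresholds, genes)
coverage = cumulative_coverage(classes, genes)
# Tumors with no dysregulated panel gene (analogous to a "wild-type" bin).
unclassified = [i for i, c in enumerate(classes) if not c]

print(classes)
print(coverage)
print(unclassified)
```

In this toy case the third gene is needed only for the last tumor, which mirrors the role described for TRIM27: a gene that covers a rare subpopulation and completes the panel. A panel is perfect in this sense when `unclassified` is empty.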
Volume 4 Issue 3 (2025) 64 doi: 10.36922/TD025190035

