
Tumor Discovery                                               Highly accurate gene panels for cancer screening



whether the gene’s expression interval was N-exclusive, T-exclusive, or shared by N and T. This matrix structure provided all the necessary information for constructing perfect gene panels.

2.7. Perfect panels

Differentially expressed and non-differentially dysregulated genes often form large pools containing over a thousand members, which are impractical for real-world applications. In genetic-based hereditary risk assessment, diagnostics, and therapy, smaller gene panels (comprising 5 – 50 genes) are often preferred. 50

Due to the low dimensionality of the gene expression data, 51 it is possible to extract compact panels from these large gene sets. In particular, panels can be designed to perfectly classify all normal and tumor samples collectively, with the additional requirement that removing any member from the panel would compromise this classification accuracy.

These panels can be identified using a concept similar to, but distinct from, reducts in RST, 42,43 which we termed a formal-concept reduct. To the best of our knowledge, this is the first presentation of formal-concept reducts, 44 although more stringent related concepts have been proposed by Zhang. 45

Our algorithm for constructing perfect panels is based on progressively maximizing sensitivity. At each step, we add the differentially expressed gene that is most dysregulated in the tumor samples not yet identified by the current panel (i.e., those samples where the included genes show no dysregulation), until all tumor samples are discovered.

The equivalent procedure involves iteratively adding the non-differentially dysregulated gene that most frequently exhibits normal regulation in the remaining undiscovered normal samples (i.e., those in which the genes already included are dysregulated) until all normal samples are discovered. If gene selection is ambiguous at any iteration, we prioritize the most redundant candidate – i.e., the gene whose dysregulation pattern overlaps maximally with existing panel members across already classified samples. Further ambiguities are resolved by arbitrarily selecting the first candidate in the list.

Panels constructed this way are minimal: no gene can be removed without compromising perfect classification. However, they are not necessarily the smallest collection of genes achieving this goal, nor are they necessarily unique. Modifying the ambiguity-resolution criteria may give rise to different and/or smaller gene sets that can achieve perfect discrimination between normal and tumor samples, while remaining irredundant. In practice, these panels comprise 1 – 20 genes, making them suitable for cancer diagnostics. 50

In the example considered in Section 2.4, the three-gene set constitutes a perfect panel for the only-T-above class. In its expression dysregulation matrix, normal samples show expression values of −1 or 0. Every tumor sample has at least one dysregulated gene (value 1) in the panel, as shown in the histogram of Figure 1. Thus, this panel exhibits no false negatives or false positives.

3. Results

First, we note that, in the average cancer type, only about 3% of the genes qualify as N-genes. The observation that more than one-third of the genome, and the vast majority of classifier genes, fall within the T-gene category aligns with cancer’s characterization as a high-entropy state of gene regulatory networks 52,53 and is an indication of the abundance of potential genetic triggers for cancer.

Perfect panels constructed according to our procedure are summarized in Table 2. When no perfect panel exists, we reported the size of the minimal gene set that classifies the largest sample subset. Notably, only-T-above and only-T-below panels may include oncogenes and tumor suppressors, respectively. As shown in Table 2, all 12 cancer types exhibit perfect panels of both T-types.

Conversely, perfect panels with only-N-above or only-N-below genes appear irregularly in some tissues (Table 2). Specifically, breast invasive carcinoma (BRCA), head and neck squamous cell carcinoma, kidney renal clear cell carcinoma, kidney renal papillary cell carcinoma, LUAD, prostate adenocarcinoma, and thyroid carcinoma contain only only-N-above panels; uterine corpus endometrial carcinoma, colon adenocarcinoma (COAD), lung squamous cell carcinoma, and stomach adenocarcinoma contain both N-types; and liver hepatocellular carcinoma contains only only-N-below panels.

An inventory of perfect gene panels for the 12 types of cancer under study is presented in the Supplementary File. Notably, some cancer types can be perfectly classified using a single gene. This is the case for COAD with SCARA5, kidney renal papillary cell carcinoma with UMOD, and uterine corpus endometrial carcinoma with either PLSCR4 or TBC1D7.

4. Discussion

4.1. Gene expression dysregulation

Dysregulation in gene expression can promote cancer. 54 Within this phenomenon, differential expression – where genes show altered expression in tumors versus normal tissues – represents the most extensively studied subset. 10
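
As an illustration of the panel construction described in Section 2.7, the following is a minimal sketch for the only-T-above case, not the authors' implementation. It assumes a hypothetical layout in which the dysregulation matrix is a dictionary mapping each candidate gene to a list of values in {−1, 0, 1}, one per sample, and that the candidates have already been restricted to only-T-above genes. The function names (greedy_panel, is_perfect, is_irredundant) and the toy data are illustrative assumptions.

```python
def covered(matrix, gene, samples):
    """Samples (column indices) in which `gene` is dysregulated (value 1)."""
    return {s for s in samples if matrix[gene][s] == 1}


def greedy_panel(matrix, tumor_cols):
    """Add genes one at a time until every tumor sample is discovered.

    Returns (panel, undiscovered); `undiscovered` is empty when all tumor
    samples have at least one dysregulated panel gene.
    """
    panel, found, remaining = [], set(), set(tumor_cols)
    while remaining:
        best, best_key = None, (0, -1)
        for gene in matrix:  # dict order stands in for the candidate list order
            if gene in panel:
                continue
            gain = len(covered(matrix, gene, remaining))  # newly discovered tumors
            overlap = len(covered(matrix, gene, found))   # redundancy tie-break
            # Strict '>' keeps the first candidate when both criteria tie.
            if gain > 0 and (gain, overlap) > best_key:
                best, best_key = gene, (gain, overlap)
        if best is None:  # no candidate discovers a new tumor sample
            break
        newly = covered(matrix, best, remaining)
        panel.append(best)
        found |= newly
        remaining -= newly
    return panel, remaining


def is_perfect(matrix, panel, tumor_cols, normal_cols):
    """No false negatives (every tumor flagged) and no false positives."""
    no_fn = all(any(matrix[g][s] == 1 for g in panel) for s in tumor_cols)
    no_fp = not any(matrix[g][s] == 1 for g in panel for s in normal_cols)
    return no_fn and no_fp


def is_irredundant(matrix, panel, tumor_cols, normal_cols):
    """True if dropping any single gene breaks perfect classification."""
    return all(
        not is_perfect(matrix, [g for g in panel if g != drop],
                       tumor_cols, normal_cols)
        for drop in panel
    )


# Toy data (hypothetical): columns 0-1 are normal samples, 2-6 are tumors.
normal_cols, tumor_cols = [0, 1], [2, 3, 4, 5, 6]
matrix = {
    "g1": [0, -1, 1, 1, 0, 0, 0],
    "g2": [0, 0, 0, 1, 1, 0, 0],
    "g3": [-1, 0, 0, 0, 0, 1, 1],
    "g4": [0, 0, 1, 0, 0, 0, 0],
}
panel, undiscovered = greedy_panel(matrix, tumor_cols)
print(panel, is_perfect(matrix, panel, tumor_cols, normal_cols))
```

In this sketch the irredundancy property is checked explicitly after construction rather than assumed, and a non-empty second return value signals that no perfect panel exists among the supplied candidates.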


            Volume 4 Issue 3 (2025)                         62                           doi: 10.36922/TD025190035