
INNOSC Theranostics and Pharmacological Sciences
Prognostic values of peripheral blood CD4T transcriptomic signature



of a viral infection, CD4 helper T-cells activate B-cells and cytotoxic lymphocytes to enable the proper immune response against the invading pathogen.[1]

In the case of human immunodeficiency virus (HIV) infection, CD4Ts are specifically targeted, exploited, and destroyed, resulting in compromised immunity.[1,2] The progressive depletion of CD4T, as seen in HIV-infected patients without anti-retroviral treatment, is responsible for the development of acquired immunodeficiency syndrome, which renders an individual vulnerable to even the most commonplace opportunistic pathogens.[1,2] Among HIV patients receiving anti-retroviral treatment, CD4T abundance is associated with favorable clinical outcomes and serves as a prognostic indicator.[3]

The most recent meta-analysis of transcriptomes validated the hypothesis that HIV infection induces characteristic changes in CD4T gene expression and biological pathways.[4] Nevertheless, an in-depth understanding of CD4T biology in both the healthy and diseased states remains limited. Thus, this study aims to explore the possibility of building a transcriptome-wide gene signature using clinically measured CD4T abundance as the outcome in blood samples of non-diseased human subjects. Subsequently, this study investigates the potential prognostic utility of the gene signature in a cohort of HIV-1-positive subjects receiving anti-retroviral therapies to offer insights into the role of CD4T abundance.

2. Materials and methods

2.1. Study populations and datasets

All datasets used in this study are publicly available and were obtained from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo). The “discovery population” (accession GSE58137) consists of 340 subjects with both transcriptome-wide gene expression data (measured by the Illumina Human HT 12 v3.0-4.0 array) and the putative CD4T proportions calculated from the matched whole-genome methylation profiles.[5] The “application population” (accession GSE19087) consists of 24 HIV-1-positive men with both gene expression data (Illumina Human WG 6 v3.0 array) and CD4T counts before and after a 48-week anti-retroviral therapeutic regimen.[3] Covariates including age and race were available in both populations. Subject demographics are summarized in Table S1.

Some genes measured on the Illumina array had multiple transcript variants. In addition, the expression platforms were of two different versions. To ensure comparability between the datasets, each population’s expression values were aggregated by unweighted mean across all available transcript variants of a common gene so that there would be only one unique gene symbol. Next, each population’s gene expression matrix was restricted to a common set of 12,549 genes. To remove platform-related artifacts or experimental noise, each expression matrix was mean-centered and standard deviation-scaled and then constrained within values ±5.0 as previously described.[6-8]

2.2. Gene signature discovery from the transcriptomes of healthy peripheral blood

The CD4T abundance gene signature was identified using Least Absolute Shrinkage and Selection Operator (LASSO; glmnet R package v.4.1.8), an established approach with the ability to reduce model coefficients to zero, thereby nullifying the effect estimate of a given feature unless it is very strong.[9,10] The LASSO model was fitted on the shared 12,549-gene set against the min-max scaled CD4T proportions in the discovery population, as follows:

$$\mathrm{MinMax}(\mathrm{CD4T}) \sim \mathrm{gene}_k + R \tag{I}$$

where R is the LASSO penalty term defined by setting the glmnet hyperparameter alpha to 1.0. MinMax is a scaling function that transforms the input vector into values between 0 and 1:

$$\mathrm{MinMax}(x_k) = \frac{x_k - \min(x)}{\max(x) - \min(x)} \tag{II}$$

This procedure initially yielded 334 (2.7%) gene features with non-zero regression coefficients. Given that some coefficients had low magnitudes, the LASSO-selected genes were further filtered by a coefficient threshold of 1.25e-3, yielding 207 (114 positive and 93 negative) gene features, hereafter referred to as the “CD4T gene signature” (Table S2).

2.3. In silico validation of the gene-signature final model with K-fold method

The identified gene signature was validated with a K-fold cross-validation experiment. Specifically, the entire dataset was divided into K=10 partitions (i.e., folds, each with n = 34). The final, optimized version of the LASSO model was built on K-1=9 training partitions and evaluated on the remaining test partition. For each round of cross-validation, several model evaluation metrics were calculated on the test partition: root mean-squared error, Pearson’s correlation coefficient (r), and R-squared. The entire procedure was repeated K times so that every fold participated in at least one round of training and testing. Each evaluation metric was averaged across the K trials to yield a final performance indicator (Table S1).

2.4. Biological interpretation of the CD4T gene signature

Separate Gene Ontology: Biological Processes no-redundant analyses (WebGestalt implementation, www.webgestalt.org)[11] were performed on the gene sets with positive and
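The harmonization steps in Section 2.1 (averaging transcript variants per gene, restricting to shared genes, scaling, and bounding at ±5.0) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names are hypothetical, and two details are assumptions not stated in the text, namely that scaling is applied per gene across subjects and that the sample standard deviation is used.

```python
from collections import defaultdict
from statistics import mean, stdev


def aggregate_by_gene(probe_rows):
    """Average all transcript variants (probes) that map to one gene symbol.

    probe_rows: list of (gene_symbol, [expression value per subject]).
    Returns {gene_symbol: [unweighted mean per subject]}.
    """
    grouped = defaultdict(list)
    for symbol, values in probe_rows:
        grouped[symbol].append(values)
    return {
        symbol: [mean(per_subject) for per_subject in zip(*rows)]
        for symbol, rows in grouped.items()
    }


def scale_and_clip(values, limit=5.0):
    """Mean-center, divide by the standard deviation, then bound to +/- limit.

    Assumption: scaling is done per gene, with the sample standard deviation.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        # A gene with no variation carries no signal after centering.
        return [0.0 for _ in values]
    return [max(-limit, min(limit, (v - mu) / sd)) for v in values]


def harmonize(dataset_a, dataset_b, limit=5.0):
    """Keep only genes present in both datasets and scale each one.

    dataset_a, dataset_b: {gene_symbol: [expression value per subject]}.
    """
    common = sorted(set(dataset_a) & set(dataset_b))
    scaled_a = {g: scale_and_clip(dataset_a[g], limit) for g in common}
    scaled_b = {g: scale_and_clip(dataset_b[g], limit) for g in common}
    return common, scaled_a, scaled_b
```

Each dataset is scaled on its own, so the discovery and application populations end up on a comparable scale even though they were measured on different array versions.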

            Volume 7 Issue 3 (2024)                         2                                doi: 10.36922/itps.2761
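To make the signature discovery in Section 2.2 concrete, the sketch below implements the min-max scaling of Equation II, a plain coordinate-descent LASSO, and the coefficient filter. It is not glmnet and does not reproduce its defaults (for example, predictor standardization or the choice of the penalty strength). The objective minimized here, mean squared error divided by two plus `lam` times the sum of absolute coefficients, matches the usual alpha = 1.0 formulation. The value of `lam` is a hypothetical input because the text does not report it, and treating the 1.25e-3 threshold as inclusive is an assumption.

```python
def min_max(values):
    """Equation II: rescale a vector to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for a constant vector")
    return [(v - lo) / (hi - lo) for v in values]


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within +/- lam become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * sum((y - b0 - X @ beta)^2) + lam * sum(|beta|).

    X: list of rows (subjects) of feature values (genes); y: outcome per subject.
    Returns (intercept, beta).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    residual = [yi - intercept for yi in y]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
        # Re-center the residuals so the intercept stays unpenalized.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
    return intercept, beta


def select_signature(gene_names, beta, threshold=1.25e-3):
    """Keep genes whose absolute coefficient reaches the threshold (assumed inclusive)."""
    return {g: b for g, b in zip(gene_names, beta) if abs(b) >= threshold}
```

The soft-thresholding step is what sets weak coefficients to exactly zero, which is why LASSO doubles as a feature selector. The later magnitude filter then removes genes whose coefficients are non-zero but very small.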
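The K-fold procedure in Section 2.3 can be sketched as below, with the three reported metrics averaged across folds. The `fit` and `predict` callables are hypothetical placeholders for whatever model is being validated. The random fold assignment, the seed, and the definition of R-squared as one minus the ratio of residual to total sum of squares are assumptions, since the text does not specify them.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean-squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(a, b):
    """Pearson's correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=10, seed=0):
    """Shuffle subject indices and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the metrics.

    fit(X_train, y_train) -> model; predict(model, X_test) -> predictions.
    Returns (mean RMSE, mean Pearson r, mean R-squared).
    """
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predicted = predict(model, [X[i] for i in test_idx])
        observed = [y[i] for i in test_idx]
        scores.append((rmse(observed, predicted),
                       pearson_r(observed, predicted),
                       r_squared(observed, predicted)))
    return tuple(sum(s[m] for s in scores) / k for m in range(3))
```

With 340 subjects and k = 10, `k_fold_indices` produces ten folds of 34 subjects each, matching the partition sizes described in the text.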