Page 129 - ITPS-7-3

INNOSC Theranostics and Pharmacological Sciences
Prognostic values of peripheral blood CD4T transcriptomic signature



negative LASSO coefficients against the 12,549 genes used as the input for gene-signature discovery. Raw P-values were adjusted by the Benjamini–Hochberg false discovery rate (FDR) method.

2.5. Gene signature-based stratification of an independent cohort of HIV-1-positive men

The CD4T gene signature was subsequently applied to the application population, which consisted of 24 peripheral blood samples from HIV-1-infected men receiving anti-retroviral therapy. Absolute CD4T cell counts were available for all study participants before and after treatment. The percent change is defined as:

$$\text{Percent change} = 100 \times \frac{\text{Change in CD4T}}{\text{Baseline CD4T}} \qquad \text{(III)}$$

Unsupervised hierarchical clustering with Euclidean distance and Ward D linkage (pheatmap R package v.1.0.12), followed by a split of the dendrogram at its first (top) node, was used to stratify the application population into two groups for downstream statistical analyses.

2.6. Statistical analysis

Unless otherwise specified, the computational environment was R 4.3.1 (https://www.r-project.org) with the data analysis and visualization packages base (4.3.1), ggplot2 (v.3.4.3), and matrixStats (v.1.0.0). All statistical tests were two-sided. The univariate (unadjusted) association between two binary variables was assessed by Fisher's exact test, with odds ratio (OR) and 95% confidence interval (CI) estimates. To address potential confounding, multivariate associations were assessed by multivariate logistic regression (a generalized linear model with a binomial family), with adjusted ORs estimated by exponentiating the model coefficients. The mean difference between two groups was assessed by Welch's t-test; a generalized linear model with a Gaussian family served as the generalization of the t-test that adjusts for potential confounding variables. All R code is deposited on GitHub (https://github.com/ydavidchen/cd4t_pilot_signature).

3. Results

3.1. Identification and characterization of a CD4T abundance signature in the transcriptomes of healthy peripheral blood samples

A transcriptomic gene signature was identified by supervised modeling of gene expression against CD4T proportions using LASSO regression, a conservative statistical approach for selecting the most important features from a high-dimensional data space.⁹,¹⁰ LASSO is well known for its tendency to shrink many model coefficients exactly to zero, thereby retaining only the input features (i.e., genes) most strongly associated with the outcome (i.e., CD4T abundance). A tenfold cross-validation experiment demonstrated the robustness of the final model (tenfold average Pearson's r = 0.89 and r² = 0.79; Table S1). The initial CD4T gene signature consisted of 334 (2.7%) genes with non-zero LASSO coefficients (Figure 1A). The final version of the signature, with 207 genes (1.6%), was obtained by coefficient thresholding (Figure 1A and Table S2).

To explore the biological relevance of the identified gene signature, Gene Ontology (GO) Biological Process enrichment analysis was performed separately on the gene features in the positive and negative directions. The gene features positively associated with CD4T abundance were strongly enriched for cellular adhesion (OR = 4.7, FDR = 0.01; Table 1). Notably, the members of this ontology term included high-profile immune genes involved in HIV-1 pathogenesis: CD28 and CTLA-4. The gene set negatively associated with CD4T abundance was strongly enriched for metabolic processes of macromolecules (all OR ≥ 4.5 and all FDR = 0.05; Table 1). The genes encoding the CD8 subunits, CD8A and CD8B, showed a strong negative association with CD4T abundance and were selected by the LASSO procedure (Table S3). Hierarchical clustering of the identified gene signature segregated the discovery population into two major clusters, Cluster 1 and Cluster 2 (Figure 1B). The distribution of subject demographics, including race/ethnicity, sex, and age, appeared balanced across the gene-signature clusters (Figure 1B, horizontal tracking bars).

3.2. Application of CD4T gene signature to an HIV-1-positive cohort for biomedical knowledge discovery

The next objective was to assess the clinical relevance of the CD4T gene signature in human disease. Given the well-known role of CD4T in HIV-1 infection and in recovery on anti-retroviral treatment,³ the CD4T gene signature was evaluated in this disease context. Dataset GSE19087 comprises 24 HIV-1-positive men treated with an anti-retroviral regimen for approximately 1 year.³ Hierarchical clustering of the CD4T gene signature stratified the HIV-1-positive, anti-retroviral-treated men into two major groups (Figure 2). The cluster structure, indicated by the pattern of dendrogram branching, showed striking similarity to that of the discovery population. When stratified by cluster membership, subject demographics showed no significant differences between the two clusters (Table 2). However, the HIV-1-positive men in Cluster 1 had an average CD4T increase of 122.7% at the end of anti-retroviral therapy, compared to

            Volume 7 Issue 3 (2024)                         3                                doi: 10.36922/itps.2761
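The Benjamini–Hochberg adjustment of raw P-values described at the end of Section 2.4 can be sketched in a few lines. This is a pure-Python illustration, not the authors' R code (which is in their GitHub repository). The function name `bh_adjust` is hypothetical, and the sketch is intended to follow the same recipe as R's `p.adjust(method = "BH")`: scale each P-value by m/rank, then enforce monotonicity with a running minimum taken from the largest P-value downward.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Hypothetical helper: Benjamini-Hochberg adjusted P-values (FDR)."""
    m = len(pvalues)
    # Visit P-values from largest to smallest so a running minimum
    # keeps the adjusted values monotone in the raw P-values
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P-values for illustration only
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```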
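Equation (III) in Section 2.5 can be computed directly. The sketch below assumes that "Change in CD4T" means the end-of-treatment CD4T cell count minus the baseline count; the function name and the example counts are hypothetical and are not taken from the study data.

```python
def percent_change(baseline: float, follow_up: float) -> float:
    """Percent change in CD4T cell count, Eq. (III).

    Assumes "Change in CD4T" = follow-up count - baseline count."""
    if baseline == 0:
        raise ValueError("baseline CD4T count must be non-zero")
    return 100.0 * (follow_up - baseline) / baseline


# Hypothetical counts (cells/µL) for illustration only, not study data
print(percent_change(200, 445))  # 122.5
```

A cluster-level figure such as the average increase reported for Cluster 1 would then be the mean of these per-participant values within the cluster.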
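The stratification in Section 2.5 (Euclidean distance, Ward D linkage, split at the top node of the dendrogram) can be illustrated with a small agglomerative clustering routine. The authors used pheatmap in R. The pure-Python sketch below is only an assumption-laden stand-in: it applies the Ward Lance–Williams update to unsquared Euclidean distances, which R's documentation describes as the behavior of `hclust(method = "ward.D")` (as opposed to `"ward.D2"`, which squares the dissimilarities first). Tie-breaking between equal distances may differ from R, and the 1/2 labels are arbitrary rather than the paper's Cluster 1/Cluster 2 assignment. The function name and toy data are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple


def ward_d_two_groups(samples: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: Ward-type agglomerative clustering on Euclidean
    distances, stopped when two clusters remain (i.e., a cut at the top node)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    clusters: Dict[int, List[int]] = {i: [i] for i in range(n)}  # id -> member indices
    # Pairwise distances keyed by (smaller id, larger id)
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist(samples[i], samples[j]) for i in range(n) for j in range(i + 1, n)
    }

    def pair(a: int, b: int) -> Tuple[int, int]:
        return (a, b) if a < b else (b, a)

    next_id = n
    while len(clusters) > 2:
        i, j = min(d, key=d.get)  # closest pair of active clusters
        ni, nj = len(clusters[i]), len(clusters[j])
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            # Lance-Williams update for Ward linkage (distances used as given)
            d[(k, next_id)] = ((ni + nk) * d[pair(i, k)] + (nj + nk) * d[pair(j, k)]
                               - nk * d[(i, j)]) / (ni + nj + nk)
        # Remove every distance involving the two clusters just merged
        stale = [p for p in d if i in p or j in p]
        for p in stale:
            del d[p]
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1

    # Label the two top-level groups 1 and 2 (the group containing sample 0 gets 1)
    labels = [0] * n
    for label, members in enumerate(sorted(clusters.values(), key=min), start=1):
        for member in members:
            labels[member] = label
    return labels


# Toy two-dimensional "expression profiles" with two obvious groups
print(ward_d_two_groups([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]))
```

In practice the input would be the samples-by-signature-genes expression matrix; the pairwise loop here is quadratic in memory and is meant only for small examples like the 24-sample cohort.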
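The univariate test in Section 2.6 (Fisher's exact test for two binary variables) can be sketched from the hypergeometric distribution. One caution: R's `fisher.test` reports a conditional maximum-likelihood odds ratio with its CI, whereas this sketch returns the simpler sample cross-product odds ratio a·d/(b·c), so the two estimates can differ. The two-sided P-value sums the probabilities of all tables with the same margins that are no more likely than the observed table. The function name is hypothetical.

```python
from math import comb
from typing import Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Hypothetical helper: two-sided Fisher's exact test for [[a, b], [c, d]].

    Returns (sample odds ratio a*d / (b*c), two-sided P-value)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # with both row and column totals held fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    # Sum over tables no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

    if b * c == 0:
        odds_ratio = float("inf") if a * d > 0 else float("nan")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(1.0, p_value)


# Toy table for illustration only
print(fisher_exact_2x2(3, 1, 1, 3))  # sample OR 9.0, P = 34/70 (about 0.486)
```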
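Welch's t-test, used in Section 2.6 for mean differences between two groups, does not assume equal variances. Its statistic and the Welch–Satterthwaite degrees of freedom can be sketched with the standard library alone. Turning these into a P-value requires the t-distribution CDF, which is outside this sketch; in Python, `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the full test. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite df."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard errors; statistics.variance uses the n - 1 denominator
    se2_x, se2_y = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (nx - 1) + se2_y ** 2 / (ny - 1))
    return t, df


# Toy data for illustration only
print(welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # t about -1.897, df about 5.88
```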