INNOSC Theranostics and Pharmacological Sciences
Prognostic values of peripheral blood CD4T transcriptomic signature
of a viral infection, CD4 helper T-cells activate B-cells and cytotoxic lymphocytes to enable the proper immune response against the invading pathogen.¹ In the case of human immunodeficiency virus (HIV) infection, CD4Ts are specifically targeted, exploited, and destroyed, resulting in compromised immunity.¹,² The progressive depletion of CD4T, as seen in HIV-infected patients without anti-retroviral treatment, is responsible for the development of acquired immunodeficiency syndrome, which renders an individual vulnerable to even the most commonplace opportunistic pathogens.¹,² Among HIV patients receiving anti-retroviral treatment, CD4T abundance is associated with favorable clinical outcomes and serves as a prognostic indicator.³

The most recent meta-analysis of transcriptomes validated the hypothesis that HIV infection induces characteristic changes in CD4T gene expression and biological pathways.⁴ Nevertheless, an in-depth understanding of CD4T biology in both healthy and diseased states remains limited. Thus, this study aims to explore the possibility of building a transcriptome-wide gene signature in blood samples of non-diseased human subjects, using clinically measured CD4T abundance as the outcome. Subsequently, this study investigates the potential prognostic utility of the gene signature in a cohort of HIV-1-positive subjects receiving anti-retroviral therapy, to offer insights into the role of CD4T abundance.

2. Materials and methods

2.1. Study populations and datasets

All datasets used in this study are publicly available and were obtained from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo). The "discovery population" (accession GSE58137) consists of 340 subjects with both transcriptome-wide gene expression data (measured by the Illumina Human HT 12 v3.0-4.0 array) and putative CD4T proportions calculated from the matched whole-genome methylation profiles.⁵ The "application population" (accession GSE19087) consists of 24 HIV-1-positive men with both gene expression data (Illumina Human WG 6 v3.0 array) and CD4T counts measured before and after a 48-week anti-retroviral therapeutic regimen.³ Covariates including age and race were available in both populations. Subject demographics are summarized in Table S1.

Some genes measured on the Illumina arrays had multiple transcript variants. In addition, the expression platforms were of two different versions. To ensure comparability between the datasets, each population's expression values were aggregated by the unweighted mean across all available transcript variants of a common gene, so that each gene symbol appeared only once. Next, each population's gene expression matrix was restricted to a common set of 12,549 genes. To remove platform-related artifacts or experimental noise, each expression matrix was mean-centered, scaled by its standard deviation, and then constrained to values within ±5.0, as previously described.⁶⁻⁸

2.2. Gene signature discovery from the transcriptomes of healthy peripheral blood

The CD4T abundance gene signature was identified using the Least Absolute Shrinkage and Selection Operator (LASSO; glmnet R package v.4.1.8), an established approach that can shrink model coefficients exactly to zero, thereby eliminating the effect estimate of a given feature unless its association with the outcome is very strong.⁹,¹⁰ The LASSO model was fitted on the shared 12,549-gene set against the min-max scaled CD4T proportions in the discovery population, as follows:

$$\mathrm{MinMax}(\mathrm{CD4T}) \sim \mathrm{gene}_k + R \qquad \text{(I)}$$

where R is the LASSO penalty term, defined by setting the glmnet hyperparameter alpha to 1.0. MinMax is a scaling function that transforms the input vector into a distribution between 0 and 1:

$$\mathrm{MinMax}(x_k) = \frac{x_k - \min(x)}{\max(x) - \min(x)} \qquad \text{(II)}$$

This procedure initially yielded 334 (2.7%) gene features with non-zero regression coefficients. Because some coefficients had low magnitudes, the LASSO-selected genes were further filtered by a coefficient threshold of 1.25e-3, yielding 207 (114 positive and 93 negative) gene features, hereafter referred to as the "CD4T gene signature" (Table S2).

2.3. In silico validation of the final gene-signature model with the K-fold method

The identified gene signature was validated with a K-fold cross-validation experiment. Specifically, the entire dataset was divided into K = 10 partitions (i.e., folds, each with n = 34). The final, optimized version of the LASSO model was built on K-1 = 9 training partitions and evaluated on the remaining test partition. For each round of cross-validation, three evaluation metrics were calculated on the test partition: root mean-squared error (RMSE), Pearson's correlation coefficient (r), and R-squared. The entire procedure was repeated K times so that every fold served as the test partition exactly once and as a training partition in the remaining K-1 rounds. Each evaluation metric was averaged across the K trials to yield a final performance indicator (Table S1).

2.4. Biological interpretation of the CD4T gene signature

Separate Gene Ontology: Biological Processes non-redundant analyses (WebGestalt implementation, www.webgestalt.org)¹¹ were performed on the gene sets with positive and
Volume 7 Issue 3 (2024) 2 doi: 10.36922/itps.2761
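The harmonization steps described in Section 2.1 (collapsing transcript variants by their unweighted mean, restricting both populations to their shared genes, then mean-centering, standard-deviation scaling and clipping at ±5.0) can be sketched in Python as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the sketch assumes that scaling is applied per gene across samples using the population standard deviation, details the text does not specify.

```python
from collections import defaultdict
from statistics import mean, pstdev


def collapse_variants(probe_rows):
    """Collapse transcript variants (probes) that share a gene symbol.

    probe_rows: iterable of (gene_symbol, [expression per sample]).
    Returns {gene_symbol: [unweighted mean across variants, per sample]}.
    """
    grouped = defaultdict(list)
    for symbol, values in probe_rows:
        grouped[symbol].append(values)
    # zip(*variants) walks all variants of one gene sample by sample
    return {
        symbol: [mean(per_sample) for per_sample in zip(*variants)]
        for symbol, variants in grouped.items()
    }


def standardize_and_clip(values, limit=5.0):
    """Mean-center, divide by the standard deviation, then clip to [-limit, limit]."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD; the text does not name the estimator
    if sd == 0:
        return [0.0] * len(values)  # a constant gene carries no information
    return [max(-limit, min(limit, (v - mu) / sd)) for v in values]


def harmonize(pop_a, pop_b, limit=5.0):
    """Restrict two collapsed populations to shared genes, then standardize each gene.

    Assumption: standardization is done per gene (across samples).
    """
    shared = sorted(set(pop_a) & set(pop_b))
    return (
        {g: standardize_and_clip(pop_a[g], limit) for g in shared},
        {g: standardize_and_clip(pop_b[g], limit) for g in shared},
    )
```

Calling `collapse_variants` on each population before `harmonize` follows the order of operations in the text; with the real datasets, the shared gene set would correspond to the 12,549 genes reported.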
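Equation (II), the alpha = 1 (pure L1) LASSO fit of Equation (I), and the subsequent coefficient filter from Section 2.2 can be illustrated with the standard-library sketch below. It is a hypothetical stand-in for glmnet, not glmnet itself. It solves the Gaussian LASSO objective by cyclic coordinate descent, skips glmnet's internal feature standardization, and takes the penalty `lam` as a fixed input because the text does not report how the penalty was chosen. Applying the 1.25e-3 cutoff to the absolute coefficient with `>=` is also an assumption.

```python
def min_max(x):
    """Eq. (II): rescale a vector to the [0, 1] interval."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by L1-penalized coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, max_iter=1000, tol=1e-10):
    """Pure-L1 (alpha = 1) LASSO by cyclic coordinate descent (illustrative sketch).

    Minimizes (1/(2n)) * sum((y - b0 - X beta)^2) + lam * sum(|beta_j|),
    the Gaussian objective glmnet documents for alpha = 1.
    X is a list of n rows with p features each; lam is a hypothetical input.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - col_means[j] for row in X] for j in range(p)]  # centered columns
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residuals while all beta_j == 0
    sq = [sum(v * v for v in c) / n for c in cols]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_delta = 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue  # constant feature: coefficient stays at zero
            # correlation of feature j with the partial residual (feature j added back)
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + sq[j] * beta[j]
            new = soft_threshold(rho, lam) / sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, cols[j])]
                beta[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break  # no coefficient moved appreciably in a full sweep
    intercept = y_mean - sum(m * b for m, b in zip(col_means, beta))
    return intercept, beta


def filter_signature(genes, beta, threshold=1.25e-3):
    """Keep genes whose |coefficient| reaches the threshold (assumed >=)."""
    return {g: b for g, b in zip(genes, beta) if abs(b) >= threshold}
```

In the study's setting, `y` would be `min_max` applied to the discovery population's CD4T proportions and `X` the standardized 12,549-gene matrix. The L1 penalty sets most coefficients exactly to zero, and `filter_signature` then drops the remaining small ones.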
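The K-fold evaluation in Section 2.3 can be sketched as a generic loop that accepts any `fit`/`predict` pair, for example a wrapper around a LASSO model. Note the assumptions. Samples are shuffled with a fixed seed before being dealt into folds, because the text does not say how folds were formed. R-squared is computed as 1 - SS_res/SS_tot on each test fold, which is one common definition. The helper names are hypothetical.

```python
import math
import random
from statistics import mean


def make_folds(n, k=10, seed=0):
    """Shuffle sample indices with a fixed seed (assumption) and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean-squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_cv(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, repeat k times, average each metric."""
    scores = {"rmse": [], "r": [], "r2": []}
    for test_idx in make_folds(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        truth = [y[i] for i in test_idx]
        preds = [predict(model, X[i]) for i in test_idx]
        scores["rmse"].append(rmse(truth, preds))
        scores["r"].append(pearson_r(truth, preds))
        scores["r2"].append(r_squared(truth, preds))
    return {name: mean(vals) for name, vals in scores.items()}
```

With the discovery population's 340 subjects and `k=10`, `make_folds` yields ten folds of 34 samples each, matching the reported partition size, and each fold is held out exactly once.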

