Page 272 - EJMO-9-3
P. 272
Eurasian Journal of
Medicine and Oncology WGCNA and LASSO for osteoporosis biomarkers
2.4. Enrichment analysis package (v1.18.4). The area under the ROC curve (area
To systematically dissect the biological relevance of the under the curve [AUC]) and the corresponding 95 %
identified gene lists, we conducted gene ontology (GO) confidence intervals were computed by DeLong’s method.
and Kyoto Encyclopedia of Genes and Genomes (KEGG) An AUC >0.85 was considered indicative of excellent
pathway enrichment analyses. Gene symbols were first predictive accuracy.
converted to Entrez IDs using the org.Hs.eg.db annotation 2.6. Gene Set Enrichment Analysis (GSEA)
package (v3.17.0). The clusterProfiler package (v4.8.1) was
then used to query three GO domains—biological process GSEA was performed to systematically explore the
(BP), molecular function, and cellular component (CC)— potential BPs and signaling pathways associated with the
as well as the KEGG PATHWAY database (release 104.0). hub gene of interest. The entire transcriptome was first
Enrichment analysis was performed using the enrichGO ranked according to the Pearson’s correlation coefficient
and enrichKEGG functions, with a p-value cutoff of 0.05 between the hub gene and all other expressed genes. Next,
after Benjamini–Hochberg FDR correction, and gene sets singlegene GSEA was conducted with the “GSEA” function
restricted to sizes between 10 and 500 genes. Redundant implemented in the clusterProfiler R package against the
GO terms were collapsed with the simplify function Molecular Signatures Database (MSigDB, v7.x) “c2.cp.kegg”
(q-value cutoff = 0.05). To quantify the enrichment and “c5.go” collections. To ensure statistical robustness,
strength, normalized enrichment scores (NES) and gene 10,000 phenotype-based permutations were executed, and
ratios were extracted for each term. gene set sizes were restricted to between 10 and 200 genes.
Pathways with a (|NES|) ≥ 1 and an adjusted p-value (FDR)
The resulting GO and KEGG terms were visualized <0.05 were considered significantly enriched. The leading-
using the “ggplot2” (v3.4.2) and “enrichplot” (v1.18.3) edge genes driving each enriched gene set were extracted,
packages. Bubble plots and cnet plots were generated to and the results were visualized using the “gseaplot2”
depict the top 20 most significantly enriched terms, ranked function from the “enrichplot” R package.
by adjusted p-value.
2.7. Immune infiltration analysis
2.5. Screening and validation of key genes through
machine learning To quantify the landscape of tumor-infiltrating immune
cells, we applied “CIBERSORT” R package calculation to
To identify the minimal set of genes that robustly perform de novo enumeration of 22 human immune cell
discriminates OP from healthy bone, we applied LASSO subsets through a ν-support vector regression algorithm
penalized logistic regression to the GSE35958 discovery trained on the LM22 signature matrix, which consisted
cohort (n = 9). Prior to modeling, the expression matrix of 547 genes. The analysis was carried out using default
was variance-stabilized using the voom transformation parameters: 1,000 permutations, quantile normalization
and z-score normalized across samples. The optimal λ disabled, and batch correction activated. Samples with
value was determined by 10-fold cross-validation (CV) a deconvolution p<0.05 were retained for downstream
with 1,000 repetitions, selecting the λ within one standard analysis to ensure high fidelity of the inferred fractions.
error of the minimum mean cross-validated deviance.
Genes retaining non-zero regression coefficients at the Subsequently, the relative proportions of immune
optimal λ were designated as core genes. All LASSO cell subsets, such as CD8⁺ T cells, M2 macrophages,
analyses were implemented using the glmnet package and regulatory T cells, were compared between normal
(v4.1-8) in R (v4.3.0). and OP samples using a two-sided Wilcoxon rank-sum
test with Benjamini–Hochberg correction (FDR < 0.05).
For internal validation, expression levels of the core Compositional shifts were visualized using the “ggplot2” and
genes in GSE35958 dataset were visualized using box “pheatmap” R packages, whereas intercellular correlation
plots and dot plots generated with the “ggpubr” package networks (|r| > 0.4 and p<0.05) were constructed using
(v0.6.0). For external validation, the same core gene set the “igraph” package to explore potential immune cell
was projected onto the independent GSE35956 dataset crosstalk.
(n = 10). Differential expression between OP and control
groups was quantified by the two-sided Wilcoxon rank- 2.8. Potential drug prediction
sum test and corrected for multiple comparisons using the The OP core genes (Entrez IDs) were imported into
Benjamini–Hochberg (FDR < 0.05). Coremine Medical (http://www.coremine.com/medical),
To assess the discriminatory capacity of the core gene where the platform’s text-mining engine was queried under
signatures, receiver operating characteristic (ROC) curves the “herb–gene–disease” relationship layer to retrieve
were constructed for both datasets using the “pROC” botanical entities statistically linked to both the gene set
Volume 9 Issue 3 (2025) 264 doi: 10.36922/EJMO025240252

