Page 272 - EJMO-9-3
P. 272

Eurasian Journal of
            Medicine and Oncology                                         WGCNA and LASSO for osteoporosis biomarkers



            2.4. Enrichment analysis                           package (v1.18.4). The area under the ROC curve (area

            To systematically dissect the biological relevance of the   under the curve [AUC]) and the corresponding 95 %
            identified gene lists, we conducted gene ontology (GO)   confidence intervals were computed by DeLong’s method.
            and Kyoto Encyclopedia of Genes and Genomes (KEGG)   An AUC >0.85 was considered indicative of excellent
            pathway enrichment analyses. Gene symbols were first   predictive accuracy.
            converted to Entrez IDs using the org.Hs.eg.db annotation   2.6. Gene Set Enrichment Analysis (GSEA)
            package (v3.17.0). The clusterProfiler package (v4.8.1) was
            then used to query three GO domains—biological process   GSEA was performed to systematically explore the
            (BP), molecular function, and cellular component (CC)—  potential BPs and signaling pathways associated with the
            as well as the KEGG PATHWAY database (release 104.0).   hub gene of interest. The entire transcriptome was first
            Enrichment analysis was performed using the enrichGO   ranked according to the Pearson’s correlation coefficient
            and enrichKEGG functions, with a p-value cutoff of 0.05   between the hub gene and all other expressed genes. Next,
            after Benjamini–Hochberg FDR correction, and gene sets   singlegene GSEA was conducted with the “GSEA” function
            restricted to sizes between 10 and 500 genes. Redundant   implemented in the clusterProfiler R package against the
            GO terms were collapsed with the simplify function   Molecular Signatures Database (MSigDB, v7.x) “c2.cp.kegg”
            (q-value cutoff = 0.05). To quantify the enrichment   and “c5.go” collections.  To ensure  statistical robustness,
            strength, normalized enrichment scores (NES) and gene   10,000 phenotype-based permutations were executed, and
            ratios were extracted for each term.               gene set sizes were restricted to between 10 and 200 genes.
                                                               Pathways with a (|NES|) ≥ 1 and an adjusted p-value (FDR)
              The  resulting  GO  and  KEGG  terms  were  visualized   <0.05 were considered significantly enriched. The leading-
            using the “ggplot2” (v3.4.2) and “enrichplot” (v1.18.3)   edge genes driving each enriched gene set were extracted,
            packages. Bubble plots and cnet plots were generated to   and the results were visualized using the “gseaplot2”
            depict the top 20 most significantly enriched terms, ranked   function from the “enrichplot” R package.
            by adjusted p-value.
                                                               2.7. Immune infiltration analysis
            2.5. Screening and validation of key genes through
            machine learning                                   To quantify the landscape of tumor-infiltrating immune
                                                               cells, we applied “CIBERSORT” R package calculation to
            To identify the minimal set of genes that robustly   perform de novo enumeration of 22 human immune cell
            discriminates OP from healthy bone, we applied LASSO   subsets through a ν-support vector regression algorithm
            penalized logistic regression to the GSE35958 discovery   trained on the LM22 signature matrix, which consisted
            cohort (n = 9). Prior to modeling, the expression matrix   of 547 genes. The analysis was carried out using default
            was  variance-stabilized  using  the  voom  transformation   parameters: 1,000 permutations, quantile normalization
            and z-score normalized across samples. The optimal  λ   disabled,  and batch  correction activated.  Samples with
            value was determined by 10-fold cross-validation (CV)   a deconvolution  p<0.05 were retained for downstream
            with 1,000 repetitions, selecting the λ within one standard   analysis to ensure high fidelity of the inferred fractions.
            error of the minimum mean cross-validated deviance.
            Genes  retaining  non-zero  regression  coefficients at  the   Subsequently, the relative proportions of immune
            optimal  λ were designated as core genes. All LASSO   cell subsets, such as CD8⁺ T cells, M2 macrophages,
            analyses were implemented using the glmnet package   and regulatory T cells, were compared between normal
            (v4.1-8) in R (v4.3.0).                            and  OP  samples  using  a  two-sided  Wilcoxon  rank-sum
                                                               test with Benjamini–Hochberg correction  (FDR <  0.05).
              For internal validation, expression levels of the core   Compositional shifts were visualized using the “ggplot2” and
            genes in GSE35958 dataset were visualized using box   “pheatmap”  R  packages,  whereas  intercellular  correlation
            plots and dot plots generated with the “ggpubr” package   networks (|r| > 0.4 and  p<0.05) were constructed using
            (v0.6.0). For external validation, the same core gene set   the “igraph” package to explore potential immune cell
            was projected onto the independent GSE35956 dataset   crosstalk.
            (n = 10). Differential expression between OP and control
            groups  was  quantified  by  the  two-sided  Wilcoxon  rank-  2.8. Potential drug prediction
            sum test and corrected for multiple comparisons using the   The OP core genes (Entrez IDs) were imported into
            Benjamini–Hochberg (FDR < 0.05).                   Coremine Medical (http://www.coremine.com/medical),
              To assess the discriminatory capacity of the core gene   where the platform’s text-mining engine was queried under
            signatures, receiver operating characteristic (ROC) curves   the “herb–gene–disease” relationship layer to retrieve
            were constructed for both datasets using the “pROC”   botanical entities statistically linked to both the gene set


            Volume 9 Issue 3 (2025)                        264                         doi: 10.36922/EJMO025240252
   267   268   269   270   271   272   273   274   275   276   277