Eurasian Journal of Medicine and Oncology
WGCNA and LASSO for osteoporosis biomarkers
1. Introduction

Osteoporosis (OP) is a systemic metabolic bone disorder characterized by decreased bone mineral density (BMD) and microarchitectural deterioration, predisposing patients to fragility fractures at high-risk sites including the hip, spine, and wrist [1]. These fractures frequently lead to chronic pain, functional impairment, and irreversible skeletal damage, collectively contributing to reduced quality of life [2-4]. OP currently affects approximately 200 million individuals globally, with a prevalence of 30% in women and 20% in men aged over 50 years [5]. Furthermore, demographic aging is projected to escalate the annual healthcare costs of osteoporotic fractures to hundreds of billions of USD by 2050 [6]. Together, these pathophysiological and epidemiological features establish OP as a major global public health challenge requiring urgent intervention.

The pathogenesis of OP is multifactorial, with immune and inflammatory responses playing pivotal roles in disrupting bone homeostasis [5]. Chronic inflammatory states activate the nuclear factor kappa B (NF-κB) signaling pathway and promote the release of pro-inflammatory cytokines such as interleukin-6 (IL-6) and tumor necrosis factor-alpha (TNF-α), which enhance osteoclast activity and suppress osteoblast function, thereby accelerating bone loss [6,7]. In addition, genetic predisposition, hormonal changes, nutritional status, and lifestyle factors contribute significantly to the development of OP [8-10]. In recent years, therapeutic strategies for OP have primarily included bisphosphonates, selective estrogen receptor modulators (SERMs), and monoclonal antibodies. While these agents effectively inhibit bone resorption, their long-term use may be associated with adverse effects and does not fully restore bone microstructure and function [11,12]. Moreover, the absence of definitive diagnostic criteria and targeted therapies complicates early intervention and personalized treatment [13]. Therefore, identifying diagnostic biomarkers holds substantial clinical significance for early screening, risk prediction, and the discovery of therapeutic targets in OP [14].

Bioinformatics approaches have become instrumental in elucidating core disease mechanisms. Weighted Gene Co-expression Network Analysis (WGCNA) enables the systematic identification of disease-associated modules and hub genes from high-throughput data, with demonstrated applications in OP and other metabolic disorders [15,16]. As a classical machine learning method, Least Absolute Shrinkage and Selection Operator (LASSO) regression employs L1 regularization to enhance model interpretability by selecting the most predictive features while reducing overfitting [17]. This approach has been widely adopted in biomarker discovery due to its ability to handle high-dimensional omics data and identify robust disease-associated signatures [17]. In particular, LASSO regression has demonstrated reliability in screening biomarkers for complex diseases, including oncology, cardiovascular disorders, and, more recently, bone metabolic diseases [18,19].

In the present study, we aim to combine WGCNA and LASSO regression to systematically analyze gene expression data related to OP, identifying and validating diagnostic biomarkers and potential drug targets. Through this research, we seek to provide novel scientific insights and references for the clinical diagnosis and treatment of OP. The analysis flowchart is shown in Figure 1.

2. Materials and methods

2.1. Sample sources

Transcriptome profiles were retrieved from the NCBI Gene Expression Omnibus (GEO) repository via the GEOquery R package (v2.66.0). Two datasets generated on the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570) were selected: GSE35958 as the discovery cohort (four healthy controls and five OP cases) and GSE35956 as the validation cohort (five controls and five cases).

Raw CEL files were downloaded and subjected to stringent quality control using the affyQCReport R package (v1.72.0), with no samples excluded on the basis of normalized unscaled standard errors (>1.05) or relative log expression dispersion criteria. Probe-level data were background-corrected, quantile-normalized, and log2-transformed using the affy and “limma” packages. Non-specific filtering was applied to retain only probes with an average intensity ≥4 in at least 20% of arrays. Probe sets were collapsed to unique Entrez gene symbols using the annotate package (max mean probe per gene). The two datasets were processed separately to keep them independent for downstream machine learning modeling and subsequent enrichment analyses.

2.2. Differential analysis

To identify differentially expressed genes (DEGs) associated with OP, we analyzed the GSE35958 and GSE35956 datasets using the “limma” package in R. DEGs were selected using a threshold of |log fold change (FC)| > 0.5 and a significance level of p<0.05. Specifically, genes with log FC > 0.5 and p<0.05 were classified as upregulated, whereas those with log FC < −0.5 and p<0.05 were categorized as downregulated. For visualization, we generated heatmaps using the “pheatmap” package, and volcano plots were constructed using “ggplot2” to illustrate the distribution of DEGs.
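
The DEG selection rule in Section 2.2 can be illustrated with a minimal sketch. It is written in Python purely for illustration; the study itself reports using limma in R, and this is not the authors' code. The function names and the toy values are hypothetical. The sketch assumes the reported log FC is on the log2 scale (as limma typically reports it) and applies the cutoff to the p-value as stated in the text, which does not specify whether that value is raw or adjusted.

```python
from typing import Dict, Tuple

LOGFC_CUTOFF = 0.5  # |log FC| threshold stated in Section 2.2
P_CUTOFF = 0.05     # p-value threshold stated in Section 2.2


def classify_deg(log_fc: float, p_value: float) -> str:
    """Label one gene as 'up', 'down', or 'ns' (not selected)."""
    if p_value < P_CUTOFF and log_fc > LOGFC_CUTOFF:
        return "up"
    if p_value < P_CUTOFF and log_fc < -LOGFC_CUTOFF:
        return "down"
    return "ns"


def label_genes(results: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Map gene -> label, where results maps gene -> (log_fc, p_value)."""
    return {gene: classify_deg(fc, p) for gene, (fc, p) in results.items()}


# Hypothetical toy values; NOT taken from GSE35958 or GSE35956.
toy_results = {
    "GENE_A": (1.20, 0.003),   # large positive change, significant
    "GENE_B": (-0.80, 0.010),  # large negative change, significant
    "GENE_C": (0.90, 0.200),   # large change, but p >= 0.05
    "GENE_D": (0.30, 0.001),   # significant, but |log FC| <= 0.5
}

labels = label_genes(toy_results)
for gene, label in labels.items():
    print(gene, label)
```

Both inequalities are strict, matching the wording of the text, so a gene with log FC exactly equal to 0.5 is not selected.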
Volume 9 Issue 3 (2025) 262 doi: 10.36922/EJMO025240252

