
Eurasian Journal of Medicine and Oncology
WGCNA and LASSO for osteoporosis biomarkers



1. Introduction

Osteoporosis (OP) is a systemic metabolic bone disorder characterized by decreased bone mineral density (BMD) and microarchitectural deterioration. It predisposes patients to fragility fractures at high-risk sites such as the hip, spine, and wrist.1 These fractures frequently lead to chronic pain, functional impairment, and irreversible skeletal damage, which together reduce quality of life.2-4 OP currently affects approximately 200 million individuals globally, with a prevalence of 30% in women and 20% in men aged over 50 years.5 Furthermore, demographic aging is projected to raise the annual healthcare costs of osteoporotic fractures to hundreds of billions of USD by 2050.6 Together, these pathophysiological and epidemiological features establish OP as a major global public health challenge that requires urgent intervention.

The pathogenesis of OP is multifactorial, and immune and inflammatory responses play pivotal roles in disrupting bone homeostasis.5 Chronic inflammatory states activate the nuclear factor kappa B (NF-κB) signaling pathway and promote the release of pro-inflammatory cytokines such as interleukin-6 (IL-6) and tumor necrosis factor-alpha (TNF-α). These cytokines enhance osteoclast activity and suppress osteoblast function, thereby accelerating bone loss.6,7 In addition, genetic predisposition, hormonal changes, nutritional status, and lifestyle factors contribute significantly to the development of OP.8-10

In recent years, therapeutic strategies for OP have primarily included bisphosphonates, selective estrogen receptor modulators (SERMs), and monoclonal antibodies. Although these agents effectively inhibit bone resorption, their long-term use may be associated with adverse effects, and they do not fully restore bone microstructure and function.11,12 Moreover, the absence of definitive diagnostic criteria and targeted therapies complicates early intervention and personalized treatment.13 Identifying diagnostic biomarkers is therefore of substantial clinical significance for early screening, risk prediction, and the discovery of therapeutic targets in OP.14

Bioinformatics approaches have become instrumental in elucidating core disease mechanisms. Weighted Gene Co-expression Network Analysis (WGCNA) enables the systematic identification of disease-associated modules and hub genes from high-throughput data, and it has been applied to OP and other metabolic disorders.15,16 Least Absolute Shrinkage and Selection Operator (LASSO) regression is a classical machine learning method that uses L1 regularization. This penalty shrinks the coefficients of uninformative features to exactly zero, so only the most predictive features are retained. The result is a more interpretable model that is less prone to overfitting.17 LASSO has been widely adopted in biomarker discovery because it can handle high-dimensional omics data and identify robust disease-associated signatures.17 In particular, LASSO regression has proven reliable for screening biomarkers of complex diseases, including cancers, cardiovascular disorders, and, more recently, bone metabolic diseases.18,19

In the present study, we combine WGCNA and LASSO regression to systematically analyze OP-related gene expression data and to identify and validate diagnostic biomarkers and potential drug targets. Through this research, we seek to provide new insights and references for the clinical diagnosis and treatment of OP. The analysis flowchart is shown in Figure 1.

2. Materials and methods

2.1. Sample sources

Transcriptome profiles were retrieved from the NCBI Gene Expression Omnibus (GEO) repository with the GEOquery R package (v2.66.0). Two datasets generated on the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570) served as the discovery and validation cohorts, respectively:

- GSE35958 (discovery): four healthy controls and five OP cases.
- GSE35956 (validation): five controls and five OP cases.

Raw CEL files were downloaded and subjected to stringent quality control with the affyQC-Report R package (v1.72.0). No samples were excluded under the normalized unscaled standard error (NUSE > 1.05) or relative log expression (RLE) dispersion criteria. Probe-level data were background-corrected, quantile-normalized, and log2-transformed using the “affy” and “limma” packages.

Non-specific filtering retained only probes with an average intensity ≥4 in at least 20% of arrays. Probe sets were then collapsed to unique Entrez gene symbols with the “annotate” package; for each gene, the probe set with the highest mean expression was kept. The two datasets were processed separately so that the validation cohort remained independent of the discovery cohort for downstream machine learning modeling and enrichment analyses.

2.2. Differential analysis

To identify differentially expressed genes (DEGs) associated with OP, we analyzed the GSE35958 and GSE35956 datasets with the “limma” package in R. DEGs were selected using thresholds of |log2 fold change (FC)| > 0.5 and p<0.05. Genes with log2FC > 0.5 and p<0.05 were classified as upregulated, and those with log2FC < −0.5 and p<0.05 were classified as downregulated. Heatmaps were generated with the “pheatmap” package, and volcano plots were constructed with “ggplot2” to illustrate the distribution of DEGs.
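The sketch below illustrates three rules described in Sections 2.1 and 2.2: the non-specific filter, the probe-collapsing step, and the DEG labeling. It uses only the Python standard library and is not the authors' R pipeline.

It rests on several assumptions:

- The filter is read as "log2 intensity ≥ 4 in at least 20% of arrays".
- Log2 fold changes and p-values are taken as already computed, for example by limma.
- The function names (`passes_nonspecific_filter`, `collapse_to_genes`, `classify_deg`) and the toy data are hypothetical.

```python
from statistics import mean

# Cut-offs as stated in Sections 2.1-2.2. Reading the filter as
# "log2 intensity >= 4 in at least 20% of arrays" is an assumption.
MIN_LOG2_INTENSITY = 4.0
MIN_FRACTION_OF_ARRAYS = 0.20
LOGFC_CUTOFF = 0.5
P_CUTOFF = 0.05


def passes_nonspecific_filter(values):
    """True if at least 20% of arrays reach the minimum log2 intensity."""
    n_pass = sum(v >= MIN_LOG2_INTENSITY for v in values)
    return n_pass >= MIN_FRACTION_OF_ARRAYS * len(values)


def collapse_to_genes(probe_values, probe_to_gene):
    """Keep, for each gene, the probe set with the highest mean expression.

    Probes without a gene annotation are dropped; on a tie, the first
    probe encountered is kept.
    """
    best = {}  # gene -> (mean expression, values of the chosen probe)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        avg = mean(values)
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, values)
    return {gene: values for gene, (_, values) in best.items()}


def classify_deg(log_fc, p_value):
    """Label a gene 'up', 'down', or 'ns' using |log2FC| > 0.5 and p < 0.05."""
    if p_value < P_CUTOFF and log_fc > LOGFC_CUTOFF:
        return "up"
    if p_value < P_CUTOFF and log_fc < -LOGFC_CUTOFF:
        return "down"
    return "ns"


if __name__ == "__main__":
    # Hypothetical toy matrix (log2 scale, 5 arrays); not GEO data.
    probes = {
        "p1": [3.0, 3.5, 5.0, 4.2, 3.9],
        "p2": [6.0, 6.1, 5.8, 6.3, 6.0],
        "p3": [2.0, 2.5, 3.1, 2.2, 2.9],
    }
    annotation = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}

    kept = {p: v for p, v in probes.items() if passes_nonspecific_filter(v)}
    genes = collapse_to_genes(kept, annotation)
    print(sorted(kept))  # ['p1', 'p2']
    print(genes)         # {'GENE_A': [6.0, 6.1, 5.8, 6.3, 6.0]}

    # In practice log2FC and p-values would come from limma (e.g. topTable).
    for fc, p in [(0.8, 0.01), (-0.6, 0.03), (0.5, 0.001), (1.2, 0.2)]:
        print(fc, p, classify_deg(fc, p))
```

Both thresholds are strict, matching the "> 0.5" and "< 0.05" wording. A gene with log2FC exactly 0.5 is therefore not called differentially expressed, however small its p-value.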

            Volume 9 Issue 3 (2025)                        262                         doi: 10.36922/EJMO025240252