Eurasian Journal of Medicine and Oncology
Novel senescence-based melanoma risk model
to develop a risk model. Finally, the model’s predictive accuracy for patient survival was validated using the validation set and an external Gene Expression Omnibus (GEO) dataset.

2. Materials and methods

2.1. Data acquisition and processing

Bulk RNA-seq data in fragments per kilobase of transcript per million mapped reads (FPKM) for SKCM and the corresponding clinical information were downloaded from the UCSC Xena database (https://xenabrowser.net/datapages/). After excluding samples lacking clinical data, including stage, T stage, N stage, M stage, gender, age, overall survival (OS), and OS time, 413 samples were retained. These samples were randomly allocated into a training set and a validation set at a 7:3 ratio.

An additional validation dataset, GSE65904, comprising RNA microarray data, was retrieved from GEO (https://www.ncbi.nlm.nih.gov/geo/). This dataset represents a population-based retrospective cohort consisting of 214 melanoma patients, including 16 primary tumor samples and 188 metastatic samples of various types. The male-to-female ratio is 1.39. In addition, 84 patients harbor mutations in the B-Raf proto-oncogene, NRAS proto-oncogene (NRAS), neurofibromatosis Type I, and KIT proto-oncogene (KIT) genes. The dataset contains raw signal intensity values, ensuring high data fidelity for downstream analysis. Moreover, it provides two distinct survival metrics, distant metastasis-free survival and disease-specific survival (DSS), allowing for a comprehensive evaluation of the prognostic performance of the risk model across different clinical endpoints. Probes with a detection p<0.05 that were present in more than 60% of the samples were retained. The R package “lumi” (https://bioconductor.org/packages/lumi/) was used to normalize the data and convert raw Illumina probe intensities to expression values. Probe IDs were then mapped to their corresponding gene symbols according to the platform annotation. For genes mapped by several probe IDs, the mean expression value was calculated using the “avereps” function from the “limma” R package (https://bioconductor.org/packages/limma/).

To further validate the robustness and reliability of the risk model, an additional external dataset, GSE19234, was utilized, containing raw signal intensity values in CEL format. This dataset includes 44 metastatic melanoma samples from patients who experienced at least two or three recurrences, with all samples from Stage III or higher. The male-to-female ratio is 1.75. To ensure consistency in data processing, the “rma” method within the “affy” R package was utilized to normalize the raw expression data, thereby minimizing technical variations and enhancing comparability across samples. The relationship between risk scores and patient survival was evaluated based on the number of days lived since metastasis, providing further insights into the prognostic utility of the risk model in a clinically relevant setting.

2.2. Identification of prognostic senescence-related genes

Senescence-related genes were compiled from three gene lists derived from previous studies that identified key genes associated with cellular senescence and their potential impact on various diseases, including cancer.25-27 To identify prognostic senescence-related genes, univariate Cox regression analysis was initially conducted, selecting 198 genes significantly correlated with survival outcome using a statistical significance threshold of p<0.05. Subsequently, multivariate Cox regression analysis was performed to refine this list by selecting 190 prognostic senescence-related genes with p<0.05 while adjusting for the effects of each gene identified in the univariate analysis. This analysis was performed using the survival R package (https://CRAN.R-project.org/package=survival). To visually represent the relationships between the selected prognostic genes and patient survival, a forest plot was generated using the survminer R package (https://CRAN.R-project.org/package=survminer).

2.3. Subtype classification and survival analysis

To uncover distinct molecular subtypes within the SKCM cohort, unsupervised clustering analysis was performed based on the expression profiles of the 190 prognostic senescence-related genes. This analysis was conducted using the ConsensusClusterPlus R package (https://bioconductor.org/packages/ConsensusClusterPlus/), a widely used tool for robust and reproducible clustering of high-dimensional genomic data. The clustering algorithm employed the hierarchical clustering (hc) method, which groups samples based on similarities in gene expression patterns, with the Pearson correlation coefficient serving as the metric to quantify pairwise similarity. The Pearson method was chosen for its sensitivity to both the magnitude and direction of gene expression changes, making it particularly suitable for capturing subtle yet biologically relevant differences in gene expression profiles. To determine the optimal number of clusters (k), the cumulative distribution function (CDF) and its area under the curve (AUC) were systematically evaluated across a range of cluster numbers (k = 2 to 6). The CDF plot visualizes the stability of the consensus matrix, with a flatter curve indicating higher clustering stability, while the AUC quantifies the overall agreement across multiple clustering runs. The optimal k was selected at the point where the CDF curve began to plateau, indicating minimal improvement in clustering
Volume 9 Issue 3 (2025) 88 doi: 10.36922/ejmo.8574
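
As an illustration of the CDF-based evaluation described in Section 2.3, the following is a minimal Python sketch. It is not the ConsensusClusterPlus implementation used in this study. The function names, the midpoint-rule integration, the relative-change heuristic, and the toy consensus matrix are all assumptions introduced for illustration. The sketch computes the area under the empirical CDF of the pairwise consensus values for one value of k, and the relative change in that area between consecutive values of k.

```python
from bisect import bisect_right


def consensus_cdf_area(consensus, n_bins=100):
    """Approximate the area under the empirical CDF of pairwise consensus values."""
    n = len(consensus)
    # Each sample pair counts once: upper triangle, diagonal excluded.
    values = sorted(consensus[i][j] for i in range(n) for j in range(i + 1, n))
    width = 1.0 / n_bins
    area = 0.0
    for b in range(n_bins):
        x = (b + 0.5) * width  # midpoint of the bin on [0, 1]
        cdf_at_x = bisect_right(values, x) / len(values)
        area += cdf_at_x * width
    return area


def relative_area_change(areas):
    """Relative gain in CDF area from each k to the next (first entry is the raw area)."""
    changes = [areas[0]]
    for prev, cur in zip(areas, areas[1:]):
        changes.append((cur - prev) / prev)
    return changes


# Hypothetical 4-sample consensus matrix: samples 0-1 and 2-3 always co-cluster.
clean = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
clean_area = consensus_cdf_area(clean)
```

In this sketch, a consensus matrix whose entries sit near 0 or 1 yields a CDF that is flat between the two extremes. Once the per-k areas are collected, a small relative change from one k to the next suggests that adding a cluster no longer improves consensus. What counts as "small" remains a judgment call.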

