Developing a prognostic model using machine learning for disulfidptosis related lncRNA in lung adenocarcinoma … – Nature.com

Identification of prognostically relevant DRLs and construction of prognostic models

In our investigation of the LUAD landscape, we analyzed 16,882 lncRNAs derived from the TCGA-LUAD database. This evaluation identified 708 DRLs that interact significantly with DRGs, as depicted in a Sankey diagram (Fig. 2A). Incorporating data from three GEO datasets, we narrowed these DRLs down to 199 lncRNAs consistently present across datasets, suggesting a pivotal role in LUAD pathogenesis (Fig. 2B). Univariate Cox regression analysis identified 27 lncRNAs with significant implications for LUAD patient outcomes (Fig. 2C). Using these lncRNAs, we constructed a predictive model from an ensemble of machine learning techniques; the ensemble model (Supplementary Table 2) achieved a C-index of 0.677 [95% confidence interval (CI) 0.63 to 0.73], suggesting robust predictive performance (Fig. 2D). The model was then used to build a risk stratification system that assigns patients to high-risk and low-risk groups based on their lncRNA expression profiles. Principal component analysis (PCA) confirmed a distinct separation between the two risk groups, underscoring the potential of our model in clinical risk assessment (Fig. 2E).

Construction of prognostic model composed of 27 DRLs. (A) Sankey diagram illustrating the relationship between DRGs and associated lncRNAs. (B) The intersection of DRLs sourced from the TCGA database and GEO database. (C) 27 lncRNAs after univariate Cox regression. (D) 101 prediction models evaluated, with C-index calculated for each across all validation datasets. (E) Principal Component Analysis of the low-risk and high-risk cohorts based on 27 DRLs.
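As a reading aid for the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' pipeline: the function name, the handling of tied times, and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event. The pair is concordant when that patient also
    has the higher risk score; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): skip pairs with tied times
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not taken from the study).
times = [5, 8, 12, 20, 25]        # follow-up time
events = [1, 1, 0, 1, 0]          # 1 = death observed, 0 = censored
scores = [0.9, 0.7, 0.8, 0.2, 0.1]  # model risk scores
c_index = concordance_index(times, events, scores)
print(round(c_index, 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for a reported C-index of 0.677.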

Our survival analysis using the TCGA-LUAD dataset revealed a significant difference in OS between the high- and low-risk groups identified by our model (p<0.001, log-rank test) (Fig. 3A). This finding was replicated across three independent GEO datasets, which showed significant differences in both OS (GSE31210, p=0.001; GSE30219, p=0.019; GSE50081, p=0.025) (Fig. 3B-D) and DFS (GSE31210, p<0.001; GSE30219, p=0.009; GSE50081, p=0.023) (Supplementary Fig. S1A-C). The predictive power of the risk score was superior to that of traditional prognostic factors such as age, gender, and staging, as evidenced by the C-index comparison (Supplementary Fig. S1D). The risk score also emerged as an independent prognostic indicator in our univariate and multivariate Cox analyses (p<0.001) (Supplementary Table 3). Multicollinearity within the model was assessed using the variance inflation factor, which was below 10 for all variables (Supplementary Table 4). The AUC analysis further supported the robustness of our model, with one-year, two-year, and three-year AUCs of 0.76, 0.72, and 0.74, respectively, in the TCGA-LUAD dataset (Fig. 3F). External validation in the GEO datasets GSE30219, GSE50081, and GSE31210 supported the model's accuracy over the evaluated intervals (Fig. 3G-I).

Efficacy of the DRLs Survival Prognostic Risk Model. Kaplan-Meier (KM) analyses for high-risk and low-risk groups are shown for (A) TCGA-LUAD, (B) GSE31210, (C) GSE30219, and (D) GSE50081. (E) Kaplan-Meier (KM) survival curves for mutant and non-mutant groups. Analysis of 1-, 2-, and 3-year ROC curves for (F) TCGA-LUAD, (G) GSE30219, (H) GSE50081, and (I) GSE31210.
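The group comparison behind the log-rank p-values above can be sketched as follows. This is an illustrative, standard-library-only version of the two-group log-rank test. The median cutoff used to form the groups is an assumption (this section does not state the cutoff rule), and all names and data are hypothetical.

```python
import math
from statistics import median


def split_by_median(risk_scores):
    """Label each patient 'high' or 'low' relative to the median score.

    The median cutoff is an assumption made for this sketch.
    """
    cutoff = median(risk_scores)
    return ["high" if s > cutoff else "low" for s in risk_scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi_square, p_value)."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    first = labels[0]
    observed = expected = variance = 0.0
    # Walk through each distinct time at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)  # at risk overall
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == first)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == first)
        observed += d1
        expected += d * n1 / n
        if n > 1:  # hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; groups cannot be compared")
    chi_square = (observed - expected) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Toy data (hypothetical): higher scores go with earlier deaths.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = split_by_median(scores)
chi_square, p_value = logrank_test(times, events, groups)
print(groups, round(chi_square, 2), round(p_value, 4))
```

In practice a survival analysis library would normally be used; the manual version is shown only to make the observed-versus-expected logic explicit.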

Further analysis showed gender-specific differences in risk scores across pathological stages. In early stages (I and II), men exhibited significantly higher risk scores than women (Stage I: p=0.015; Stage II: p=0.006; Wilcoxon test) (Supplementary Fig. S2A,B). These differences were not observed in later stages (III/IV) (p=0.900, Wilcoxon test) (Supplementary Fig. S2C), suggesting stage-specific risk dynamics. In addition, our study uncovered notable disparities in risk scores among patients with mutations in the EGFR, ALK, and KRAS genes in the GSE31210 dataset (p<0.001, Kruskal-Wallis test) (Supplementary Fig. S2D). Patients harboring these mutations also exhibited better OS than those without (p=0.018, log-rank test) (Fig. 3E), highlighting the potential prognostic relevance of genetic profiles in LUAD. The impact of smoking, a known risk factor for LUAD, was also evident: risk scores differed significantly between smokers and non-smokers in the GSE31210 and GSE50081 datasets (GSE31210, p=0.003; GSE50081, p=0.027; Wilcoxon test) (Supplementary Fig. S2E,F).

To enhance our model's utility in clinical decision-making, we developed a nomogram that combines the risk score with essential clinical parameters: age, gender, and TNM staging. This integration aims to provide a more comprehensive tool for predicting the prognosis of LUAD patients (Fig. 4A). We validated the nomogram's predictive accuracy using calibration curves, which compare predicted survival probabilities against observed outcomes. The results demonstrated a high degree of concordance, indicating that our nomogram accurately reflects patient survival rates (Fig. 4B). Further assessment through DCA (Fig. 4C-E) confirmed that the nomogram provides substantial clinical benefit. Notably, the nomogram outperformed the risk score alone in terms of net benefit across a wide range of threshold probabilities.

Development of a Nomogram for Risk Prediction & Analysis of Mutation Patterns in Both Risk Groups. (A) Nomogram that combines the model and clinicopathological factors. (B) Calibration curves of the nomogram at 1, 3, and 5 years. (C-E) Decision curve analysis (DCA) of the nomogram and clinical characteristics at 1, 3, and 5 years. (F) TMB levels in the high-risk and low-risk groups. (G) Gene mutation waterfall chart of the low-risk group. (H) Gene mutation waterfall chart of the high-risk group.
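The net benefit that decision curve analysis plots against the threshold probability can be illustrated with a short sketch. This simplified version assumes a binary outcome at a fixed time horizon and ignores censoring, which a survival DCA would need to handle. Function names and data are hypothetical.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    net benefit = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(predicted_risk, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (hypothetical): predicted event probabilities and observed outcomes.
predicted = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
observed = [1, 1, 0, 1, 0, 0]
nb_model = net_benefit(predicted, observed, threshold=0.5)
nb_all = net_benefit_treat_all(observed, threshold=0.5)
print(round(nb_model, 4), round(nb_all, 4))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference, whose net benefit is zero.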

A marked difference in TMB was observed between the high- and low-risk cohorts (p<0.001, Wilcoxon test) (Fig. 4F). The waterfall plots show the mutational landscape of the ten most frequently mutated genes in each risk group. In the low-risk cohort, approximately 84.53% of specimens exhibited gene mutations (Fig. 4G), whereas in the high-risk cohort, mutations were observed in roughly 95.33% of specimens (Fig. 4H). Predominant mutations within the high-risk group included TP53, TTN, and CSMD3.

The differential expression analysis revealed a total of 1474 DEGs between the low-risk and high-risk cohorts. Among these, 568 genes were upregulated and 906 genes were downregulated. The volcano plot (Supplementary Fig. S2G) illustrates the distribution of these DEGs. These results indicate that specific genes are significantly associated with risk stratification in our study cohort. In the GO analysis (Fig. 5A,D), DEGs were predominantly enriched in molecular functions such as organic anion transport and carboxylic acid transport. Regarding cellular components, the main enrichment was observed in the apical plasma membrane (Fig. 5C). Figure 5E shows the GSEA results, highlighting significant enrichment of gene sets related to metabolic processes, DNA binding, and hyperkeratosis. The KEGG analysis highlighted a significant enrichment of DEGs in neuroactive ligand-receptor interaction and the cAMP signaling pathway (Fig. 5B).

Biological function analysis of the DRLs risk score model. The top 5 significant terms of (A) GO function enrichment and (B) KEGG function enrichment. (C,D) System clustering dendrogram of cellular components. (E) Gene set enrichment analysis.

To validate the precision of our results, we employed seven techniques (CIBERSORT, EPIC, MCP-counter, xCell, TIMER, quanTIseq, and ssGSEA) to assess immune cell infiltration in the high-risk and low-risk groups (Fig. 6A). With the ssGSEA data, we explored the connection between the TME and several characteristics of lung adenocarcinoma patients, such as age, gender, and disease stage (Fig. 6B). We then visualized these data with box plots for both CIBERSORT and ssGSEA (Fig. 6C,D). These plots showed that the infiltration levels of memory B cells, resting memory CD4 T cells, and monocytes were notably lower in the high-risk group than in the low-risk group. Using the ESTIMATE algorithm, we evaluated the stromal (Fig. 6F), immune (Fig. 6E), and ESTIMATE scores (Supplementary Fig. S3A) across the risk groups, which allowed us to gauge tumor purity. Our results suggest that the high-risk group has lower stromal, immune, and ESTIMATE scores. Correspondingly, tumor purity was lower in the low-risk group than in the high-risk group (Supplementary Fig. S3B).

The tumor microenvironment in the high-risk and low-risk groups based on DRLs. (A) Comparison of immune cell infiltration levels for different immune cell types using the CIBERSORT, EPIC, MCP-counter, xCell, TIMER, and quanTIseq algorithms in the low-risk and high-risk groups. (B) Immune infiltration by lung adenocarcinoma patient characteristics. Box plots of the difference in immune cell infiltration between the high-risk and low-risk groups based on (C) CIBERSORT and (D) ssGSEA. *p-value<0.05, **p-value<0.01, ***p-value<0.001, ns=no significance. (E) Immune score and (F) stromal score were lower in the high-risk group than in the low-risk group.
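The inverse relationship between ESTIMATE score and tumor purity noted above can be illustrated with the purity conversion published with the ESTIMATE method. This section does not state how purity was derived, so using this formula here is an assumption. Its coefficients were fitted on Affymetrix array data and may not transfer to other platforms. The function name and example scores are hypothetical.

```python
import math


def estimate_tumor_purity(estimate_score):
    """Approximate tumor purity from an ESTIMATE score.

    Conversion published with the ESTIMATE method (fitted on Affymetrix
    data); applying it to this study's data is an assumption of the sketch.
    """
    return math.cos(0.6049872018 + 0.0001467884 * estimate_score)


# Hypothetical ESTIMATE scores: a lower score yields a higher purity.
for score in (-1000, 0, 1000):
    print(score, round(estimate_tumor_purity(score), 3))
```

Because the cosine decreases over this range, a group with lower stromal and immune (and therefore ESTIMATE) scores is expected to show higher estimated purity.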

We calculated the TIDE score and predicted the immunotherapy response in the high-risk and low-risk groups (Fig. 7A). Based on results from both datasets, patients in the low-risk group appear more likely to respond to immunotherapy. Additionally, the IPS for the combination of anti-CTLA4 and anti-PDL1 treatment, as well as for anti-CTLA4 alone, was consistently higher in the low-risk group (Fig. 7B,C). However, the difference for anti-PDL1 treatment alone (P=0.170) did not reach statistical significance (Fig. 7D). This suggests that low-risk patients may respond better to anti-CTLA4 and/or anti-PDL1 immunotherapy. Recent research has found a link between tumor TLS and outcomes in several tumor types. In line with these findings, our analysis of the TCGA-LUAD dataset showed that LUAD patients with high TLS scores had more favorable outcomes than those with low scores (Fig. 7F). We also found that the TLS score was higher in the low-risk group than in the high-risk group (Fig. 7E).

Immunotherapeutic sensitivity in the high-risk and low-risk groups based on DRLs. (A) Differences in risk scores between the TIDE responsive and nonresponsive groups. (B-D) Sensitivity of the high- and low-risk groups to combination therapy, anti-CTLA4, and anti-PDL1 by different IPS scores. (E) Differences in tumor tertiary lymphoid structure (TLS) scores between the high-risk and low-risk groups in TCGA-LUAD. (F) KM analysis of the high-TLS and low-TLS groups.

In our assessment of the relationship between risk scores and sensitivity to chemotherapy, we estimated the IC50 for several widely used chemotherapeutic agents. Our findings showed that the high-risk group was more sensitive than the low-risk group to drugs such as Cisplatin, Vinblastine, Cytarabine, Vinorelbine, Bexarotene, Cetuximab, Docetaxel, and Doxorubicin (Fig. 8A-P).

Immunotherapy sensitivity analysis and in-depth study of LINC00857. (A-P) Differences in drug sensitivity between high-risk and low-risk groups. (Q) Volcano plot for GTEX_Lung vs. TCGA_Lung_Adenocarcinoma.

Through differential gene analysis of tumor tissues and normal tissues, 13,995 DEGs (|logFC|>1.5, p-value<0.050) (Fig. 8Q, Supplementary Fig. S3C) were identified. By cross-referencing these with the 27 lncRNAs that form our prognostic model, we pinpointed LINC00857. Supplementary Fig. S4A presents a heatmap of LINC00857 expression levels across different NSCLC datasets and cell types. The results indicate that LINC00857 is differentially expressed, with notably high expression in monocytes/macrophages and endothelial cells across several datasets, suggesting its potential involvement in these cell types within the NSCLC tumor microenvironment. Supplementary Fig. S4B further illustrates the expression profile of LINC00857 in different cell populations from the GSE143423 dataset. The violin plot shows significantly higher expression of LINC00857 in malignant cells than in other cell types, indicating a potential role in tumor progression.
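The filtering and cross-referencing step described above can be sketched as follows, using the stated cutoffs (|logFC|>1.5, p-value<0.05). Gene names, values, and helper names are hypothetical, and the sketch uses a raw p-value because the text does not say whether an adjusted p-value was applied.

```python
def classify_gene(log_fc, p_value, fc_cutoff=1.5, p_cutoff=0.05):
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    if p_value >= p_cutoff or abs(log_fc) <= fc_cutoff:
        return "ns"
    return "up" if log_fc > 0 else "down"


# Hypothetical tumor-vs-normal results: gene -> (logFC, p-value).
results = {
    "LNC_A": (2.3, 0.001),
    "LNC_B": (-1.9, 0.020),
    "GENE_C": (0.4, 0.300),
    "GENE_D": (1.7, 0.200),
    "GENE_E": (-2.8, 0.004),
}
# Hypothetical stand-in for the lncRNAs in the prognostic model.
model_lncrnas = {"LNC_A", "LNC_X"}

labels = {gene: classify_gene(fc, p) for gene, (fc, p) in results.items()}
degs = {gene for gene, label in labels.items() if label != "ns"}
# Model lncRNAs that are also differentially expressed in tumor vs. normal.
candidates = sorted(degs & model_lncrnas)
print(labels, candidates)
```

The intersection at the end mirrors the cross-referencing of DEGs with the model's lncRNAs.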

To decipher the LINC00857-related regulatory mechanisms, we constructed a lncRNA-miRNA-mRNA network (Supplementary Fig. S4C). This network illustrates the interactions between LINC00857 and various miRNAs and mRNAs. In this network, LINC00857 acts as a central regulatory hub, potentially influencing gene expression by sequestering multiple miRNAs, such as hsa-miR-4709-5p, hsa-miR-760, and hsa-miR-340-5p. These miRNAs, in turn, are connected to a wide array of target genes, including YWHAZ, BCL2L2, PTEN, and MYC, which are critical in cellular processes such as cell cycle regulation, apoptosis, and signal transduction.
