Nat Commun. 2024 Nov 23;15(1):10164.
doi: 10.1038/s41467-024-54434-4.

Proteogenomic analysis reveals non-small cell lung cancer subtypes predicting chromosome instability, and tumor microenvironment

Kyu Jin Song et al. Nat Commun.

Abstract

Non-small cell lung cancer (NSCLC) is histologically classified into lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LSCC). However, some tumors are histologically ambiguous and other pathophysiological features or microenvironmental factors may be more prominent. Here we report integrative multiomics analyses using data for 229 patients from a Korean NSCLC cohort and 462 patients from previous multiomics studies. Histological examination reveals five molecular subtypes, one of which is an NSCLC subtype with PI3K-Akt pathway upregulation, showing a high proportion of metastasis and poor survival outcomes regardless of any specific NSCLC histology. Proliferative subtypes are present in LUAD and LSCC, which show strong associations with whole genome doubling (WGD) events. Comprehensive characterization of the immune microenvironment reveals various immune cell compositions and neoantigen loads across molecular subtypes, which predict different prognoses. Immunological subtypes exhibit a hot tumor-enriched state and a higher efficacy of adjuvant therapy.

Conflict of interest statement

Competing interests: Kwang Pyo Kim is the CEO of NioBiopharmaceuticals, Inc. Se Jin Jang is the chief technology officer of SG Medical, Inc. All other authors declare no competing interests.

Figures

Fig. 1. Identification of multiomics subtypes in Korean NSCLC patients.
a Summary of clinical information for the Korean NSCLC patient cohort. Bar plots show NSCLC histology, tumor stages, pathologic N status, and tumor recurrence status with adjuvant treatment. b Numbers of quantified proteins and PTM sites identified in global proteomic, phosphoproteomic, and acetylproteomic analyses. The number of features quantified at <30% missing values across 229 samples is represented by the yellow line. c Overview of NMF clustering. Other tumor suppressor genes consisted of CDKN2A, STK11, KEAP1, RB1, PPP2R1A, and SMARCA4. Other oncogene alterations consisted of frameshift deletions; in-frame deletion/insertion and missense mutations in KRAS, ERBB2, and PIK3CA; exon skipping in MET; and gene fusion in ALK, ROS1, and RET. Copy number loss was defined as homozygous deletion (absolute copy number <0.5). d Enrichment of the five identified NMF clusters for clinical variables. Tests indicating statistical significance (P < 0.05, two-sided Fisher’s exact test) are colored according to the odds ratio (OR). Box size indicates the proportion of the cohort characterized by a given clinical variable. e Pathway enrichment analyses of the five subtypes using GSVA and PTM-SEA. Pathways with statistical significance (FDR < 0.05, permutation) and positive enrichment scores (z-score) are represented by dots.
Fig. 2. Novel NSCLC subtype associated with poor survival.
a Overlap of subtype features between the five NMF subtypes in this study and subtypes identified in previous NSCLC multiomics studies. Protein enrichment is shown in the heatmap. Full rectangle and asterisk indicate significant overlaps (two-sided Fisher’s exact test, adjusted P ≤ 0.05, Benjamini-Hochberg adjustment), faint rectangle indicates overlaps that pass only the nominal P value threshold (two-sided Fisher’s exact test P ≤ 0.05, adjusted P > 0.05), and blank indicates overlaps that are not significant (two-sided Fisher’s exact test P > 0.05). b, c Reclassification of samples from previously defined multiomics subtypes, according to our combined NMF subtypes. The statistical significance of the relationship is visually represented by the clarity and transparency of the lines (Supplementary Fig. 2c). d Subtype 4-specific kinase activity scores estimated from phosphoproteomic data and the kinase-substrate network database (PHONEMeS). The colors of the points represent the estimated kinase activity scores. The sizes of the points represent the -log10(FDR) of the kinase activity estimates. There were two significantly upregulated kinases: CSNK2A1 and GSK3B (FDR < 0.05). e Expression of poor prognosis markers containing phosphorylated sites on SLK (S347) and LRRFIP1 (S581) is shown for our study (Subtype 4, n = 43; others, n = 186) and CPTAC (LUAD (Subtype 4, n = 26; others, n = 84), LSCC (Subtype 4, n = 15; others, n = 93)). Wilcoxon rank-sum test was performed to test the differences in expression. The color of the dots in the right panel represents the study type in CPTAC. For box-plots, middle line, median; box edges, 25th and 75th percentiles; whiskers, most extreme points that do not exceed ±1.5 × IQR. f Cancer-specific overall survival length according to the expression of poor prognosis markers in our study and the CPTAC dataset (integrated with LUAD and LSCC). The p-value was calculated with the log-rank test.
g Intracellular signaling pathways underlying poor prognosis in Subtype 4. The blue box represents the main signaling pathways, including the HIF-1, VEGF, PI3K-AKT, and NF-κB signaling pathways. The red triangular nodes are kinases identified as significantly upregulated in Subtype 4. The colors of the points represent the log2FC values obtained through differential expression (DE) analysis of Subtype 4 and the other subtypes. The border style of the point indicates the prognostic direction of the feature. h Cancer-specific overall survival length between our subtypes indicating significant changes in survival probability (y-axis) over time (x-axis). i Survival curves for patients with (n = 35) and without metastasis (n = 8) in Subtype 4 (n = 43) and (j) patients without metastasis in each subtype (n = 91). The p-value was calculated with the log-rank test (h–j).
Fig. 3. Landscape of cell type-specific subtype characteristics.
a UMAP plot of single-cell types specific to Subtypes 1 to 5, using each DEGsubtype. The color of each point represents the module score of each cell; the more relevant the module is to the cell type, the higher the score and the redder the color. UMAP information was obtained from the original study. b Representative histologic images of the subtypes. The tumor cell (T) and stromal (S) components are separately labeled. Note the irregularly fused tumor glands in Subtype 1 tumors compared to those in Subtype 2 tumors, which are composed of small, uniform tumor cells lying within the elastic stroma similar to the normal alveolar wall. Note also the dense stromal inflammatory cell infiltration in Subtype 5 tumors. Scale bars for b = 100 μm. c Proportions of samples with different pathologic diagnoses within each subtype. LUAD was predominant in Subtypes 1 and 2, whereas LSCC was predominant in Subtype 3. d–f Histologic patterns of LUADs in each subtype. The predominant patterns of Subtype 1 and 2 LUADs were most commonly acinar or papillary but were quite heterogeneous (d). The proportion of the lepidic pattern, considered to indicate noninvasive LUAD, was enriched mostly in Subtype 2 LUADs, suggesting that Subtype 2 is most like early LUAD (Subtype 1, n = 55; Subtype 2, n = 34, Subtype 3, n = 10; Subtype 4, n = 26; Subtype 5, n = 20) (e). Consistently, the proportion of high-grade histologic patterns (including solid, micropapillary, cribriform, and complex glandular patterns) was lowest in Subtype 2 LUADs. The high-grade histologic pattern was more extensive in Subtype 1 than in Subtype 2, but these subtypes were remarkably heterogeneous compared to Subtype 3–5 LUADs, which were mostly composed of high-grade histologic patterns (Subtype 1, n = 55; Subtype 2, n = 34, Subtype 3, n = 10; Subtype 4, n = 24; Subtype 5, n = 20) (f). For box-plots, middle line, median; box edges, 25th and 75th percentiles; whiskers, most extreme points that do not exceed ±1.5 × IQR.
g–i Lymphovascular invasion (g), lymph node metastasis (h), and tumor necrosis (i) were less common in Subtype 2 tumors, which also implies that Subtype 2 tumors are in a clinically early, nonprogressed stage. j Microscopically, the stromal component was more extensive in Subtype 2, 4, and 5 tumors (Subtype 1, n = 55; Subtype 2, n = 43, Subtype 3, n = 52; Subtype 4, n = 43; Subtype 5, n = 34). For box-plots, middle line, median; box edges, 25th and 75th percentiles; whiskers, most extreme points that do not exceed ±1.5 × IQR. k, l Tumor-infiltrating lymphocytes (k) and stromal neutrophilic infiltration (l) were most extensive in Subtype 5 tumors. The p-value was calculated using the chi-square test (c, d, g–i, k, and l) or the Kruskal-Wallis test (e, f, and j). m Summary of the histopathologic characteristics of the NSCLC subtypes.
Fig. 4. Proteogenomic features underlying whole-genome doubling (WGD) in NSCLC subtypes.
a Barplot showing WGD fraction in each multiomics subtype from our study and CPTAC NSCLC patients. b Overlap of copy number signatures for the five multiomics subtypes, with the colors indicating the odds ratio from one-sided Fisher’s exact test. The COSMIC v3 signature and etiology for each signature are indicated on the y-axis. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 (one-sided Fisher’s exact test). c Top 10 most significantly enriched copy number variations (CNVs) in Subtype 3 (FDR < 0.1). The y-axis indicates the cytoband and the x-axis shows the log10-scaled FDR from linear regressions comparing Subtype 3 and other samples. d Enriched co-mutations in each subtype in both our and the CPTAC cohorts. EGFR mutation, missense mutation, in-frame deletion, frameshift deletion, and amplifications were counted. For TP53 and CDKN2A, amplifications were excluded, and for SOX2, only amplifications were counted. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 (one-sided Fisher’s exact test). e Protein-level gene set enrichment analysis (GSEA) revealing upregulated and downregulated pathways in Subtype 3 of both cohorts. The x- and y-axis are enrichment scores (ES) from the current study and CPTAC NSCLC data, respectively. Labeled pathways are the top-5 upregulated pathways in Subtype 3. The Molecular Signatures Database (MSigDB) hallmark gene set v7.4 was used for GSEA. f Subtype 3-specific kinase activity scores. The sizes of the points indicate -log10(FDR) from kinase activity estimation. Significantly up- and downregulated kinases are labeled (FDR < 0.05). g Elevated protein expression of XPO1 in Subtype 3 is shown for our study (Subtype 1, n = 55; Subtype 2, n = 45; Subtype 3, n = 52; Subtype 4, n = 43; Subtype 5, n = 34) and CPTAC LSCC (Subtype 1, n = 11; Subtype 2, n = 19; Subtype 3, n = 42; Subtype 4, n = 15; Subtype 5, n = 21). Kruskal-Wallis test was performed to test the differences in expression. 
WGD status is marked by red dots and the y-axis shows log2 protein expression levels. For box-plots, middle line, median; box edges, 25th and 75th percentiles; whiskers, most extreme points that do not exceed ± 1.5 × IQR. h Sample information (top), drug response curve (middle), and IC50 (bottom) for selinexor (XPO1 inhibitor) for lung organoids highlighting a higher sensitivity in WGD-positive LSCC organoids. Three technical replicates were tested in each organoid sample. For IC50 barplot, dots indicate each replicate, and error bars indicate average ± 1 standard deviation. i WGD-related pathway underlying Subtype 1 LUAD tumors and Subtype 3 LSCC tumors. Significantly upregulated kinases are highlighted with red triangles (FDR < 0.05) and mutations are shown in purple boxes. Kinase activity scores are estimated from phosphoprotein expression. The log2 fold changes from DE analyses are indicated by the color in each box. For the phosphoproteome, only features with FDR less than 0.1 are displayed.
Fig. 5. Landscapes of immune clusters and cell types across NSCLC subtypes and cohorts.
a Immune subtyping based on cell type and pathway enrichment scores. Cell type-based clustering was performed with 205 tumor and 85 normal adjacent to the tumor (NAT) samples, and pathway-based clustering was performed using only tumor samples. The tumor-infiltrating lymphocyte (TIL) pattern, clinical histology (diagnostics, DX), multiomics subtype, tumor stage, and tissue information are described. IC, immune cluster. b (top) DEGHTE and DEGCTE were used to generate the UMAP plot of scRNA-seq data. A two-sided t-test was conducted to assess the statistical significance of the differences in gene expression. The color of each point represents the module score of each cell; higher scores are shown in red. UMAP information was obtained from multiple NSCLC studies. (bottom) The correlations of 10 cell types and immune clusters with the pattern of TILs were analyzed. The sizes and colors of the circles indicate the statistical significance and correlation coefficient of the correlations, respectively. The horizontal black dotted line indicates P = 0.05. c Hazard ratios for overall survival (OS, left) and relapse-free survival (RFS, right) related to various cell types in the cell type-based immune cluster. A hazard ratio lower than zero (blue box) indicates that the hot-tumor-enriched (HTE) status or a high cell type score was associated with prolonged survival. Error bars (gray lines) represent mean ± 95% confidence interval (CI). Red text indicates statistical significance in the survival analysis by the log-rank Mantel‒Cox test (n = 174). d Correlations of the RNA expression, protein expression, and protein activity of 10 immunomodulators with immune cluster status for our cohort as well as other lung cancer multiomics cohorts. Correlation coefficients and p-values were obtained from a generalized linear model (GLM).
e Correlations between the expression or activity of immunomodulators and the status of driver mutations in our cohort and the Satpathy and Gillette cohorts. The top associations between immunomodulators and known driver genes are described. f The left boxplot shows the RNA and protein expression of SLAMF7 in samples with wild-type or mutant SMARCA4 (n = 205); the right boxplot shows the RNA and protein expression of SLAMF7 in HTE and cold-tumor-enriched (CTE) samples (n = 174). The two-sided t-test was performed to test the differences in expression. The box represents the 25th and 75th percentiles, the central mark denotes the median, and the whiskers extend to the most extreme points within ±1.5 × IQR. g (left) Box (top) and balloon (bottom) plots showing the mean expression of marker genes of HTE and 10 cell types across the multiomics subtypes (n = 174). The marker genes were defined as the top 300 and top 30 most overexpressed genes in HTE samples and highly cell type-enriched samples, respectively. (right) The bar (top) and box (middle and bottom) plots show the mutation frequency of SMARCA4 and RNA/protein expression of SLAMF7 across multiomics subtypes (n = 174), respectively. The Kruskal‒Wallis test was performed to assess the differences among the multiomics subtypes. The box represents the 25th and 75th percentiles, the central mark denotes the median, and the whiskers extend to the most extreme points within ±1.5 × IQR.
Fig. 6. Clinical relevance of neoantigens and cryptic MAPs, and their associations with multiomics subtypes.
a Survival estimated according to the type of neoantigen and cryptic MAP. A lower hazard ratio (blue box) indicates that a high load of neoantigens or cryptic MAPs is associated with prolonged overall survival, and a high hazard ratio (red box) indicates the opposite. Hazard ratios for individual trials and overall effects are given with 95% CIs. Log (HR) values and their corresponding 95% confidence intervals (CIs) are depicted in grey. b Kaplan‒Meier curve showing the survival of two groups of patients (n = 204) according to whether they did (blue line) or did not have (red line) recurrent cryptic MAPs. The p-value was derived by comparing the curves with the log-rank Mantel‒Cox test. c Correlations between the number of cryptic MAPs and enrichment scores of 10 cell types and the immune cluster. Correlation coefficients were calculated by a linear regression model with the covariates of sample batches and histological diagnosis. The size of the dots indicates the degree of the -log10-scaled p-value, and the color of the dots represents the strength of the correlation coefficient. The bold-lined dots indicate statistical significance. d Kaplan‒Meier curve showing the survival patterns of four groups of patients (n = 174) stratified by cryptic MAP load and immune cluster. The p-value was obtained by comparing curves between the two groups with the largest difference in the log-rank Mantel‒Cox test. e Enrichment analysis of the four groups described in Fig. 6d for the multiomics subtypes. The x- and y-axis indicate enrichment and statistical significance calculated using a two-sided Fisher’s exact test with the Benjamini‒Hochberg adjustment, respectively. The size of each dot indicates the level of significance. f Features of patients with multiomics Subtype 5 disease who had a low cryptic MAP load with an HTE status, activated APM, and activated NF-κB pathway.
g Kaplan–Meier curve of recurrence-free survival according to treatment status (chemotherapy [CTx] or chemoradiation therapy [CRTx]) in patients categorized by multiomics subtype. The p-value was derived by comparing the curves with the log-rank Mantel‒Cox test.
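
Several captions above (Fig. 2a, Fig. 6e) report Fisher's exact test P values with the Benjamini-Hochberg adjustment, and Fig. 2a distinguishes overlaps that pass the adjusted threshold, overlaps that pass only the nominal threshold, and non-significant overlaps. The following is a minimal sketch of that three-tier marking, not the authors' code: the function names `benjamini_hochberg` and `classify_overlaps` are hypothetical, and the P values in the test are invented for illustration rather than taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_overlaps(p_values, alpha=0.05):
    """Label each test as in the Fig. 2a legend (hypothetical helper).

    'significant'     -> adjusted P <= alpha (full rectangle and asterisk)
    'nominal'         -> raw P <= alpha but adjusted P > alpha (faint rectangle)
    'not significant' -> raw P > alpha (blank)
    """
    adjusted = benjamini_hochberg(p_values)
    labels = []
    for raw, adj in zip(p_values, adjusted):
        if adj <= alpha:
            labels.append("significant")
        elif raw <= alpha:
            labels.append("nominal")
        else:
            labels.append("not significant")
    return labels
```

The adjustment is applied across all tests in one comparison family, so a raw P value of 0.04 can drop to the "nominal" tier once several overlaps are tested together.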
