clincancerres.aacrjournals.org · web viewpan-cancer molecular classes transcending tumor lineage...

Pan-cancer molecular classes transcending tumor lineage across 32 cancer types, multiple data platforms, and over 10,000 cases

Supplementary Figures and Description of Data Files


Supplementary Figure 1. Schematic of the various analyses performed in this study (with associated main and supplementary figures) and their relation to results provided in the supplementary data files.

Supplementary Figure 2. Multiplatform-based molecular classification of cancers representing 32 major types. (A) Integration of subtype classifications from five “omic” data platforms identified major groups/subtypes across 32 pathologically defined cancer types (n=10662 TCGA cancer cases with data available for at least three platforms). The first heat map displays the subtypes defined independently by chromosomal copy alteration (red), DNA methylation (orange), mRNA expression (green), microRNA expression (blue), and protein (RPPA) expression (cyan); each row in this heat map denotes membership in a specific subtype defined by the indicated platform. Cases were grouped by unsupervised hierarchical clustering of the platform-level subtype assignments, and they segregated largely according to cancer type, as indicated. Pam50 subtype (BRCA cases only), expression of squamous markers (SOX2, TP63), adenocarcinoma versus squamous histology, and TP53 mutation status are indicated. DNA methylation patterns represent the top 2000 genomic loci with the highest variability in methylation (see Methods). Corresponding cancer types (denoted by TCGA project name) are indicated along the bottom. (B) K-means clustering was applied to the platform-level subtype assignment matrix from part A to define 25 subtypes. The overlap between each multiplatform-based molecular subtype and each of the 32 cancer types (as designated by TCGA project name) is represented using a colorgram. Most subtypes (13 of 25) were specific to a single cancer type (with >70% of the cases for that cancer type assigned to the given subtype).
Five subtypes were specific to more than one distinct cancer type (involving >70% of cases of each type), including a squamous subtype (k8, involving CESC, HNSC, ESCA, and BLCA cases), a melanoma subtype (k6, SKCM and UVM), a cholangiocarcinoma/pancreatic cancer subtype (k15), and a subtype (k21) of immune-related cancers (LAML, DLBC, THYM). In the k-means 25-subtype solution, BRCA cases were divided mainly among three clusters: one consisting primarily of Pam50 luminal A cases (k1), one of mixed luminal A and luminal B cases (k11), and one consisting primarily of Pam50 basal-like cases (k23). See also Supplementary Figures 3 and 4 and Supplementary Data 1.
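
The >70% specificity criterion in part B can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes cancer types and k-means subtype labels are supplied as parallel lists of strings, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def type_fractions(cancer_types, subtypes):
    """For each cancer type, the fraction of its cases assigned to each subtype."""
    per_type = defaultdict(Counter)
    for ctype, sub in zip(cancer_types, subtypes):
        per_type[ctype][sub] += 1
    return {
        ctype: {sub: n / sum(counts.values()) for sub, n in counts.items()}
        for ctype, counts in per_type.items()
    }


def specific_types(cancer_types, subtypes, threshold=0.70):
    """Map each subtype to the cancer types with more than `threshold` of their cases in it."""
    result = defaultdict(list)
    for ctype, fracs in type_fractions(cancer_types, subtypes).items():
        for sub, frac in fracs.items():
            if frac > threshold:
                result[sub].append(ctype)
    return {sub: sorted(types) for sub, types in result.items()}
```

Because the threshold is applied separately to each cancer type, a single subtype can be specific to several types, which is how multi-type subtypes such as k6 (SKCM and UVM) arise in this scheme.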

Supplementary Figure 3, related to Supplementary Figure 2. DNA copy alteration patterns associated with the multiplatform-based molecular subtypes of Supplementary Figure 2. The ordering of cancer cases (n=10662 TCGA cancer cases with data available for at least three platforms) is the same as in Supplementary Figure 2. The first heat map displays the subtypes defined independently by chromosomal copy alteration (red). TP53 mutation status is shown (green). The red/blue heat map denotes areas of DNA copy gain (red) or loss (blue) within each chromosome. Corresponding cancer types (denoted by TCGA project name) are indicated along the bottom.

Supplementary Figure 4, related to Supplementary Figure 2. Multiplatform-based molecular classification of cancers using k-means clustering. (A) Delta area plot, generated using ConsensusClusterPlus (Wilkerson and Hayes, 2010), of the platform-level subtype assignment matrix (Supplementary Figure 2A) for 3000 randomly selected TCGA cancer cases. The plot shows the relative change in area under the CDF curve between k and k − 1. For k = 2 there is no k − 1, so the total area under the curve, rather than the relative increase, is plotted. This graphic shows the relative increase in consensus at each k and helps identify the k beyond which there is no appreciable increase. (B) K-means clustering was applied to the platform-level subtype assignment matrix (Supplementary Figure 2A) to define 15, 20, 25, and 30 subtypes. For each solution, the overlap between each multiplatform-based molecular subtype and each of the 32 cancer types (as designated by TCGA project name) is represented using a colorgram.
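
The delta-area statistic in part (A) can be sketched in Python. This assumes consensus matrices (pairwise co-clustering frequencies in [0, 1]) are already available for each k. It computes the area under the empirical CDF of the off-diagonal consensus values exactly, as 1 - mean(values), whereas ConsensusClusterPlus approximates this area from a binned histogram, so values may differ slightly. The function names are hypothetical.

```python
def cdf_area(consensus):
    """Area under the empirical CDF of the upper-triangle consensus values.

    For values in [0, 1], the integral of the empirical CDF over [0, 1]
    equals 1 - mean(values), so no histogram is needed in this sketch.
    """
    n = len(consensus)
    vals = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    return 1.0 - sum(vals) / len(vals)


def delta_area(consensus_by_k):
    """Relative change in CDF area from k - 1 to k; total area at the smallest k."""
    ks = sorted(consensus_by_k)
    areas = [cdf_area(consensus_by_k[k]) for k in ks]
    deltas = {ks[0]: areas[0]}
    for prev, cur, k in zip(areas, areas[1:], ks[1:]):
        deltas[k] = (cur - prev) / prev
    return deltas
```

A plateau in these deltas, where adding a cluster no longer raises the area appreciably, is the signal the plot is read for.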

Supplementary Data 1, related to Supplementary Figure 2. Summary of cancer cases examined in this study, according to data platform. Provided as an Excel file.

Supplementary Figure 5, related to Figure 1. Additional molecular features characterizing the pan-cancer molecular subclasses. (A) Delta area plot, generated using ConsensusClusterPlus (Wilkerson and Hayes, 2010), of differential mRNA expression patterns (values normalized within each main cancer type) of 3000 randomly selected TCGA cancer cases, for the 2000 most variable mRNAs (see Methods). The plot shows the relative change in area under the CDF curve between k and k − 1. For k = 2 there is no k − 1, so the total area under the curve, rather than the relative increase, is plotted. This graphic shows the relative increase in consensus at each k and helps identify the k beyond which there is no appreciable increase. (B) Top heat map shows differential mRNA expression patterns (values normalized within each main cancer type) for the 2000 most variable mRNAs, with features ordered by best correlation with a given pan-cancer class. The second heat map shows differential DNA methylation patterns (values centered within each main cancer type) for the 2000 most variable DNA methylation features (see Methods), with features ordered by best correlation with a given pan-cancer class. The third heat map denotes areas of DNA copy gain (red) or loss (blue) within each chromosome. Additional sample-level data tracks denote levels of genome-wide copy number alteration (“copy alt. index”, the standard deviation of copy alteration log ratios across all cytobands), mutations per megabase (Mb), cancer type (according to TCGA project), and Pam50 subtype (BRCA cases only). (C) Patient age at initial diagnosis by pan-cancer class. Box plots represent the 5th, 25th, 50th, 75th, and 95th percentiles. Points in box plots are colored by cancer type as defined by TCGA project, with the color scheme as in main Figure 1C. (D) Proportion of male patients in each pan-cancer class (excluding BRCA, CESC, OV, PRAD, UCEC, and UCS cases).
A chi-square test for differences across classes gives p=0.03 (without correction for cancer type). (E) Top heat map shows differential miRNA expression patterns (values normalized within each cancer type) for a top set of 50 miRNA features that distinguish among the ten molecular classes from part A. The second heat map shows differential protein expression patterns (RPPA platform, values normalized within each cancer type) for a top set of 25 features that distinguish among the ten classes. Features (also represented in Figure 1B) are labeled individually here. (F) Differential DNA methylation patterns (values centered within each cancer type) for the top features distinguishing the class associated with basal-like breast cancer (c5; see Figure 1B), restricted to the subset of features whose DNA methylation was significantly anti-correlated with expression of the corresponding mRNA (p<0.01, Pearson’s correlation using logit-transformed DNA methylation values and log-transformed expression values) and whose corresponding mRNA was also associated with c5, but in the opposite direction. Additional features showing inverse correlation between DNA methylation and mRNA are available in Supplementary Data 2.
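
Two of the summaries in this figure, the copy alteration index and the box-plot percentiles, are simple enough to sketch directly. In this Python illustration the use of the sample (n − 1) standard deviation and Python's default "exclusive" percentile method are assumptions, since the text specifies neither; the function names are hypothetical.

```python
from statistics import quantiles, stdev


def copy_alteration_index(cytoband_log_ratios):
    """Spread of one sample's copy alteration log ratios across cytobands (sample SD assumed)."""
    return stdev(cytoband_log_ratios)


def box_plot_summary(values):
    """5th, 25th, 50th, 75th and 95th percentiles, as used for the box plots."""
    cuts = quantiles(values, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[4], cuts[9], cuts[14], cuts[18]
```

Other percentile conventions (for example, `method="inclusive"`) shift the whisker ends slightly for small groups.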

Supplementary Data 2, related to Figure 1. Patient-level platform availability and subtyping features, along with top features by each platform used in the molecular classification analyses. Provided as an Excel file.

Supplementary Figure 6, related to Figure 1. Associations of pan-cancer molecular classes with TCGA sample batches. Batch information for 9346 cases was obtained from [http://bioinformatics.mdanderson.org/main/TCGABatchEffects:Overview]. Enrichment of TCGA batches, defined by batch ID, Tissue Source Site (TSS), or sample shipping date, was calculated for each of our pan-cancer classes (c1 through c10). For each pan-cancer class and batch variable, matrices show the significance of enrichment (one-sided Fisher’s exact test) of batch membership within the given molecular class versus the rest of the tumors. Only p-values with FDR<0.1 (Storey and Tibshirani, 2003) are shown. We do not observe widespread associations between TCGA batch and pan-cancer class. The statistically significant associations that are observed most likely reflect the cancer types in the corresponding batches; for example, c5 consists entirely of basal-like breast cancers, so batches made up entirely of breast cancer samples could be strongly associated with c5. Where an overlap is statistically significant, the cases involved represent a small fraction of the total cases in the given pan-cancer class.
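
The one-sided Fisher’s exact test used here (and in Supplementary Figures 7 to 9) asks whether batch membership is over-represented in a class. Below is a minimal standard-library Python sketch of the upper-tail hypergeometric p-value; the 2 × 2 layout (a, b, c, d) and the function name are illustrative assumptions.

```python
from math import comb


def fisher_one_sided_greater(a, b, c, d):
    """Upper-tail (enrichment) p-value of Fisher's exact test for a 2 x 2 table.

    a: in class and in batch        b: in class, not in batch
    c: not in class, in batch       d: in neither
    """
    n_class = a + b
    n_batch = a + c
    total = a + b + c + d
    upper = min(n_class, n_batch)
    # Probability of observing a or more overlapping cases under the hypergeometric null.
    hits = sum(
        comb(n_batch, x) * comb(total - n_batch, n_class - x)
        for x in range(a, upper + 1)
    )
    return hits / comb(total, n_class)
```

For example, with all 3 class members in a batch and none of the 3 other cases, the p-value is 1/20 = 0.05.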

Supplementary Figure 7, related to Figure 1. Enriched gene categories associated with pan-cancer molecular classes. For the top differentially expressed genes associated with each subtype (the genes shown in main Figure 1A), enrichment of Gene Ontology (GO) categories was assessed, with “high” genes evaluated separately from “low” genes. P-values are from a one-sided Fisher’s exact test. GO terms reaching p<0.0005 (and involving at least three genes) for any one gene set are shown. Left panel shows –log10(p-value) by GO term and subtype (red, associated with the subtype-specific “high” genes; blue, associated with the subtype-specific “low” genes). Right panel shows the average relative expression within a given subtype for all genes in a given GO term category.

Supplementary Figure 8. Somatic mutations and associated pathways across pan-cancer molecular classes. (A) Pathway-centric view of nonsilent gene mutations and copy alterations in the TCGA cohort (n=10224 cancer cases with available exome sequencing data). “High-level” deletion and “high-level” amplification respectively approximate total copy loss and copy levels more than 2X greater than that of wild-type (based on GISTIC thresholded values). See Supplementary Methods, Supplementary Figure 9, or Supplementary Data 3 for the genes associated with each pathway. See part B for the cancer type color legend. (B) By cancer type (left) and by pan-cancer molecular class (right), significance of enrichment (one-sided Fisher’s exact test) of gene alteration events for each pathway within a given cancer type or molecular class versus the rest of the cases. FDR, false discovery rate. Only associations with FDR<0.1 (Storey and Tibshirani, 2003) are shown. (C) Pathway-associated mRNA and protein signatures were applied to TCGA expression profiles (using values normalized across all cancer cases). Box plots compare pathway-altered versus pathway-unaltered cases for relative levels of the corresponding signature. P-values are from the Mann-Whitney U-test. Box plots represent the 5th, 25th, 50th, 75th, and 95th percentiles. Points in box plots are colored by cancer type as defined by TCGA project, as indicated in part B. See also Supplementary Figures 9, 10, and 11 and Supplementary Data 3.
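
Part (C) compares signature scores between pathway-altered and unaltered cases with the Mann-Whitney U-test. The sketch below assumes a two-sided test with the large-sample normal approximation (tie-corrected, no continuity correction); the text does not state which variant was used, and statistical packages may report exact p-values for small samples. Function names are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    # Tie correction shrinks the variance when values are repeated.
    tie_term = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var <= 0:
        return u1, 1.0  # every value tied: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / sqrt(var)
    return u1, erfc(abs(z) / sqrt(2))
```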

Supplementary Figure 9, related to Supplementary Figure 8. Somatic alteration patterns across pan-cancer molecular classes and cancer types. (A) For the pathways represented in Supplementary Figure 8A, mutation and copy alteration events involving each gene included in the indicated pathway. “High-level” deletion and “high-level” amplification respectively approximate total copy loss and copy levels more than 2X greater than that of wild-type (based on GISTIC thresholded values). (B) For each TCGA project and pan-cancer class category, significance of enrichment (one-sided Fisher’s exact test) of mutation events for each gene within the given molecular class/cancer type versus the rest of the tumors. Only p-values with FDR<0.1 (Storey and Tibshirani, 2003) are represented.
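
The FDR<0.1 filter follows Storey and Tibshirani (2003). As a simplified sketch, not the qvalue package, the Python code below estimates the proportion of true null hypotheses (pi0) at a single tuning value lambda = 0.5 rather than smoothing over a grid of lambda values; with pi0 = 1 it reduces to Benjamini-Hochberg adjusted p-values. The function name is hypothetical.

```python
def storey_qvalues(pvalues, lam=0.5):
    """q-values with pi0 estimated at a single lambda (a simplification of Storey's method)."""
    m = len(pvalues)
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q, pi0
```

Features would then be displayed only where the q-value falls below 0.1.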

Supplementary Data 3, related to Supplementary Figure 8. By patient, data for key somatic mutation, copy alteration, and DNA methylation features examined in this study. Provided as an Excel file.

Supplementary Figure 10, related to Supplementary Figure 8. Somatic alteration patterns across pan-cancer molecular classes and cancer types, accounting for cancer type and mutation rate. (A) By pan-cancer molecular class, significance of enrichment of gene alteration events for each pathway within a given molecular class versus the rest of the cases. Left panel shows results for the one-sided Fisher’s exact test (no correction for cancer type), and right panel shows results for the one-sided Cochran-Mantel-Haenszel test with cancer type as a covariate. (B) By cancer type, significance of enrichment of gene alteration events for each pathway within a given cancer type versus the rest of the cases. Top panel shows results for the one-sided Fisher’s exact test (no correction for mutation rate), and bottom panel shows results for the one-sided Cochran-Mantel-Haenszel test with mutation rate as a covariate. For the Cochran-Mantel-Haenszel test, sample mutation rates were binned in ten-percentile increments, from the bottom 10% to the top 10%. (C) For each TCGA project and pan-cancer class category, significance of enrichment of mutation events for each gene within the given molecular class/cancer type versus the rest of the tumors, accounting for patterns associated with cancer type and mutation rate (in contrast to Supplementary Figure 9B). The top panel (by cancer type) shows results for the Cochran-Mantel-Haenszel test with mutation rate as a covariate. The bottom panel (by pan-cancer class) shows results for the Cochran-Mantel-Haenszel test with cancer type as a covariate. P-values are one-sided.
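
The stratified tests in this figure (and in Supplementary Figure 11) can be sketched as follows, assuming per-case boolean flags for group membership (class or cancer type) and alteration status. Decile bins are assigned by rank, with ties broken by input order (an assumption), and the one-sided Cochran-Mantel-Haenszel statistic uses the normal approximation without continuity correction. When cancer type is the covariate, the type labels can be passed directly as strata. Function names are hypothetical.

```python
from math import erfc, sqrt


def decile_bins(values):
    """Assign each sample to one of ten percentile bins (0 = bottom 10%, 9 = top 10%)."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = min(9, 10 * rank // n)
    return bins


def cmh_one_sided(in_group, altered, strata):
    """One-sided CMH test that alteration is enriched in the group, within strata.

    Returns (z, p) from the normal approximation, without continuity correction.
    """
    tables = {}
    for g, alt, s in zip(in_group, altered, strata):
        t = tables.setdefault(s, [0, 0, 0, 0])  # cells a, b, c, d of a 2 x 2 table
        t[(0 if g else 2) + (0 if alt else 1)] += 1
    obs = exp = var = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two cases carries no information
        obs += a
        exp += (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0
    z = (obs - exp) / sqrt(var)
    return z, 0.5 * erfc(z / sqrt(2))
```

Stratifying in this way compares groups only within bins of similar mutation rate (or purity, or coverage), so an enrichment driven purely by that covariate tends to shrink toward the null.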

Supplementary Figure 11, related to Supplementary Figure 8. Somatic alteration patterns across pan-cancer molecular classes and cancer types, accounting for tumor purity and sequencing coverage. (A) By pan-cancer molecular class, significance of enrichment of gene alteration events for each pathway within a given molecular class versus the rest of the cases. Top left panel shows results for the one-sided Fisher’s exact test (no correction for purity), and top right panel shows results for the one-sided Cochran-Mantel-Haenszel test with tumor purity as a covariate (sample purity was binned in ten-percentile increments, from the bottom 10% to the top 10%). Bottom left panel shows results for the one-sided Fisher’s exact test (no correction for sequencing coverage), and bottom right panel shows results for the one-sided Cochran-Mantel-Haenszel test with sequencing coverage as a covariate (sample coverage was binned in ten-percentile increments, from the bottom 10% to the top 10%). Coverage information was provided as “number of Basepairs covered” per exome, according to Kandoth et al. (Kandoth et al., 2013). (B) By cancer type, significance of enrichment of gene alteration events for each pathway within a given cancer type versus the rest of the cases. Top panel shows results for the one-sided Fisher’s exact test (no stratification for other variables). Other panels show results for the one-sided Cochran-Mantel-Haenszel test, as indicated, with tumor purity or sequencing coverage as covariates. (C) For each TCGA project and pan-cancer class category, significance of enrichment of mutation events for each gene within the given molecular class/cancer type versus the rest of the tumors, accounting for patterns associated with tumor purity or sequencing coverage, as indicated. P-values are one-sided.

Supplementary Figure 12, related to Figure 2. Other associations relevant to pan-cancer genomic classes associated with mesenchymal features. (A) Scatter plot of differential methylation versus differential expression for cg15822328 versus miR-200a/200b/429. Methylation and expression values are normalized or centered within each cancer type, as indicated. Normalized values for miR-200a, miR-200b, and miR-429 are averaged together. Numbers of cases denote representation on all three data platforms (mRNA-seq, miRNA-seq, and 450K DNA methylation). (B) By pan-cancer genomic class, significance of enrichment (one-sided Fisher’s exact test) of previously identified cancer subsets within a given genomic class versus the rest of the cases. BRCA, TCGA breast cancer project; ILC, invasive lobular carcinoma; IDC, invasive ductal carcinoma; RCC, renal cell carcinoma (TCGA KIRC, KIRP, and KICH projects); CC-e.3, EMT-associated clear cell renal cell carcinoma genomic subtype from (Chen et al., 2016b); NSCLC, non-small cell lung cancer; SQ.1, EMT-associated squamous cell carcinoma genomic subtype from (Chen et al., 2016a); LUAD, TCGA lung adenocarcinoma project; LCNEC, large cell neuroendocrine carcinoma; AD.1, LCNEC-associated adenocarcinoma genomic subtype from (Chen et al., 2016a); OV, TCGA ovarian cancer project; Immunoreactive and Mesenchymal, previously identified ovarian cancer subtypes (Cancer Genome Atlas Research Network, 2011); estimated purity, from Aran et al. (Aran et al., 2015). (C) Results of linear regression models predicting EMT signature scores from both estimated tumor sample purity and pan-cancer class membership, comparing c3 versus c6 cases (left), c7 versus c6 cases (middle), and c8 versus c6 cases (right).

Supplementary Data 4, related to Figure 2. By patient, pathway-associated and immune-associated gene signature scorings. Also included are pathway signature gene sets. Provided as an Excel file.

Supplementary Figure 13, related to Figure 3. Additional normal tissue and cell type associations with the pan-cancer genomic classes. (A) Inter-profile correlations were computed between TCGA expression profiles (with values normalized within each cancer type) and profiles from the Fantom consortium expression dataset of various cell types or tissues from mouse specimens (n=389 profiles) (FANTOM Consortium and the RIKEN PMI and CLST (DGT) et al., 2014). Membership of the Fantom profiles in the general categories “immune” (immune cell types, blood, or related tissues), “CNS” (related to the central nervous system, including brain), “squamous” (including bronchial, trachea, oral, throat, esophagus, and nasal regions, urothelial, cervix, sebocyte, and keratin/skin/epidermis), or “embryo” is indicated. Numbers of TCGA cases (n=10224) denote representation on the RNA-seq data platform. (B) Heat map of gene expression-based signature scoring (Bindea et al., 2013) of immune cell infiltrates across TCGA pan-cancer classes (expression values normalized within each cancer type). Numbers of cases (n=10224) denote representation on the RNA-seq data platform. TREG cells, regulatory T cells; TGD cells, T gamma delta cells; Tcm cells, T central memory cells; Tem cells, T effector memory cells; Tfh cells, T follicular helper cells; NK cells, natural killer cells; DC, dendritic cells; iDC, immature DCs; aDC, activated DCs; P-DC, plasmacytoid DCs; APM1/APM2, antigen presentation on MHC class I/class II, respectively.

Supplementary Data 5, related to Figure 3. Inter-profile correlations computed between TCGA expression profiles (with values normalized within each cancer type) and profiles from the Fantom consortium expression datasets (both mouse and human), with the correlations averaged across TCGA samples within each pan-cancer class. Provided as an Excel file.

Supplementary Figure 14, related to Figure 5. Assessment of the overall strength of the correlations underlying the pan-cancer class assignments made to the expO dataset expression profiles. (A) Histogram of results from 1000 random permutations of the expO dataset (GSE2109) assignments; in each permutation, the gene ordering of the expO dataset was randomized relative to the TCGA dataset, and expO assignments were made using the “best fit” class with the highest correlation. The distribution of best-fit class correlations from all of the permuted datasets is shown (representing 1000 × 2041 best-fit class similarity scores), along with the average best-fit correlation for each of the ten pan-cancer classes in the actual, non-permuted dataset. (B) For the 854 class-specific genes from the TCGA dataset (main Figure 1A), the corresponding patterns in the expO dataset are shown (profiles normalized within their respective cancer type), along with the correlation scores (or “mRNA profile similarity scores”) for each class and each profile (purple-cyan heat map). Each expO profile was assigned the class with the highest of the ten scores. Plots are provided of the best-fit similarity scores (used to assign the class) and of the best-fit score minus the “next best” score (i.e., the second-highest score).

Supplementary Figure 15, related to Figure 5. Comparisons between TCGA and expO datasets for pan-cancer molecular class associations of interest. For the molecular features represented in main Figure 5, the purple-cyan heat maps denote t-statistics for comparing the given class versus the rest of the tumors. Left, TCGA dataset; right, expO (GSE2109) dataset. Dark purple or cyan corresponds approximately to p<0.01.
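
The class-versus-rest t-statistics can be sketched for a single feature as below. Whether a Welch (unequal-variance) or pooled-variance t-test was used is not stated; Welch's form is assumed here, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def class_vs_rest_t(values, classes, target):
    """Welch t-statistic comparing one class with all remaining tumors for one feature."""
    ins = [v for v, c in zip(values, classes) if c == target]
    rest = [v for v, c in zip(values, classes) if c != target]
    se = sqrt(variance(ins) / len(ins) + variance(rest) / len(rest))
    return (mean(ins) - mean(rest)) / se
```

Computing this statistic for each feature and class in both datasets gives paired TCGA and expO heat maps whose signs can be compared directly.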

Supplementary Figure 16, related to Figure 5. Observation of patterns associated with TCGA pan-cancer genomic classes in an external multi-cancer expression profiling dataset of cell lines. (A) Gene expression profiles of 1034 cancer cell lines originally derived from various pathologically defined cancer types, represented in the Cancer Cell Line Encyclopedia (CCLE) dataset (Barretina et al., 2012), were classified according to TCGA pan-cancer genomic class (profiles normalized within their respective cancer type). Expression patterns for the top set of 854 mRNAs distinguishing among the ten TCGA genomic classes (from main Figure 1A) are shown for both the TCGA and CCLE datasets. Genes whose patterns in the CCLE profiles partly resemble the TCGA class-specific signature patterns are highlighted. Lung cancer cell lines with small cell lung cancer histology (“lung SCLC”) are indicated; this cancer type is known to express neuroendocrine markers. (B) As was done for the TCGA datasets, CCLE expression profiles were scored for pathway-associated gene signatures (from main Figure 2A), surveyed for immune checkpoint markers and CT antigen genes (from main Figure 3B, using the same gene ordering), and scored for similarity to the normal cell type categories represented in the Fantom dataset (from main Figure 3A). Pan-cancer class associations of particular interest, as highlighted in main Figure 5B, are highlighted here, regardless of whether the patterns observed in the CCLE dataset follow those first observed in the TCGA cohort. Parts (A) and (B) share the same ordering of CCLE expression profiles.

Supplementary Data 6, related to Figure 5. Pan-cancer class assignments and pathway-associated gene signature scorings, for each of the expression profiles in the expO (GSE2109) dataset. Provided as an Excel file.

REFERENCES

Aran, D., Sirota, M., and Butte, A. (2015). Systematic pan-cancer analysis of tumour purity. Nat Commun 6, 8971.

Barretina, J., Caponigro, G., Stransky, N., Venkatesan, K., Margolin, A., Kim, S., Wilson, C., Lehár, J., Kryukov, G., Sonkin, D., et al. (2012). The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603-607.

Bindea, G., Mlecnik, B., Tosolini, M., Kirilovsky, A., Waldner, M., Obenauf, A., Angell, H., Fredriksen, T., Lafontaine, L., Berger, A., et al. (2013). Spatiotemporal dynamics of intratumoral immune cells reveal the immune landscape in human cancer. Immunity 39, 782-795.

Cancer Genome Atlas Research Network (2011). Integrated genomic analyses of ovarian carcinoma. Nature 474, 609-615.

Chen, F., Zhang, Y., Parra, E., Rodriguez, J., Behrens, C., Akbani, R., Lu, Y., Kurie, J., Gibbons, D., Mills, G., et al. (2016a). Multiplatform-based Molecular Subtypes of Non-Small Cell Lung Cancer. Oncogene E-pub Oct 24.

Chen, F., Zhang, Y., Şenbabaoğlu, Y., Ciriello, G., Yang, L., Reznik, E., Shuch, B., Micevic, G., De Velasco, G., Shinbrot, E., et al. (2016b). Multilevel Genomics-Based Taxonomy of Renal Cell Carcinoma. Cell Rep 14, 2476-2489.

FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest, A., Kawaji, H., Rehli, M., Baillie, J., de Hoon, M., Lassmann, T., Itoh, M., Summers, K., Suzuki, H., et al. (2014). A promoter-level mammalian expression atlas. Nature 507, 462-470.

Kandoth, C., McLellan, M., Vandin, F., Ye, K., Niu, B., Lu, C., Xie, M., Zhang, Q., McMichael, J., Wyczalkowski, M., et al. (2013). Mutational landscape and significance across 12 major cancer types. Nature 502, 333-339.

Storey, J. D., and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100, 9440-9445.

Wilkerson, M., and Hayes, D. (2010). ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26, 1572-1573.