1 Proteogenomic Novelty in 105 TCGA Breast Tumors Karl Clauser CPTAC Breast Cancer Analysis Group Broad Institute of MIT and Harvard Fred Hutchinson Cancer Research Center Washington University New York University CPTAC Data Jamboree November 12, 2013 National Institutes of Health Bethesda, Maryland


Proteogenomic Novelty in 105 TCGA Breast Tumors

Karl Clauser

CPTAC Breast Cancer Analysis Group

Broad Institute of MIT and Harvard

Fred Hutchinson Cancer Research Center

Washington University

New York University

CPTAC Data Jamboree

November 12, 2013

National Institutes of Health

Bethesda, Maryland


Tumor-specific protein databases for MS/MS spectra searches

Kelly Ruggles, David Fenyo, NYU


Type                                      Genome    Proteome   PhosphoProteome
Single AA Variants (∑)                    119,977   3,028
  Germline                                91,944    1,930
  Somatic                                 9,607     85
  Germline & Somatic                      18,426    1,013
Alternative splices (junction-spanning)   36,196    279
Fusion genes (junction-spanning)
Novel exon
Novel gene
Frame shift
Novel splicing (junction-spanning)

Proteogenomic mapping: Genetic alterations can be observed on protein level (81 tumors)

[Slide annotations on the table: "Preliminary novel findings"; "work in progress"]

• Low confidence thresholds applied to Genome calls
• High confidence thresholds applied to Proteome calls (<1% FDR)

• 0.7-2.5% of alternative splice junctions and single AA variants observable by proteomics
• mRNA may not be translated, or may be at low abundance
• Proteome coverage is incomplete
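The observation rates quoted above follow from dividing the proteome counts by the genome counts in the table. A minimal sketch of that arithmetic, using only the totals shown on this slide (the function and variable names are illustrative, not from the original analysis):

```python
# Genome-level and proteome-level counts copied from the slide (81 tumors).
counts = {
    "Single AA variants": (119_977, 3_028),
    "Alternative splices (junction-spanning)": (36_196, 279),
}


def observation_rate(genome_count: int, proteome_count: int) -> float:
    """Percentage of genome-level calls that were also observed as peptides."""
    return 100.0 * proteome_count / genome_count


rates = {name: observation_rate(g, p) for name, (g, p) in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```

Both values fall inside the range stated in the bullet, with splice junctions at the low end and single AA variants at the high end.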


1 mg total protein per tumor
Internal reference: equal representation of basal, Her2 and Luminal A/B subtypes

Global proteome and phosphoproteome discovery workflow for TCGA breast tumors


Serial Search Strategy with Personalized Databases

>Refseq Protein
SIGNALINGPATHWAYREGULATOR

19,673,636 Spectra (81 patients) (27 iTRAQ experiments) (25 LC-MS/MS runs / experiment)

RefSeq-Human-37: 32,800

8,037,319 Spectra Matched (41% of total) (1% FDR)

• Can combined FDR be calculated?
• Can search engine retain speed by skipping unchanged peptides?

3,028 Variants Matched (N Spectra) (2,294 proteins)

279 Splice Junctions Matched (y Spectra)

11,636,317 leftover spectra
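The serial strategy on this slide can be read as: search every spectrum against RefSeq first, then pass only the leftover spectra on to the personalized databases. The sketch below illustrates that control flow under stated assumptions. `serial_search` and the `is_match` callback are hypothetical stand-ins for a real search engine and its FDR filter, and the database names are placeholders.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def serial_search(
    spectra: Sequence[str],
    databases: Sequence[str],
    is_match: Callable[[str, str], bool],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Search databases in order; only unmatched spectra move to the next one.

    `is_match(spectrum, database)` is a hypothetical callback that returns
    True when the spectrum gets a confident match in that database.
    """
    assigned: Dict[str, List[str]] = {db: [] for db in databases}
    remaining = list(spectra)
    for db in databases:
        still_unmatched = []
        for spectrum in remaining:
            if is_match(spectrum, db):
                assigned[db].append(spectrum)
            else:
                still_unmatched.append(spectrum)
        remaining = still_unmatched
    return assigned, remaining


# Spectrum accounting for the first (RefSeq) stage, using the slide's counts.
total_spectra = 19_673_636
matched_refseq = 8_037_319
leftover = total_spectra - matched_refseq
matched_fraction = matched_refseq / total_spectra

# Toy run with made-up spectrum ids and hits.
toy_hits = {("s1", "refseq"), ("s2", "variants")}
assigned, remaining = serial_search(
    ["s1", "s2", "s3"],
    ["refseq", "variants", "splices"],
    lambda spectrum, db: (spectrum, db) in toy_hits,
)
```

The subtraction reproduces the leftover count on the slide, and the matched fraction rounds to the quoted 41%.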

• Concatenated FASTA files, 102 patients
• Altered proteins only
• Removed redundant entries

>Refseq Protein – Variant Patient 1
SIGNALINGPATHWAHREGULATOR
>Canonical Protein – Variant Patient 2
SIKNALINGPATHWAYREGULATOR

Variants: 132,181

>Refseq Protein – Alternate splice Patient 1
SIGNALINGREGULATOR
>Canonical Protein – Alternate splice Patient 2
SIGNALINGPATHREGULATOR

Alternate Spliceforms: 67,035
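One way to picture the database construction described above (concatenate per-patient entries, keep altered proteins only, remove redundant entries) is the sketch below. It is an illustration under assumptions, not the pipeline actually used: the function names, the header format, and the choice to treat entries with identical sequences as redundant are all hypothetical.

```python
from typing import Dict, List, Tuple


def build_altered_database(
    reference: Dict[str, str],
    per_patient: Dict[str, Dict[str, str]],
) -> List[Tuple[str, str]]:
    """Collect altered, non-redundant protein sequences across patients.

    reference:   protein id -> canonical sequence
    per_patient: patient id -> {protein id -> that patient's sequence}
    """
    entries: List[Tuple[str, str]] = []
    seen = set()
    for patient, proteins in per_patient.items():
        for protein_id, sequence in proteins.items():
            if sequence == reference.get(protein_id):
                continue  # unaltered: already covered by the reference search
            if sequence in seen:
                continue  # redundant: same sequence already added
            seen.add(sequence)
            entries.append((f"{protein_id} - Variant {patient}", sequence))
    return entries


def to_fasta(entries: List[Tuple[str, str]]) -> str:
    """Render (header, sequence) pairs as FASTA text."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in entries)


# Toy input built from the example sequences on the slide.
reference = {"Refseq Protein": "SIGNALINGPATHWAYREGULATOR"}
per_patient = {
    "Patient 1": {"Refseq Protein": "SIGNALINGPATHWAHREGULATOR"},
    "Patient 2": {"Refseq Protein": "SIKNALINGPATHWAYREGULATOR"},
    "Patient 3": {"Refseq Protein": "SIGNALINGPATHWAHREGULATOR"},  # same as Patient 1
    "Patient 4": {"Refseq Protein": "SIGNALINGPATHWAYREGULATOR"},  # unaltered
}
entries = build_altered_database(reference, per_patient)
fasta_text = to_fasta(entries)
```

In the toy input, Patient 3 repeats Patient 1's variant and Patient 4 carries the canonical sequence, so only two entries are written.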

Low confidence thresholds applied to Genome calls
• Variants: >2 QUAL score (phred-scaled)
• Alternative splices: >1 read
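To show how permissive these genome-level thresholds are, the sketch below applies them to toy calls and converts a phred-scaled QUAL score to an error probability using the standard relation P = 10^(-Q/10). The record types and field names are hypothetical; only the two cutoffs come from the slide.

```python
from dataclasses import dataclass

MIN_QUAL = 2   # variants are kept when QUAL > 2 (phred-scaled)
MIN_READS = 1  # splice junctions are kept when supported by > 1 read


@dataclass
class VariantCall:
    name: str
    qual: float  # phred-scaled quality of the call


@dataclass
class JunctionCall:
    name: str
    reads: int  # number of junction-spanning reads


def phred_to_error_probability(qual: float) -> float:
    """Standard phred relation: probability that the call is wrong."""
    return 10 ** (-qual / 10)


def keep_variant(call: VariantCall) -> bool:
    return call.qual > MIN_QUAL


def keep_junction(call: JunctionCall) -> bool:
    return call.reads > MIN_READS


# Toy calls (made-up names and values).
variants = [VariantCall("v1", 1.5), VariantCall("v2", 2.0), VariantCall("v3", 40.0)]
junctions = [JunctionCall("j1", 1), JunctionCall("j2", 2)]

kept_variants = [v.name for v in variants if keep_variant(v)]
kept_junctions = [j.name for j in junctions if keep_junction(j)]
error_at_cutoff = phred_to_error_probability(MIN_QUAL)
```

A QUAL of 2 corresponds to an error probability of roughly 0.63, which is why these thresholds count as low confidence.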


Single AA Variants may be Somatic in Some Patients, Germline in Others

[Figure panels: Genomic; Proteomic]

• Highly Interesting, should correlate with prognosis and/or subtype.

• May correlate with prognosis?
• Might as well be canonical isoforms?

• Detectable, but too rare to indicate biology.

Variant Type               Gen       Prot    P/G
Germline Only >1 patient   34,022    1,226   3.6%
Germline Only 1 patient    57,922    704     1.2%
G&S mix                    18,426    1,013   5.5%
Somatic Only >1 patient    270       3       1.1%
Somatic Only 1 patient     9,337     82      0.9%
Total                      119,977   3,028   2.5%

• G&S mix genomic variants have the highest observation rate by Proteomics.

• Genomic variants present in only a single patient are observable by Proteomics
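The five variant types in the table can be derived from the per-patient call status of each variant. A minimal sketch of that grouping, assuming each variant comes with one "germline" or "somatic" label per patient that carries it (the function name and input shape are illustrative):

```python
from collections import Counter
from typing import Dict, List


def classify_variant(statuses: List[str]) -> str:
    """Assign a variant to one of the table's categories.

    `statuses` holds one label per patient carrying the variant,
    each either "germline" or "somatic".
    """
    kinds = set(statuses)
    if not kinds or not kinds <= {"germline", "somatic"}:
        raise ValueError("expected a non-empty list of 'germline'/'somatic' labels")
    if kinds == {"germline", "somatic"}:
        return "G&S mix"
    origin = "Germline Only" if kinds == {"germline"} else "Somatic Only"
    span = ">1 patient" if len(statuses) > 1 else "1 patient"
    return f"{origin} {span}"


# Toy variants (made-up names) mapped to their per-patient statuses.
toy_variants: Dict[str, List[str]] = {
    "varA": ["germline", "germline", "germline"],
    "varB": ["germline"],
    "varC": ["germline", "somatic"],
    "varD": ["somatic", "somatic"],
    "varE": ["somatic"],
}
category_counts = Counter(classify_variant(s) for s in toy_variants.values())
```

Counting the categories over all variants, once at the genome level and once for those with peptide evidence, would give the Gen and Prot columns.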


Not all Germline & Somatic mix Single AA Variants are “Essentially” Germline

• Is G&S mix status primarily an artifact of variant calling accuracy/sensitivity?
• Is there some cancer biology involved for high S/G ratio variants?
– Are patients with germline form more cancer prone?
– Does somatic form correlate with prognosis, development of drug-resistance?
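The S/G ratio mentioned above can be computed per variant as the number of patients in whom it was called somatic divided by the number in whom it was called germline. The sketch below assumes that definition, which the slides do not spell out, and uses made-up variant names.

```python
from typing import Dict, List


def somatic_to_germline_ratio(statuses: List[str]) -> float:
    """Ratio of somatic to germline calls for one G&S mix variant.

    Assumes the variant has at least one germline call, which holds
    by definition for a G&S mix variant.
    """
    somatic = statuses.count("somatic")
    germline = statuses.count("germline")
    if germline == 0:
        raise ValueError("S/G ratio is undefined without a germline call")
    return somatic / germline


# Toy G&S mix variants (hypothetical names and call patterns).
toy_mix: Dict[str, List[str]] = {
    "mostly_germline": ["germline"] * 9 + ["somatic"],
    "mostly_somatic": ["somatic"] * 3 + ["germline"],
}
ratios = {name: somatic_to_germline_ratio(s) for name, s in toy_mix.items()}
high_sg = [name for name, ratio in ratios.items() if ratio > 1.0]
```

A variant such as `mostly_germline` would look "essentially" germline, while a high ratio flags the cases the questions above are about.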

[Figure panels: Genomic; Proteomic]


155/279 Alternative Splice Junctions were observed in >1 Proteomics Experiment

[Figure: # Alternative Splice Junction Peptides (log scale) vs. # Proteomics Experiments with Splice Junction Peptide]

[Figure: # Patients in Common Control with AS Transcript vs. # of Proteomics Experiments with Splice Junction Peptide]

279 Alternative Splice Junctions observed in 27 proteomics experiments (iTRAQ 4-plex)
1 experiment: 3 individual patients + 1 Common control (40 patients)
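The recurrence counts on this slide amount to tallying, for each splice junction, the number of iTRAQ experiments in which a junction-spanning peptide was identified. A minimal sketch with toy data follows; the junction ids and function names are hypothetical. From the slide's own totals, 279 − 155 = 124 junctions were seen in exactly one experiment.

```python
from collections import Counter
from typing import Dict, Set


def recurrence_histogram(junction_to_experiments: Dict[str, Set[int]]) -> Counter:
    """Map 'number of experiments' -> 'number of junctions seen that often'."""
    return Counter(len(exps) for exps in junction_to_experiments.values())


def count_recurrent(histogram: Counter, min_experiments: int = 2) -> int:
    """Number of junctions observed in at least `min_experiments` experiments."""
    return sum(n for k, n in histogram.items() if k >= min_experiments)


# Toy data: junction id -> set of experiment numbers (1..27) where it was seen.
toy_junctions = {
    "j1": {1},
    "j2": {1, 2},
    "j3": {3, 4, 5},
}
histogram = recurrence_histogram(toy_junctions)
recurrent = count_recurrent(histogram)

# Design check from the slide: 27 4-plex experiments, 3 patients each.
patients_covered = 27 * 3
single_experiment_junctions = 279 - 155
```

The design check gives 81 patients, matching the number of tumors analyzed so far.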


Wide Range of Somatic Single AA Variants/Patient

[Figure: Germline Variants, Somatic Variants and Alternative Splices per patient, log-scale axis from 10 to 100,000. Patients labeled on the axis: D8-A13Y, A7-A0CJ, A2-A0YM, E2-A10A, AR-A0TV, AN-A0AL, BH-A0BZ, A8-A09I, BH-A18N, BH-A0C0, AO-A12B, A8-A08G, AR-A0U4, AR-A1AW, BH-A0DG, A8-A08Z, A2-A0EX, A2-A0T2]

Low confidence thresholds applied to calls
• Variants: >2 QUAL score (phred-scaled)
• Alternative splices: >1 read


Frequency of Single AA Variants and Alternative Splices Across Patients

[Figure: # Features (log scale, 1 to 100,000) vs. # Patients with Feature (0 to 102), for Germline Variants, Somatic Variants and Alternative Splices; annotation: "very common"]

• Somatic variants are less frequent than germline variants
• Some germline variants are very common
– Rare germline variants present in the reference sequence (RefSeq)
• Some alternative splice forms are very common
– Should be in RefSeq


Next steps:

• Analyze data from all tumors (81/105 so far)
• Examine “other” category
– Fusion genes (junction-spanning)
– Novel exon
– Novel gene
– Frame shift
– Novel splicing (junction-spanning)
• Analyze phosphoproteomics data
• Use updated output of Genomic analysis pipeline
• Employ more thorough FDR calculation for PSMs
– Single-pass search of all spectra against concatenated database
  • Reference proteome, Variants, Alternate splice forms, “Other”
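The slides do not say how the FDR for PSMs would be calculated. As an illustration only, the sketch below assumes the widely used target-decoy approach: decoy sequences are included in the concatenated database, and the FDR at a score threshold is estimated as the number of decoy hits divided by the number of target hits at or above it. The function names, score scale, and toy PSMs are hypothetical.

```python
from typing import List, Optional, Tuple

PSM = Tuple[float, bool]  # (score, is_decoy)


def estimate_fdr(psms: List[PSM], score_threshold: float) -> float:
    """Target-decoy FDR estimate for PSMs scoring at or above the threshold."""
    targets = sum(1 for score, is_decoy in psms
                  if score >= score_threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= score_threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def threshold_for_fdr(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed score whose estimated FDR does not exceed `max_fdr`."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if estimate_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Toy PSMs from a single-pass search (made-up scores).
toy_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
fdr_at_7 = estimate_fdr(toy_psms, 7.0)
cutoff = threshold_for_fdr(toy_psms, max_fdr=0.01)
```

With a single concatenated database, the same estimate could also be computed separately for each entry class (reference, variant, splice form, "other"), which is one way to address the combined-FDR question raised earlier in the talk.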


Acknowledgments

Washington U./MD Anderson/NYU
- Sherri Davies
- Matthew Ellis
- David Fenyo
- Kelly Ruggles
- Reid Townsend
- Li Ding

Broad Institute/FHCRC
- Steve Carr
- Karl Clauser
- Michael Gillette
- Jana Qiao
- Philipp Mertins
- DR Mani
- Eric Kuhn
- Sue Abbatiello
- Amanda Paulovich
- Pei Wang
- Sean Wang
- Ping Yan

NCI Staff
- Emily Boja
- Mehdi Mesri
- Rob Rivers
- Chris Kinsinger
- Henry Rodriguez

Funding
- National Cancer Institute