Proteogenomic Novelty in 105 TCGA Breast Tumors
Karl Clauser
CPTAC Breast Cancer Analysis Group
Broad Institute of MIT and Harvard
Fred Hutchinson Cancer Research Center
Washington University
New York University
CPTAC Data Jamboree
November 12, 2013
National Institutes of Health
Bethesda, Maryland
Tumor-specific protein databases for MS/MS-spectra searches
Kelly Ruggles, David Fenyo, NYU
Type                                      Genome    Proteome   PhosphoProteome
Single AA Variants (total)               119,977     3,028     work in progress
  Germline                                91,944     1,930
  Somatic                                  9,607        85
  Germline & Somatic                      18,426     1,013
Alternative splices (junction-spanning)   36,196       279
Other categories (work in progress): Fusion genes (junction-spanning), Novel exon, Novel gene, Frame shift, Novel splicing (junction-spanning)

Proteogenomic mapping: Genetic alterations can be observed on protein level (81 tumors)
Preliminary novel findings
• Low confidence thresholds applied to Genome calls
• High confidence thresholds applied to Proteome calls (<1% FDR)
• Only about 0.8% of alternative splice junctions (279/36,196) and 2.5% of single AA variants (3,028/119,977) are observable by proteomics
• mRNA may not be translated, or may be present at low abundance
• Proteome coverage is incomplete
1 mg total protein per tumor
Internal reference: equal representation of basal, Her2 and Luminal A/B subtypes
Global proteome and phosphoproteome discovery workflow for TCGA breast tumors
Serial Search Strategy with Personalized Databases
>Refseq Protein
SIGNALINGPATHWAYREGULATOR
19,673,636 Spectra (81 patients; 27 iTRAQ experiments; 25 LC-MS/MS runs per experiment)
RefSeq-Human-37: 32,800
8,037,319 Spectra Matched (41% of total; 1% FDR)
• Can combined FDR be calculated?
• Can search engine retain speed by skipping unchanged peptides?
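One way to frame the second question: after an in-silico tryptic digest, only peptides that the altered protein produces and the reference does not need to be searched again. The sketch below assumes the simple trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages; the function names are hypothetical, and the sequences are the slide's toy examples.

```python
import re

def tryptic_peptides(sequence: str) -> set[str]:
    """Fully cleaved in-silico trypsin digest: cut after K or R unless the next residue is P."""
    # Zero-width split points (Python 3.7+); empty pieces, e.g. after a C-terminal K/R, are dropped.
    return {p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p}

def novel_peptides(reference: str, altered: str) -> set[str]:
    """Peptides from the altered protein that the reference digest does not contain.

    Only these would need a new search; every other peptide is unchanged.
    """
    return tryptic_peptides(altered) - tryptic_peptides(reference)

ref = "SIGNALINGPATHWAYREGULATOR"
print(novel_peptides(ref, "SIGNALINGPATHWAHREGULATOR"))          # {'SIGNALINGPATHWAHR'}
print(sorted(novel_peptides(ref, "SIKNALINGPATHWAYREGULATOR")))  # ['NALINGPATHWAYR', 'SIK']
```

A real search engine would also allow missed cleavages and apply length and mass limits, which enlarges both peptide sets.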
3,028 Variants Matched (N Spectra) (2,294 proteins)
279 Splice Junctions Matched (y Spectra)
11,636,317 leftover spectra
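The serial strategy passes only unmatched spectra to the next database, so the leftover count is simply total minus matched (19,673,636 − 8,037,319 = 11,636,317). A minimal sketch, assuming each database search can be modeled as a yes/no matcher per spectrum; `serial_search` and the toy matchers are hypothetical stand-ins for a real search engine with its FDR filter.

```python
from typing import Callable

# Spectrum counts reported on the slide for the RefSeq pass.
TOTAL_SPECTRA = 19_673_636
REFSEQ_MATCHED = 8_037_319
LEFTOVER = TOTAL_SPECTRA - REFSEQ_MATCHED  # spectra passed on to the personalized databases

def serial_search(
    spectra: list[str],
    databases: list[tuple[str, Callable[[str], bool]]],
) -> tuple[dict[str, list[str]], list[str]]:
    """Search each database in turn; only spectra left unmatched go to the next one.

    Each matcher is a stand-in for a search engine plus FDR filter: it returns
    True when the spectrum yields an accepted PSM in that database.
    """
    matched: dict[str, list[str]] = {}
    remaining = list(spectra)
    for name, is_match in databases:
        hits: list[str] = []
        leftover: list[str] = []
        for spectrum in remaining:
            (hits if is_match(spectrum) else leftover).append(spectrum)
        matched[name] = hits
        remaining = leftover
    return matched, remaining

if __name__ == "__main__":
    print(f"{LEFTOVER:,} leftover spectra; {REFSEQ_MATCHED / TOTAL_SPECTRA:.0%} matched by RefSeq")
    # Toy run with made-up spectrum IDs and per-database hit sets.
    refseq, variants, splices = {"s1", "s2"}, {"s3"}, {"s4"}
    matched, unmatched = serial_search(
        ["s1", "s2", "s3", "s4", "s5", "s6"],
        [("RefSeq", refseq.__contains__), ("variants", variants.__contains__),
         ("splices", splices.__contains__)],
    )
    print(matched, unmatched)
```

Because a spectrum accepted by an earlier database never reaches a later one, this ordering is also why a combined FDR across passes is not straightforward.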
• Concatenated FASTA files, 102 patients
• Altered proteins only
• Removed redundant entries
>Refseq Protein – Variant Patient 1
SIGNALINGPATHWAHREGULATOR
>Canonical Protein – Variant Patient 2
SIKNALINGPATHWAYREGULATOR
Variants: 132,181
>Refseq Protein – Alternate splice Patient 1
SIGNALINGREGULATOR
>Canonical Protein – Alternate splice Patient 2
SIGNALINGPATHREGULATOR
Alternate splice forms: 67,035
Low confidence thresholds applied to Genome calls
• Variants: >2 QUAL score (phred-scaled)
• Alternative splices: >1 read
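The database-building steps above (apply the genome thresholds, keep altered proteins only, remove redundant entries, concatenate across patients) can be sketched as follows. The `Alteration` record layout, the header format, and the accession "SPR" are hypothetical; the thresholds are the ones on the slide.

```python
from dataclasses import dataclass

@dataclass
class Alteration:
    """One genome-derived protein alteration in one patient (hypothetical record layout)."""
    protein: str       # reference accession the alteration maps to
    patient: str
    kind: str          # "variant" or "splice"
    sequence: str      # full-length altered protein sequence
    qual: float = 0.0  # phred-scaled QUAL score (variants only)
    reads: int = 0     # junction-spanning reads (splices only)

def passes_genome_threshold(a: Alteration) -> bool:
    """Low-confidence genome thresholds: QUAL > 2 for variants, > 1 read for splices."""
    return a.qual > 2 if a.kind == "variant" else a.reads > 1

def build_personal_fasta(reference: dict[str, str], alterations: list[Alteration]) -> str:
    """Concatenate altered proteins from all patients into one non-redundant FASTA text."""
    labels_by_sequence: dict[str, list[str]] = {}  # keeps first-seen order
    for a in alterations:
        if not passes_genome_threshold(a):
            continue  # call below the genome threshold
        if a.sequence == reference.get(a.protein):
            continue  # unchanged protein: already covered by the reference search
        labels_by_sequence.setdefault(a.sequence, []).append(f"{a.protein}|{a.kind}|{a.patient}")
    # Identical sequences from different patients become one entry with merged headers.
    return "".join(f">{';'.join(labels)}\n{seq}\n" for seq, labels in labels_by_sequence.items())

# Toy input reusing the slide's example sequences; "SPR" is a made-up accession.
reference = {"SPR": "SIGNALINGPATHWAYREGULATOR"}
alterations = [
    Alteration("SPR", "P1", "variant", "SIGNALINGPATHWAHREGULATOR", qual=30),
    Alteration("SPR", "P2", "variant", "SIKNALINGPATHWAYREGULATOR", qual=1.5),  # fails QUAL > 2
    Alteration("SPR", "P3", "variant", "SIGNALINGPATHWAHREGULATOR", qual=40),   # duplicate of P1
    Alteration("SPR", "P1", "splice", "SIGNALINGREGULATOR", reads=5),
    Alteration("SPR", "P2", "splice", "SIGNALINGPATHWAYREGULATOR", reads=3),    # same as reference
]
fasta = build_personal_fasta(reference, alterations)
print(fasta, end="")
```

Merging headers rather than dropping duplicates keeps track of which patients share a sequence, which matters later when asking whether a variant is common or patient-specific.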
Single AA Variants may be Somatic in Some Patients, Germline in Others
[Figure panels: Genomic, Proteomic]
• Highly interesting; should correlate with prognosis and/or subtype
• May correlate with prognosis?
• Might as well be canonical isoforms?
• Detectable, but too rare to indicate biology
Variant type                  Genomic   Proteomic   P/G (Proteomic/Genomic)
Germline only, >1 patient      34,022       1,226   3.6%
Germline only, 1 patient       57,922         704   1.2%
G&S mix                        18,426       1,013   5.5%
Somatic only, >1 patient          270           3   1.1%
Somatic only, 1 patient         9,337          82   0.9%
Total                         119,977       3,028   2.5%
• G&S mix genomic variants have the highest observation rate by Proteomics.
• Genomic variants present in only a single patient are observable by Proteomics
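The P/G column is the proteomic count divided by the genomic count for each category. The short check below recomputes it from the counts in the table above and confirms that the category rows add up to the totals; `COUNTS` and `observation_rate` are illustrative names.

```python
# (genomic count, proteomic count) per variant category, copied from the table above
COUNTS = {
    "Germline only, >1 patient": (34_022, 1_226),
    "Germline only, 1 patient": (57_922, 704),
    "G&S mix": (18_426, 1_013),
    "Somatic only, >1 patient": (270, 3),
    "Somatic only, 1 patient": (9_337, 82),
}

def observation_rate(genomic: int, proteomic: int) -> float:
    """Fraction of genomic variants also observed by proteomics (P/G)."""
    return proteomic / genomic

total_gen = sum(g for g, _ in COUNTS.values())
total_prot = sum(p for _, p in COUNTS.values())
for name, (g, p) in COUNTS.items():
    print(f"{name:28s} {observation_rate(g, p):.1%}")
print(f"{'Total':28s} {observation_rate(total_gen, total_prot):.1%}")
```

The G&S mix row has the largest ratio (5.5%), which is the basis of the first bullet on this slide.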
Not all Germline & Somatic mix Single AA Variants are “Essentially” Germline
• Is G&S mix status primarily an artifact of variant calling accuracy/sensitivity?
• Is there some cancer biology involved for high S/G ratio variants?
• Are patients with germline form more cancer prone?
• Does somatic form correlate with prognosis, development of drug-resistance?
[Figure panels: Genomic, Proteomic]
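The categories in the variant table, and the S/G ratio raised on this slide, can both be derived from per-patient calls for a single variant. A minimal sketch, assuming the calls are available as a mapping from patient ID to "germline" or "somatic" and that S/G means the number of somatic patients divided by the number of germline patients; both the input layout and `classify_variant` are assumptions for illustration.

```python
from collections import Counter
from typing import Optional

def classify_variant(calls: dict[str, str]) -> tuple[str, Optional[float]]:
    """Classify one single-AA variant from its per-patient calls.

    `calls` maps patient ID -> "germline" or "somatic". Returns the table
    category and the somatic/germline (S/G) patient ratio; the ratio is None
    when no patient carries the germline form.
    """
    if not calls:
        raise ValueError("variant has no patient calls")
    counts = Counter(calls.values())
    germline, somatic = counts["germline"], counts["somatic"]
    if germline and somatic:
        return "G&S mix", somatic / germline
    origin = "Germline only" if germline else "Somatic only"
    patients = ">1 patient" if len(calls) > 1 else "1 patient"
    ratio = somatic / germline if germline else None
    return f"{origin}, {patients}", ratio

print(classify_variant({"P1": "germline", "P2": "somatic", "P3": "somatic"}))  # ('G&S mix', 2.0)
print(classify_variant({"P4": "somatic"}))  # ('Somatic only, 1 patient', None)
```

Sorting G&S mix variants by this ratio would separate the "essentially germline" ones (low S/G) from those with a high S/G ratio.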
155/279 Alternative Splice Junctions were observed in >1 Proteomics Experiment
[Figure, left: # alternative splice junction peptides (log scale) vs. # proteomics experiments with splice junction peptide; 124 junctions were observed in only 1 experiment]
[Figure, right: # patients in common control with AS transcript vs. # of proteomics experiments with splice junction peptide]
279 Alternative Splice Junctions observed in 27 proteomics experiments (iTRAQ 4-plex); 1 experiment = 3 individual patients + 1 common control (40 patients)
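The left-hand histogram counts, for each splice junction, how many distinct experiments produced at least one junction-spanning peptide. A minimal sketch, assuming accepted PSMs are available as (junction, experiment) pairs; the IDs and the function name are made up for illustration. The slide's own numbers imply 279 − 155 = 124 junctions seen in exactly one experiment.

```python
from collections import Counter

def experiments_per_junction(psms: list[tuple[str, str]]) -> dict[str, int]:
    """Distinct experiments with >= 1 junction-spanning PSM, per splice junction.

    `psms` holds (junction_id, experiment_id) pairs, one per accepted PSM;
    several PSMs from the same experiment count once.
    """
    experiments: dict[str, set[str]] = {}
    for junction, experiment in psms:
        experiments.setdefault(junction, set()).add(experiment)
    return {junction: len(exps) for junction, exps in experiments.items()}

SINGLE_EXPERIMENT = 279 - 155  # junctions seen in exactly one experiment, from the slide

# Toy PSMs with made-up junction and experiment IDs.
psms = [("J1", "E01"), ("J1", "E01"), ("J1", "E07"), ("J2", "E03"),
        ("J3", "E01"), ("J3", "E02"), ("J3", "E05")]
per_junction = experiments_per_junction(psms)
histogram = Counter(per_junction.values())  # x: # experiments, y: # junctions
seen_in_multiple = sum(1 for n in per_junction.values() if n > 1)
print(per_junction, dict(histogram), seen_in_multiple)
```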
Wide Range of Somatic Single AA Variants/Patient
[Figure: per-patient counts (log scale, 10 to 100,000) of Germline Variants, Somatic Variants and Alternative Splices for patients D8-A13Y, A7-A0CJ, A2-A0YM, E2-A10A, AR-A0TV, AN-A0AL, BH-A0BZ, A8-A09I, BH-A18N, BH-A0C0, AO-A12B, A8-A08G, AR-A0U4, AR-A1AW, BH-A0DG, A8-A08Z, A2-A0EX, A2-A0T2]
Low confidence thresholds applied to calls
• Variants: >2 QUAL score (phred-scaled)
• Alternative splices: >1 read
Frequency of Single AA Variants and Alternative Splices Across Patients
[Figure: # features (log scale, 1 to 100,000) vs. # patients with feature (0 to 102), for Germline Variants, Somatic Variants and Alternative Splices; high-frequency region annotated "very common"]
• Somatic variants are less frequent than germline variants
• Some germline variants are very common
– Rare germline variants present in the reference sequence (RefSeq)
• Some alternative splice forms are very common
– Should be in RefSeq
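The frequency plot counts, for each feature, the number of patients that carry it; features carried by most patients are the "very common" ones that point to a rare allele or isoform in RefSeq. A minimal sketch, assuming per-patient feature sets; the 50% cutoff for "very common" is an assumption, since the slide gives no threshold, and all IDs are made up.

```python
from collections import Counter

def feature_frequency(patient_features: dict[str, set[str]]) -> Counter:
    """Number of patients carrying each feature (variant or splice-junction ID)."""
    freq: Counter = Counter()
    for features in patient_features.values():
        freq.update(features)
    return freq

def very_common(patient_features: dict[str, set[str]], min_fraction: float = 0.5) -> set[str]:
    """Features carried by more than `min_fraction` of patients (cutoff is an assumption)."""
    n_patients = len(patient_features)
    return {f for f, k in feature_frequency(patient_features).items() if k / n_patients > min_fraction}

# Toy data: made-up patient and feature IDs.
patients = {
    "P1": {"v1", "v2", "s1"},
    "P2": {"v1", "s1"},
    "P3": {"v1", "v3", "s1"},
    "P4": {"v1"},
}
freq = feature_frequency(patients)
print(sorted(Counter(freq.values()).items()))  # (# patients with feature, # features)
print(sorted(very_common(patients)))
```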
Next steps:
• Analyze data from all tumors (81/105 so far)
• Examine “other” category
– Fusion genes (junction-spanning)
– Novel exon
– Novel gene
– Frame shift
– Novel splicing (junction-spanning)
• Analyze phosphoproteomics data
• Use updated output of Genomic analysis pipeline
• Employ more thorough FDR calculation for PSMs
– Single-pass search of all spectra against concatenated database: reference proteome, variants, alternate splice forms, “other”
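The slides do not say how the combined FDR would be computed. One widely used option for a single-pass search is the target-decoy approach: decoy sequences (for example, reversed proteins) are added to the concatenated database, and the FDR at a score cutoff is estimated as decoy PSMs divided by target PSMs scoring at or above it. A minimal sketch, assuming one score per PSM with higher meaning better; `fdr_score_cutoff` and its inputs are hypothetical.

```python
from typing import Optional

def fdr_score_cutoff(psms: list[tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff whose estimated FDR (decoys / targets) is <= max_fdr.

    `psms` holds (score, is_decoy) pairs from one search of all spectra against
    the concatenated target + decoy database; higher scores are better.
    Tied scores are not treated specially in this sketch.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score  # everything scoring >= cutoff is accepted
    return cutoff

example_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(fdr_score_cutoff(example_psms, max_fdr=0.5))   # 7.0
print(fdr_score_cutoff(example_psms, max_fdr=0.01))  # 9.0
```

Because reference, variant, splice and "other" entries share one score distribution here, a single cutoff applies to all of them; separate per-class FDR estimates are a common refinement when the novel classes are small.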
Acknowledgments
Washington U./MD Anderson/NYU
- Sherri Davies
- Matthew Ellis
- David Fenyo
- Kelly Ruggles
- Reid Townsend
- Li Ding
Broad Institute/FHCRC
- Steve Carr
- Karl Clauser
- Michael Gillette
- Jana Qiao
- Philipp Mertins
- DR Mani
- Eric Kuhn
- Sue Abbatiello
- Amanda Paulovich
- Pei Wang
- Sean Wang
- Ping Yan
NCI Staff
- Emily Boja
- Mehdi Mesri
- Rob Rivers
- Chris Kinsinger
- Henry Rodriguez
Funding
- National Cancer Institute