© 2009 SIB
Using MS/MS Spectrum Libraries for the Detection of PTMs
Markus Müller
Swiss Institute of Bioinformatics
Geneva, Switzerland
Outline
• MS/MS peptide identification
– Spectrum library versus sequence search
• QuickMod MS/MS workflow
• QuickMod open modification spectrum library search
– Alignment scoring
– Statistical validation
– Positioning of modifications
2 QuickMod Tutorial 2011
Spectrum Library Searches
Spectrum Library Searching
Peptide-Spectrum Match (PSM)
p = LREQLGPVTQEFWDNLEK; z = 3
Spectrum Library Search Scoring
• Log-transform intensities (variance stabilization, i.e. the variance of a peak becomes independent of its intensity).
• Bin peak (m/z, intensity) lists into bins of width 0.1-1.0 m/z units.
• Normalized dot-product score:
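Spectrum $S_k = \{(m_i^k, I_i^k);\ i = 1, \ldots, n_k\} \xrightarrow{\log,\ \text{binning}} s^k = (s_1^k, \ldots, s_N^k)$, for $k = 1, 2$

$$ s_i^k = \sum_{j:\; m_{\min} + (i-1)\,w \,\le\, m_j^k \,<\, m_{\min} + i\,w} \log I_j^k $$

$$ \mathrm{score}(S_1, S_2) = \cos(s^1, s^2) = \frac{\sum_{i=1}^{N} s_i^1 s_i^2}{\sqrt{\sum_{i=1}^{N} (s_i^1)^2}\;\sqrt{\sum_{i=1}^{N} (s_i^2)^2}} $$

Here $w$ is the bin width, $m_{\min}$ the lowest m/z considered, and $N$ the number of bins.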
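The scoring steps on this slide (log transform, binning, normalized dot product) can be sketched in Python. This is a minimal illustration, not the implementation of any particular search tool: the function names, the default bin width of 1.0 m/z, and the use of log(1 + I) instead of log(I) (so that binned values stay non-negative) are assumptions made for the example.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0, mz_min=0.0):
    """Log-transform intensities and sum them into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs with intensity >= 0.
    Returns a sparse vector {bin_index: summed log-intensity}.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        index = int((mz - mz_min) // bin_width)
        # Assumption: log(1 + I) rather than log(I), to avoid negative values.
        binned[index] += math.log1p(intensity)
    return dict(binned)


def normalized_dot_product(peaks_a, peaks_b, bin_width=1.0, mz_min=0.0):
    """Cosine of the angle between two binned, log-transformed spectra."""
    a = bin_spectrum(peaks_a, bin_width, mz_min)
    b = bin_spectrum(peaks_b, bin_width, mz_min)
    dot = sum(value * b[index] for index, value in a.items() if index in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)
```

A score of 1 means the two binned spectra are proportional; a score of 0 means they share no occupied bin.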
Spectral Library Search
Zhang et al., Proteomics 2011
Spectral Library Search
Zhang et al.
Spectral Library Search
Zhang et al.
Spectral Library Search
Zhang et al., Proteomics 2011
Spectrum Library Searches
• Spectrum library searches are more accurate than sequence searches.
• Scoring is less critical and easier to implement.
• Spectrum library searches are very fast compared to sequence searches.
• Libraries must be complete: a peptide can only be identified if its spectrum is in the library. Low abundance proteins are rarely found in spectrum libraries.
• Different instruments require different libraries.
Completeness of Libraries
Yeast data and one of the most complete yeast libraries:
– 20281 of 25348 non-phospho peptides found (about 80%)
– 14186 of 31120 phospho peptides found (about 46%)
Completeness of Spectrum Libraries
• Only 2 transcription factors (TFs) are in the NIST spectrum libraries of human proteins!
– For a given biological sample, measure the sample repeatedly using inclusion/exclusion lists to get maximum coverage of the peptides in the spectrum library (Schmidt A, et al.)
– Clone TFs in bacteria, purify, digest and measure with LC-MS (Bart Deplancke Lab)
– Create synthetic peptides for all proteins of an organism and measure them with LC-MS (Aebersold lab)
– Combine sequence search with spectrum library search (Ahrné et al., 2009)
– Create realistic in silico spectra to complement real spectra (Cannon et al., JPR, 2011)
• Few modified peptides in libraries
– Use an open modification search (OMS) spectrum library search tool if the unmodified form of the peptide is present (QuickMod, see below)
– Isolate modified peptides and create spectrum libraries for specific modifications (PhosphoPep, PHOSPHIDA, ...)
Prediction of MS/MS Spectra
Cannon et al., JPR, 2011; Zhang et al., Proteomics, 2011
Spectrum Library Searches
Ahrne et al., Proteomics 2009
Spectrum Libraries
Spectra identified with SpectraST, but not with Phenyx
Ahrné et al., Proteomics, 2009
QuickMod Spectral Library Search Workflow
Ahrné et al, Proteomics, 2009
Combining Search Tools (PepArML)
https://edwardslab.bmcb.georgetown.edu/pymsio/
Random and True Matches
• When searching a large database, most of the candidate peptides are not present at a detectable level in an MS2 spectrum.
• For example, an in silico tryptic digest of 10,000 proteins may yield 100 × 10,000 = 1,000,000 peptides, but only 300 of these peptides may actually be detectable in MS2 spectra.
• The score distribution will (hopefully) be bimodal: many low scores for the random matches and higher scores for the true matches.
• The random and true score distributions will inevitably overlap if the database is large.
Statistical Scores
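False discovery rate: FDR = FPR = B/(A+B)
P-value: pValue = B/(B+C)
Posterior error probability: PEP = b/(a+b) (see TPP)
(A: true matches scoring above the threshold; B: random matches scoring above the threshold; C: random matches scoring below the threshold; a, b: heights of the true and random score distributions at a given score.)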
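As a worked illustration of the three definitions, the following sketch evaluates them for made-up numbers. It assumes that A and B are the numbers of true and random matches above the score threshold, that C is the number of random matches below it, and that a and b are the true and random score densities at one score. In a real experiment A and B are not known separately and have to be estimated, for example with a decoy search. All function names and numbers are hypothetical.

```python
def fdr(true_above, random_above):
    """FDR = B / (A + B): fraction of accepted matches that are random."""
    total = true_above + random_above
    return random_above / total if total else 0.0


def p_value(random_above, random_below):
    """pValue = B / (B + C): fraction of random matches above the threshold."""
    total = random_above + random_below
    return random_above / total if total else 0.0


def pep(true_density, random_density):
    """PEP = b / (a + b): local probability that a match at this score is random."""
    total = true_density + random_density
    return random_density / total if total else 0.0


# Hypothetical counts: 950 true and 50 random matches above the threshold,
# 9950 random matches below it.
example_fdr = fdr(950, 50)          # 50 / 1000
example_p_value = p_value(50, 9950)  # 50 / 10000
example_pep = pep(3.0, 1.0)          # 1 / 4
```

With these numbers the FDR is 5% although the p-value at the threshold is only 0.5%, which shows why the two quantities should not be confused when the random matches greatly outnumber the true ones.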
Statistical Scores
• Statistical scores do not depend on the details of the scoring function.
• The underlying scoring function can even be multidimensional, i.e. include several scores of a search engine.
• Statistical scores have a unified probabilistic interpretation, i.e. they correspond to frequencies and counts.
• This allows comparing the statistical scores of different search engines with each other.
False Discovery Rate (FDR)
• Decoy search to control FDR on peptide and protein level
• Works for both single and combined runs if applied correctly
• Does not provide an answer about modification positioning.
• Does not provide an answer if there is more than one high scoring PSM.
• FDR is very sensitive to high scoring random matches.
• The number of peptides identified at a given FDR is dependent on the way the decoy database is created and the way FDR is calculated.
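• Statistically, the FDR is an expectation value, i.e. the mean over many different decoy searches:
$$ \mathrm{FDR} = E\left[\frac{FP}{FP + TP}\right], \quad FP + TP > 0 $$
• Each estimate with a single decoy db is only accurate within its standard error (Granholm & Käll, Proteomics 2011).
[Standard-error formula and numerical example not recoverable from the transcript; surviving values: 2400, 0.01, 0.5, 0.0025]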
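A common way to estimate the FDR from a decoy search can be sketched as follows. The sketch assumes separate target and decoy searches of equal size and uses the simple estimator (decoy matches above threshold) / (target matches above threshold); this is one widely used convention, not necessarily the one used by QuickMod or the cited papers. Averaging over several decoy searches mirrors the statement that the FDR is an expectation value. All function names are hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR at a score threshold as decoys / targets above it."""
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


def threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Lowest observed target score whose estimated FDR is <= max_fdr.

    Returns None if no threshold satisfies the requirement.
    """
    best = None
    for threshold in sorted(set(target_scores), reverse=True):
        if estimate_fdr(target_scores, decoy_scores, threshold) <= max_fdr:
            best = threshold
    return best


def mean_fdr(target_scores, decoy_runs, threshold):
    """Average the FDR estimate over several independent decoy searches."""
    estimates = [estimate_fdr(target_scores, d, threshold) for d in decoy_runs]
    return sum(estimates) / len(estimates)
```

Because a single high scoring decoy match can change the accepted threshold, repeating the estimate with several decoy databases gives a feeling for how stable the result is.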
Robustness of FDR
Creation of Decoy Spectrum Libraries
1. Shuffle sequence
2. Move annotated b,y,c,z-ions in accordance with shuffled sequence (e.g. y8+ -> y8+)
3. Sample non-annotated m/z if they do not belong to a conserved pattern (intensity is left intact)
Ahrné et al., Proteomics, 2011
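The three steps can be sketched in Python under strong simplifications: only singly charged b- and y-ions are handled (no c/z-ions), the whole sequence is shuffled, and every non-annotated peak is resampled uniformly, ignoring the "conserved pattern" condition of step 3. The peak representation, the function names and the m/z range are assumptions for illustration; this is not the published DeLiberator procedure.

```python
import random

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_mz(sequence, ion_type, index):
    """m/z of a singly charged b- or y-ion made of `index` residues."""
    if ion_type == "b":
        return sum(RESIDUE_MASS[a] for a in sequence[:index]) + PROTON
    if ion_type == "y":
        return sum(RESIDUE_MASS[a] for a in sequence[-index:]) + WATER + PROTON
    raise ValueError("only b- and y-ions are supported in this sketch")


def make_decoy(sequence, peaks, rng, mz_range=(100.0, 2000.0)):
    """Create a decoy spectrum from an annotated library spectrum.

    peaks: list of (mz, intensity, annotation), where annotation is
    (ion_type, index) for annotated peaks and None otherwise.
    """
    residues = list(sequence)
    rng.shuffle(residues)                      # step 1: shuffle the sequence
    decoy_sequence = "".join(residues)
    decoy_peaks = []
    for mz, intensity, annotation in peaks:
        if annotation is None:
            new_mz = rng.uniform(*mz_range)    # step 3: resample the m/z
        else:
            ion_type, index = annotation       # step 2: move the annotated ion
            new_mz = fragment_mz(decoy_sequence, ion_type, index)
        decoy_peaks.append((new_mz, intensity, annotation))  # intensity kept
    decoy_peaks.sort(key=lambda peak: peak[0])
    return decoy_sequence, decoy_peaks
```

Keeping the intensities and the number of peaks unchanged is what makes the decoy spectra resemble real library spectra, so that random matches against them score like random matches against the target library.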
Fragment Peak Distribution
ETD
IT
Controlling FDR
DeLiberator Ahrné et al, Proteomics, 2011
MS/MS Spectra of Modified Peptides
• A modification of mass Δ on an amino acid of a peptide induces several important changes in the MS/MS spectrum:
– The precursor m/z is shifted by Δ/z, where z is the charge of the ion.
– The m/z values of all fragment ions that contain the modified amino acid are shifted by Δ/z.
– The m/z values of all fragment ions that do not contain the modified amino acid remain the same. However, their intensities may change significantly.
– Multiple modifications induce more complicated changes.
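The m/z shifts listed above can be made concrete with a short sketch. It assumes b- and y-ions only, a 1-based modification position, and a fragment representation (ion type, index, charge, m/z) chosen for this example; intensity changes are not modeled. The function names are hypothetical.

```python
def contains_position(ion_type, index, position, length):
    """True if the b- or y-ion of size `index` contains residue `position`.

    position is 1-based, counted from the N-terminus; length is the
    number of residues in the peptide.
    """
    if ion_type == "b":
        return index >= position            # b_i covers residues 1..i
    if ion_type == "y":
        return index >= length - position + 1  # y_i covers the last i residues
    raise ValueError("only b- and y-ions are supported in this sketch")


def shift_precursor(precursor_mz, charge, delta):
    """The precursor m/z moves by delta / z."""
    return precursor_mz + delta / charge


def shift_fragments(fragments, position, length, delta):
    """Shift the fragments that contain the modified residue by delta / z.

    fragments: list of (ion_type, index, charge, mz) tuples.
    """
    shifted = []
    for ion_type, index, charge, mz in fragments:
        if contains_position(ion_type, index, position, length):
            mz += delta / charge
        shifted.append((ion_type, index, charge, mz))
    return shifted
```

For an oxidized methionine at position 10 of a 13-residue peptide (Δ of about 15.9949 Da), b10 to b12 and y4 to y12 are shifted, while b1 to b9 and y1 to y3 keep their m/z.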
Similarity Between Modified and Unmodified Spectra
Oxidation of GQGTLSVVTM{16}YHK/2
Phosphorylation of TY{80}FPHFDLSHGSAQVK/2
QuickMod Open Modification Search
• Spectral alignment and scoring
• Controlling FDR
• Modification positioning
Ahrné et al. Recomb2011/JPR, submitted
OMS: Spectrum Libraries Versus Theoretical Spectra
QuickMod Scores
QuickMod score = Linear SVM combination of 3 best scores
Z=2
Z=3
Benchmarking
Speed: InsPecT 30 min; PTMFinder 5 min; SpectraST 55 min; QuickMod 5 min
Modification Positioning
Residue:                      C           I           S           K
Ions containing the residue:  b1,b2,b3    b2,b3,y3    b3,y2,y3    y1,y2,y3
Score per ion:                -1 -1 -1    -1 -1 +1    -1 +1 +1    +1 +1 +1
Sum:                          -3          -1          +1          +3
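One reading of the CISK example is a simple voting scheme: for each candidate position, every fragment ion that would contain the modified residue votes +1 if it is observed at its shifted m/z and -1 if it is observed unshifted, and the position with the highest sum wins. This interpretation is reconstructed from the numbers on the slide and is an assumption, not a description of QuickMod's actual positioning algorithm; the function names are hypothetical.

```python
def ions_containing(position, length):
    """b- and y-ions (excluding the full-length ones) that contain a residue.

    position is 1-based; length is the number of residues in the peptide.
    """
    b_ions = [("b", i) for i in range(position, length)]
    y_ions = [("y", i) for i in range(length - position + 1, length)]
    return b_ions + y_ions


def position_scores(length, shifted_ions):
    """Score each candidate position by +1/-1 votes of its fragment ions.

    shifted_ions: set of (ion_type, index) observed at the shifted m/z.
    All other ions are assumed to be observed at their unshifted m/z.
    """
    scores = []
    for position in range(1, length + 1):
        votes = [1 if ion in shifted_ions else -1
                 for ion in ions_containing(position, length)]
        scores.append(sum(votes))
    return scores


# CISK example: all y-ions are observed shifted, all b-ions unshifted.
cisk_scores = position_scores(4, {("y", 1), ("y", 2), ("y", 3)})
best_position = cisk_scores.index(max(cisk_scores)) + 1  # 1-based
```

In this example the sums are -3, -1, +1 and +3 for C, I, S and K, so the modification is placed on the C-terminal lysine.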
Modification Positioning
Multiple Modifications
• QuickMod is primarily designed for single modifications
• Double modifications can also be detected as long as the 2 modified residues are close together
• Positioning yields a region between the two modified amino acids
Modification Positioning
Modification Positioning
1) QuickMod workflow
2) Directed MS (inclusion list)
3) Complementary fragmentation: CID/HCD or MS3
[Figure labels: HCD/CID; CID; B2, Y2; IK, IF, IH; Y3, Y4, Y5, Y7, Y8]
QuickMod Tools
Java Proteomics Library (JPL) http://javaprotlib.sourceforge.net/
Future Work
• Extend alignment to multiple modifications
• Develop modification specific scores and positioning algorithms (phosphorylation)
• Work on combined sequence search and spectrum library search
• Apply QuickMod (QM) to large datasets for phosphorylation and other modifications.
• Use QuickMod for verification of MS/MS assignments.
• …
Many Thanks to
Proteome Informatics Group, Swiss Institute of Bioinformatics: Swetha Ramagoni, Luc Mottin, Leelapavan Tadoori, Nottania Campbell, Erik Ahrné, Yuki Ohta, Frederic Nikitin, Rostyk Kuzyakiv, Dominique Kadio Koua, Patricia Palagi, Markus Müller, Frederique Lisacek
BPRG: Alex Scherl, Maria Ramirez-Boo, Xavier Robin, Alex Hainard, Natacha Turck, Jean-Charles Sanchez
SCAHT: Laurent Geiser, Florent Glück, Paola Antinori, Denis Hochstrasser
SIP-CUI: Fokko Beekhof, Oleksiy Koval, Slava Voloshynovskiy