How to identify peptides
October 2013
Gustavo de Souza, IMM, OUS
Peptides or Proteins?
Bottom-up Proteomics
2DE-based approach
Peptide Mass Fingerprinting
MALDI (Matrix Assisted Laser Desorption Ionization)
Peptide Mass Fingerprinting
[Figure: peptide mass fingerprint spectrum, intensity vs. m/z]
MS/MS
[Figure: tandem MS schematic with ion source, mass analyzer, collision cell, mass analyzer and detector; precursor ion at m/z 899.013]
Fragmentation
Nomenclature for peptide sequence-ions:
Collision-Induced Dissociation (CID): $MH_n^{n+*} + N_2 \rightarrow b + y$
Electron Capture Dissociation (ECD): $MH_n^{n+} + e^- \rightarrow MH_n^{(n-1)+\cdot} \rightarrow c + z^{\cdot}$
Fragmentation
[Figure: peptide backbone with side chains R1 to R8, showing the cleavage sites that give the N-terminal b1 to b7 and the C-terminal y1 to y7 fragment ions]
Roepstorff-Fohlmann-Biemann nomenclature
Fragmentation
[Figure: 12-residue peptide (12 aa) with its b-ion and y-ion series]
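The b/y nomenclature can be made concrete with a short calculation. The sketch below assumes singly charged ions and standard monoisotopic residue masses: b ions are cumulative N-terminal residue sums plus a proton, y ions are cumulative C-terminal sums plus water and a proton. The function names are hypothetical illustrations, not part of any search engine.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O carried by the C-terminal (y) fragments
PROTON = 1.007276   # charge carrier for singly charged ions


def b_ions(seq):
    """Singly charged b1..b(n-1): N-terminal residue sums plus a proton."""
    ions, total = [], PROTON
    for aa in seq[:-1]:
        total += RESIDUE[aa]
        ions.append(total)
    return ions


def y_ions(seq):
    """Singly charged y1..y(n-1): C-terminal residue sums plus water and a proton."""
    ions, total = [], WATER + PROTON
    for aa in reversed(seq[1:]):
        total += RESIDUE[aa]
        ions.append(total)
    return ions


if __name__ == "__main__":
    peptide = "VPTVDVSVVDLTVK"  # peptide annotated in the example spectrum
    for i, (b, y) in enumerate(zip(b_ions(peptide), y_ions(peptide)), start=1):
        print(f"b{i:<2} {b:10.4f}   y{i:<2} {y:10.4f}")
```

Complementary pairs (b_i and y_(n-i)) always add up to the precursor mass plus two protons, which is one reason both series appear together in a spectrum.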
MS/MS of a peptide
[Figure: ion-trap (ITMS) MS/MS spectrum, scan #11793, RT 84.81 min, m/z 190.00 to 1485.00; relative abundance vs. m/z for VPTVDVSVVDLTVK, annotated with the precursor (P), y2 to y13 (plus doubly charged y13) and b3 to b13 ions]
How to Identify MS/MS
Steen and Mann, 2004.
Peptide Sequence Tags
Autocorrelation
Probability based match
Submitting to Search
How does identification happen?
Your data vs. protein database (FASTA)
Step 1: Which theoretical peptides have the same mass as the observed ion?
Step 2: Of those, which one has the most similar fragmentation pattern?
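A toy version of the two steps, assuming a tiny in-memory list of peptide strings as the "database", a ppm tolerance for the precursor, and a simple shared-peak count as the similarity measure. Real search engines such as Mascot use probability-based scoring instead; `identify`, `ppm_tol` and `frag_tol` are hypothetical names.

```python
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def peptide_mass(seq):
    """Neutral monoisotopic mass: residue sum plus one water."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def fragment_ions(seq):
    """Theoretical singly charged b and y ions of a peptide."""
    ions, b, y = [], PROTON, WATER + PROTON
    for i in range(len(seq) - 1):
        b += RESIDUE[seq[i]]        # grow from the N-terminus
        y += RESIDUE[seq[-1 - i]]   # grow from the C-terminus
        ions.extend((b, y))
    return ions


def identify(precursor_mass, peaks, database, ppm_tol=10.0, frag_tol=0.5):
    """Return (shared_peaks, sequence) pairs, best match first."""
    # Step 1: keep peptides whose theoretical mass matches the precursor within ppm_tol
    candidates = [
        seq for seq in database
        if abs(peptide_mass(seq) - precursor_mass) / peptide_mass(seq) * 1e6 <= ppm_tol
    ]

    # Step 2: count observed peaks explained by at least one theoretical fragment
    def shared_peaks(seq):
        theoretical = fragment_ions(seq)
        return sum(any(abs(p - t) <= frag_tol for t in theoretical) for p in peaks)

    return sorted(((shared_peaks(s), s) for s in candidates),
                  key=lambda hit: hit[0], reverse=True)


if __name__ == "__main__":
    database = ["ELEEIVQPIISK", "SIIPQVIEELEK", "VPTVDVSVVDLTVK"]  # toy database
    observed_peaks = [147.11, 234.14, 347.23]                        # hypothetical peaks
    print(identify(1396.7813, observed_peaks, database))
```

Both ELEEIVQPIISK and its reversed sequence survive step 1 because they have identical mass; only step 2 separates them.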
High mass accuracy – what is it good for?
All theoretical tryptic peptide masses from human IPI database
Example tryptic HSP-70 peptide: ELEEIVQPIISK, mass 1396.7813 Da

| Instrument | Calibration | Mass accuracy (ppm) | # of tryptic peptides for m/z 1396.7813 |
|---|---|---|---|
| LTQ-FT | Int. | 0.5 | 3 |
| LTQ-FT | Ext-SIM | 1 | 9 |
| LTQ-FT | Ext. | 2 | 11 |
| QSTAR | Int. | 10 | 33 |
| QSTAR | Ext. | 20 | 52 |
| LTQ | Ext. | 500 | 344 |

(Int. = internal calibration; Ext. = external calibration)
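The candidate count shrinks with accuracy because the tolerance window in Da is mass × ppm × 10⁻⁶. The sketch below converts a ppm tolerance into a mass window and counts theoretical masses inside it; the helper names are hypothetical, and any candidate mass list you feed it is your own, not the IPI data behind the table.

```python
def ppm_window(mass, ppm):
    """(low, high) bounds of a +/- ppm tolerance around a mass in Da."""
    delta = mass * ppm * 1e-6
    return mass - delta, mass + delta


def count_candidates(observed_mass, theoretical_masses, ppm):
    """Number of theoretical peptide masses inside the tolerance window."""
    low, high = ppm_window(observed_mass, ppm)
    return sum(low <= m <= high for m in theoretical_masses)


if __name__ == "__main__":
    # Half-width of the window (Da) around the HSP-70 peptide for each accuracy in the table
    for ppm in (0.5, 1, 2, 10, 20, 500):
        low, high = ppm_window(1396.7813, ppm)
        print(f"{ppm:>5} ppm -> +/- {(high - low) / 2:.4f} Da")
```

At 2 ppm the window is only about ±0.003 Da, while at 500 ppm it is about ±0.7 Da, wide enough to admit hundreds of tryptic peptides.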
Defining the “Search Space”
The “Search Space”
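[Figure: a protein yielding six tryptic peptides (1 to 6), with the peptides included in the search at each number of allowed missed cleavages (mcl)]
0 mcl: 1, 2, 3, 4, 5, 6
1 mcl: 1, 2, 3, 4, 5, 6 + 1/2, 2/3, 3/4, 4/5, 5/6
2 mcl: 1, 2, 3, 4, 5, 6 + 1/2, 2/3, 3/4, 4/5, 5/6 + 1/2/3, 2/3/4, 3/4/5, 4/5/6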
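The growth of the search space with missed cleavages can be reproduced with an in-silico digest. This sketch assumes the simple trypsin rule (cleave after K or R, not before P) and a hypothetical six-fragment sequence; it is not the digestion code of any particular search engine.

```python
import re


def tryptic_peptides(protein, max_missed=0):
    """In-silico trypsin digest allowing up to max_missed missed cleavages."""
    # Cleave after K or R unless the next residue is P (zero-width split, Python 3.7+)
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join 1 .. max_missed + 1 consecutive fragments beginning at `start`
        for n_missed in range(max_missed + 1):
            end = start + n_missed + 1
            if end <= len(fragments):
                peptides.append("".join(fragments[start:end]))
    return peptides


if __name__ == "__main__":
    protein = "AAKGGKLLRMMKVVKWW"  # hypothetical protein giving six tryptic peptides
    for mcl in range(3):
        print(mcl, "mcl:", len(tryptic_peptides(protein, mcl)), "peptides")
```

For this six-fragment sequence the counts are 6, 11 and 15 peptides for 0, 1 and 2 missed cleavages, matching the figure.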
Importance of Search Space Size
The search tool does not identify a peptide. It only reports the theoretical sequence that statistically best matches the experimental data.
If you increase the size of the database, or of the search space, too much, false-positive rates also increase.
Steen and Mann, 2004
Defining FDRs
The chance that two peptides with different sequences have approximately the same Mr and share MS/MS similarities.
More variables inserted during the search → higher chance of random events → higher MOWSE score threshold
Parameters that can modify the MOWSE calculation:
-Database size;
-MMD (measured mass deviation);
-Number of PTMs chosen;
-Data quality.
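To see how each chosen PTM enlarges the search space, the sketch below enumerates the ways one variable modification type can be placed on a peptide. It assumes phosphorylation on S/T/Y and a cap of three modified residues, both illustrative choices rather than the settings of any real search.

```python
from itertools import combinations


def modified_variants(peptide, sites="STY", max_mods=3):
    """All placements of one variable modification type (default: phosphorylation
    on S/T/Y) with at most max_mods modified residues; () is the unmodified form."""
    positions = [i for i, aa in enumerate(peptide) if aa in sites]
    variants = []
    for k in range(min(max_mods, len(positions)) + 1):
        variants.extend(combinations(positions, k))  # choose k modified sites
    return variants


if __name__ == "__main__":
    for pep in ("ELEEIVQPIISK", "VPTVDVSVVDLTVK"):
        print(pep, len(modified_variants(pep)), "variants to test")
```

A peptide with three candidate sites already needs 2³ = 8 variants compared against each spectrum, and every extra modification type multiplies this further.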
MOWSE
Mycoplasma sp. sample (Munich 2006):
-Database had ~700 entries;
-Average data accuracy: 0.7 ppm;
-MMD used during search: 3 ppm.
Probability Based Mowse Score
Ions score is -10*Log(P), where P is the probability that the observed match is a random event.
Individual ions scores > 7 indicate identity or extensive homology (p<0.05).
Protein scores are derived from ions scores as a non-probabilistic basis for ranking protein hits.
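The score definition above converts directly between P and score. The threshold function below is a simplified, Bonferroni-style assumption (α divided by the number of compared candidates) meant only to show why a larger search space raises the score needed; Mascot computes its own thresholds, which can differ (the search above reports > 7).

```python
import math


def ions_score(p):
    """Mowse-style ions score from the random-match probability P: -10*log10(P)."""
    return -10.0 * math.log10(p)


def probability(score):
    """Inverse of ions_score: random-match probability implied by a score."""
    return 10 ** (-score / 10.0)


def bonferroni_threshold(n_candidates, alpha=0.05):
    """Simplified assumption: score needed so that a random hit among
    n_candidates compared sequences stays below alpha overall."""
    return ions_score(alpha / n_candidates)


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, "candidates -> threshold", round(bonferroni_threshold(n), 1))
```

Every tenfold increase in candidates adds 10 score units to the threshold in this model.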
Example of MMD issue
Peng et al. (2003). Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J Proteome Res 2, 43-50.
Reversed database sequence
Strategies to Visualize FDRs
False positive identification using reversed database
HSP-70 tryptic peptide
Forward: K.ELEEIVQPIISK
Reverse: K.SIIPQVIEELEK
Peptide Mr: 1396.7813 Da (identical for both)
Mascot checks both peptides.
Theoretical y series (forward / reverse):
y1: 147.1 / 147.1
y2: 234.1 / 276.2
y3: 347.2 / 389.2
....
y11: 1268.7 / 1310.8
Expected ions from the reversed hit should not correlate with the ions observed in the experiment.
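The forward/reverse comparison can be reproduced in a few lines. This sketch assumes a decoy made by simply reading the sequence backwards and singly charged y ions rounded to 0.1 Da; the helper names are hypothetical.

```python
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def reverse_decoy(sequence):
    """Decoy entry: the sequence read backwards (same composition, so same mass)."""
    return sequence[::-1]


def peptide_mass(seq):
    """Neutral monoisotopic mass: residue sum plus one water."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def y_series(seq):
    """Singly charged y1..y(n-1), rounded to 0.1 Da."""
    ions, total = [], WATER + PROTON
    for aa in reversed(seq[1:]):
        total += RESIDUE[aa]
        ions.append(round(total, 1))
    return ions


if __name__ == "__main__":
    forward = "ELEEIVQPIISK"
    # Reverse the peptide together with its preceding K; the leading K of the result
    # is the old C-terminal K and closes the previous decoy peptide, so drop it.
    decoy = reverse_decoy("K" + forward)[1:]
    print(decoy, round(peptide_mass(forward), 4), round(peptide_mass(decoy), 4))
    print(y_series(forward))
    print(y_series(decoy))
```

Both peptides pass the precursor filter, but apart from y1 (the shared C-terminal K) their y series diverge, so a correct spectrum should match only the forward hit.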
[Figure: Mascot score vs. sequence length for all peptides]
Typical Result
Are there any reversed-hit proteins with 2 peptides above the MOWSE score threshold?
-No: all proteins identified with 2 peptides scoring above the p<0.05 threshold are good.
-Yes: repeat the Mascot search with more stringent parameters.
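The reversed-hit check above is the core of the target-decoy approach. A minimal sketch, assuming target and decoy databases of equal size so that decoy hits above a cut-off estimate the false positives among target hits; the scores in the usage example are toy numbers, not real data.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy estimate: decoy hits / target hits at or above threshold.
    Assumes a reversed (decoy) database of the same size as the target."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0  # no target hits: report 0


def threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest observed target score usable as a cut-off with estimated FDR <= max_fdr."""
    for t in sorted(set(target_scores)):
        if decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


if __name__ == "__main__":
    targets = [12, 18, 25, 31, 40, 47, 55, 62]  # toy scores
    decoys = [11, 16, 22, 27]
    for t in (10, 20, 30):
        print(t, round(decoy_fdr(targets, decoys, t), 3))
    print("cut-off with no decoy hits:", threshold_for_fdr(targets, decoys, 0.0))
```

Raising the cut-off until no (or few enough) reversed hits remain is the "more stringent parameters" step expressed as a number.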
What about 1-hit wonders? (Proteins identified with only 1 peptide)
How to Validate the Data
Basically, the idea is to "play around" with the statistics to make your result more reliable.
[Figure: Mascot score vs. sequence length for all peptides]
Take home message
1. Data quality (mass accuracy) and a well-defined search space are key for reliable peptide identification
2. Reliable identification is an interplay between asking enough without asking too much (be careful when trying to get “as many IDs as I can”!)
PTMs
October 2013
Gustavo de Souza, IMM, OUS
PTMs in biology
Complexity of Protein Samples in Eukaryotes
Modifications are specific to a group of amino acids
What differences to expect at the MS level?
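At the MS level, a stable PTM shows up as a fixed mass shift, divided by the charge state in m/z. The sketch below uses monoisotopic shifts for a few common PTMs; treating ELEEIVQPIISK as modified is purely an illustration.

```python
# Monoisotopic mass shifts (Da) of a few common PTMs
PTM_SHIFT = {
    "phospho": 79.96633,    # +HPO3 (S, T, Y)
    "acetyl": 42.01057,     # +C2H2O (K, protein N-terminus)
    "methyl": 14.01565,     # +CH2 (K, R)
    "oxidation": 15.99491,  # +O (M)
}
PROTON = 1.007276


def modified_mz(neutral_mass, ptms, charge):
    """m/z of [M + zH]z+ after adding the mass shifts of the listed PTMs."""
    mass = neutral_mass + sum(PTM_SHIFT[p] for p in ptms)
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    m = 1396.7813  # HSP-70 peptide mass, used only as an example
    for ptms in ([], ["phospho"], ["phospho", "phospho"], ["acetyl"]):
        print(ptms or "unmodified", round(modified_mz(m, ptms, 2), 4))
```

A single phosphate moves a doubly charged precursor by about 40 m/z units, so the search engine must know to look for that shift.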
Larsen MR et al, 2006.
Defining the “Search Space”
PTM abundance in a cell
[Figure: number of peptides vs. abundance level, for total peptides in a sample and for modified peptides]
Differences from 10^2 to 10^4
PTM abundance in a cell
Stable vs. Labile PTMs
Larsen MR et al, 2006.
Neutral loss
Boersema PJ et al, 2009.
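For labile PTMs such as phosphoserine/phosphothreonine, CID often removes the modification as neutral phosphoric acid (H3PO4, 97.9769 Da). Because the lost fragment is uncharged, the dominant peak appears at the precursor m/z minus 97.9769/z. A minimal sketch; the function name is hypothetical.

```python
H3PO4 = 97.97690  # monoisotopic mass of phosphoric acid


def neutral_loss_mz(precursor_mz, charge, loss=H3PO4):
    """m/z after an uncharged neutral loss: the lost mass is divided by the charge."""
    return precursor_mz - loss / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"{z}+ : H3PO4 loss appears as -{H3PO4 / z:.2f} m/z")
```

Checking a spectrum for a strong peak at these offsets is a quick way to flag likely phosphopeptides before interpreting the rest of the fragments.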
Identifying Labile PTMs
Larsen MR et al, 2006.
HCD fragmentation
Larsen MR et al, 2006.
Status of PTM coverage
Lemeer and Heck, 2009.
Status of PTM coverage
Derouiche A et al, 2012.
Take home message
- PTM identification depends on stability under fragmentation and on abundance in the sample
- ID improvement was mostly driven by instrumentation improvements (sensitivity etc.)
- Depending on PTM, identification can be very easy or very hard