How to identify peptides

October 2013

Gustavo de Souza, IMM, OUS

Peptides or Proteins?

Bottom-up Proteomics

2DE-based approach

Peptide Mass Fingerprinting

MALDI (Matrix-Assisted Laser Desorption/Ionization)

Peptide Mass Fingerprinting

[Figure: peptide mass fingerprint spectrum; x-axis: m/z, y-axis: intensity]

MS/MS

[Diagram: instrument layout with ion source, mass analyzers, collision cell and detector]

MS/MS

[Figure: ion labelled 899.013]

Fragmentation

Nomenclature for peptide sequence-ions:

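Collision-Induced Dissociation (CID): $\mathrm{MH}_n^{n+\,*} + \mathrm{N_2} \longrightarrow b + y$

Electron Capture Dissociation (ECD): $\mathrm{MH}_n^{n+} + e^- \longrightarrow \mathrm{MH}_n^{(n-1)+\bullet} \longrightarrow c + z^{\bullet}$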

Fragmentation

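[Figure: backbone of an eight-residue peptide (side chains R1 to R8) with the cleavage sites labelled b1 to b7 counting from the N-terminus and y7 to y1 counting from the C-terminus; each bond is labelled with a complementary pair: b1/y7, b2/y6, b3/y5, b4/y4, b5/y3, b6/y2, b7/y1]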

Roepstorff-Fohlman-Biemann nomenclature
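
As a minimal sketch of this nomenclature, the snippet below computes singly charged b- and y-ion m/z values for an unmodified peptide. The function name `fragment_ions` and the constant names are illustrative assumptions; the residue masses are standard monoisotopic values. The peptide ELEEIVQPIISK is used only as an example input. This is not the calculation of any particular search engine.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added to y ions (intact C-terminus)
PROTON = 1.00728   # charge carrier for singly charged ions


def fragment_ions(peptide):
    """Return (b, y): singly charged m/z lists for b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(peptide)):
        b.append(sum(masses[:i]) + PROTON)           # N-terminal fragment
        y.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal fragment
    return b, y


b_ions, y_ions = fragment_ions("ELEEIVQPIISK")
print([round(m, 1) for m in y_ions[:3]])  # -> [147.1, 234.1, 347.2]
```

A peptide of n residues gives n-1 ions of each series, which is why the eight-residue example carries the labels b1 to b7 and y1 to y7.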

Fragmentation

[Figure: a 12-amino-acid (12 aa) peptide and the series of b ions and y ions it produces]

MS/MS of a peptide

[Figure: MS/MS spectrum of the peptide VPTVDVSVVDLTVK; x-axis: m/z (200 to 1400), y-axis: relative abundance (0 to 100); peaks labelled P, b3 to b13, y2 to y13 and y13++]

Scan header: LG_y2_13 #11793 RT: 84.81 AV: 1 NL: 3.57E5 T: ITMS + c ESI d w Full ms2 [email protected] [190.00-1485.00]

How to Identify MS/MS

Steen and Mann, 2004.

Peptide Sequence Tags

Autocorrelation

Probability based match

Submitting to Search

How does identification happen?

Your data vs. protein database (FASTA)

Step 1: Which theoretical peptides have the same mass as the observed ion?

Step 2: Of those, which one has the most similar fragmentation pattern?
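
The two steps can be sketched as follows. This is a simplified illustration, not how Mascot or any other engine scores spectra: step 2 here just counts shared y-ion peaks, whereas real tools use probabilistic or correlation-based scores. All function names, the candidate list, the tolerances and the observed peak list are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def peptide_mass(seq):
    """Monoisotopic mass of the neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def y_ions(seq):
    """Singly charged y1..y(n-1) m/z values."""
    return [sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON
            for i in range(1, len(seq))]


def step1_mass_filter(observed_mass, candidates, tol_ppm):
    """Step 1: keep theoretical peptides whose mass matches within tol_ppm."""
    return [s for s in candidates
            if abs(peptide_mass(s) - observed_mass) / observed_mass * 1e6 <= tol_ppm]


def step2_best_match(observed_peaks, candidates, tol_da=0.5):
    """Step 2: pick the candidate sharing the most y ions with the spectrum."""
    def shared(seq):
        return sum(any(abs(t - p) <= tol_da for p in observed_peaks)
                   for t in y_ions(seq))
    return max(candidates, key=shared)


# Hypothetical inputs: three theoretical peptides and a few observed fragment peaks.
database_peptides = ["ELEEIVQPIISK", "SIIPQVIEELEK", "VPTVDVSVVDLTVK"]
observed_mass = 1396.7813
observed_peaks = [147.1, 234.1, 347.2, 460.3]

same_mass = step1_mass_filter(observed_mass, database_peptides, tol_ppm=10)
print(same_mass)                                    # two candidates share the mass
print(step2_best_match(observed_peaks, same_mass))  # ELEEIVQPIISK
```

Two of the three candidates survive step 1 because they have the same amino acid composition; only the fragment ions in step 2 tell them apart.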


High mass accuracy – what is it good for?

All theoretical tryptic peptide masses from human IPI database

Example: tryptic HSP-70 peptide ELEEIVQPIISK, mass 1396.7813 Da

Instrument   Calibration   Mass accuracy   # of tryptic peptides for m/z 1396.7813
LTQ          Ext.          500             344
QSTAR        Ext.          20 ppm          52
QSTAR        Int.          10 ppm          33
LTQ-FT       Ext.          2 ppm           11
LTQ-FT       Ext-SIM       1 ppm           9
LTQ-FT       Int.          0.5 ppm         3
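
To make the ppm figures concrete, the short sketch below converts a relative tolerance in ppm into an absolute mass window around 1396.7813 Da. The helper name `ppm_window` is an illustrative assumption; the arithmetic is simply mass × ppm / 10^6 on each side.

```python
def ppm_window(mass, tol_ppm):
    """Return the (low, high) mass window for a tolerance given in ppm."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta


mass = 1396.7813
for tol in (20, 10, 2, 1, 0.5):
    low, high = ppm_window(mass, tol)
    print(f"{tol:>4} ppm: {low:.4f} to {high:.4f} Da (width {high - low:.4f} Da)")
```

At 1 ppm the window is only about 0.0028 Da wide, so far fewer theoretical peptides fall inside it than at 20 ppm, where it is about 0.056 Da wide.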

Defining the “Search Space”

The “Search Space”

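[Figure: a protein cut into six tryptic peptides (1 to 6), searched with an increasing number of missed cleavages (mcl)]

0 mcl: 1, 2, 3, 4, 5, 6

1 mcl: 1, 2, 3, 4, 5, 6, plus 1/2, 2/3, 3/4, 4/5, 5/6

2 mcl: 1, 2, 3, 4, 5, 6, plus 1/2, 2/3, 3/4, 4/5, 5/6, plus 1/2/3, 2/3/4, 3/4/5, 4/5/6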
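
A minimal sketch of how missed cleavages enlarge the search space, assuming the common trypsin rule (cleave after K or R, but not before P). The function name `tryptic_digest` and the toy protein sequence are hypothetical.

```python
import re


def tryptic_digest(protein, missed_cleavages=0):
    """Cleave after K/R (not before P), then join up to
    `missed_cleavages` + 1 adjacent pieces into longer peptides."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for n_joined in range(1, missed_cleavages + 2):
        for start in range(len(pieces) - n_joined + 1):
            peptides.append("".join(pieces[start:start + n_joined]))
    return peptides


toy_protein = "AAKCCRDDKEERFFKGG"  # hypothetical sequence giving six pieces
for mcl in (0, 1, 2):
    print(mcl, "mcl:", len(tryptic_digest(toy_protein, mcl)), "peptides")
```

For this six-piece toy protein the candidate list grows from 6 to 11 to 15 peptides, before any variable modification is considered.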

Importance of Search Space Size

A search tool does not identify a peptide. It only reports the theoretical sequence that is statistically the best match to the experimental data.

If you increase the size of the database or of the search space too much, false-positive rates also increase.

Steen and Mann, 2004

Defining FDRs

The chance that two peptides with different sequences have approximately the same Mr and share similar MS/MS features.

More variables included in the search --> higher chance of random matches --> higher MOWSE score threshold

Parameters that can modify the MOWSE calculation:

-Database size;

-MMD (measured mass deviation);

-Number of PTMs chosen;

-Data quality.

MOWSE

Mycoplasma sp. sample (Munich 2006):

-Database had ~ 700 entries;

-Data had an average mass accuracy of 0.7 ppm;

-MMD used during search: 3 ppm.

Probability Based Mowse Score

Ions score is -10*Log(P), where P is the probability that the observed match is a random event.

Individual ions scores > 7 indicate identity or extensive homology (p<0.05).

Protein scores are derived from ions scores as a non-probabilistic basis for ranking protein hits.
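
The relation between ions score and P can be sketched in a few lines. This only illustrates the -10*Log(P) conversion quoted above, assuming a base-10 logarithm; how P itself and the significance threshold are derived for a real spectrum is internal to the search engine and is not reproduced here. The function names are illustrative.

```python
import math


def ions_score(p_random):
    """Score for a match whose probability of being a random event is p_random."""
    return -10 * math.log10(p_random)


def probability_from_score(score):
    """Inverse conversion: probability of a random match for a given score."""
    return 10 ** (-score / 10)


for p in (0.05, 0.01, 0.001):
    print(f"P = {p}: score = {ions_score(p):.1f}")
```

Every tenfold drop in P adds 10 to the score, so a score of 30 corresponds to a 1 in 1000 chance that the match is random.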

Example of MMD issue

Peng et al. (2003). Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J Proteome Res 2, 43-50.

Reversed database sequence

Strategies to Visualize FDRs

False positive identification using reversed database

HSP-70 tryptic peptide

                       (forward)         (reverse)
Sequence               K ELEEIVQPIISK    K SIIPQVIEELEK
Peptide Mr             1396.7813 Da      1396.7813 Da

Mascot checks both peptides

Theoretical y series   (forward)         (reverse)
y1                     147.1             147.1
y2                     234.1             276.2
y3                     347.2             389.2
....                   ....              ....
y11                    1267.7            1309.7

Expected ions from the reversed hit should not correlate with the observed ions in the experiment
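
A small sketch of why a reversed sequence makes a useful decoy: reversing the stretch that includes the preceding K gives a tryptic peptide with the same composition and mass but a different y series. The helper names are hypothetical and the mass table is a subset covering only the residues in this peptide.

```python
# Monoisotopic residue masses (Da); subset for the residues used below.
RESIDUE_MASS = {
    "S": 87.03203, "P": 97.05276, "V": 99.06841, "L": 113.08406,
    "I": 113.08406, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
}
WATER, PROTON = 18.01056, 1.00728


def reversed_decoy(preceding_residue, peptide):
    """Reverse preceding residue + peptide; return (new preceding residue, decoy)."""
    reversed_stretch = (preceding_residue + peptide)[::-1]
    # The old C-terminal K now precedes the decoy as its cleavage site.
    return reversed_stretch[0], reversed_stretch[1:]


def peptide_mass(seq):
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def y_ions(seq):
    return [sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON
            for i in range(1, len(seq))]


forward = "ELEEIVQPIISK"
_, decoy = reversed_decoy("K", forward)
print(decoy)                                        # SIIPQVIEELEK
print(round(peptide_mass(forward), 4), round(peptide_mass(decoy), 4))
print([round(m, 1) for m in y_ions(forward)[:3]])   # -> [147.1, 234.1, 347.2]
print([round(m, 1) for m in y_ions(decoy)[:3]])     # -> [147.1, 276.2, 389.2]
```

Both sequences pass a precursor-mass filter equally well, so any score the decoy earns from fragment ions comes from chance matches.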

Typical Result

[Figure: "All peptides Mascot"; x-axis: sequence length (5 to 25), y-axis: Mascot score (0 to 160)]

Are there any reversed-hit proteins with 2 peptides above the MOWSE score threshold?

-No: All proteins identified with 2 peptides scoring above the p<0.05 threshold are good.

-Yes: Repeat the Mascot search with more stringent parameters.
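
The check above can be sketched as a small filter over peptide hits. Everything here is hypothetical: the hit tuples, the `REV_` prefix used to mark reversed database entries, the threshold value and the function names. It only illustrates the decision rule, not Mascot's own reporting.

```python
from collections import Counter


def proteins_with_two_peptides(hits, threshold):
    """hits: iterable of (protein_id, peptide, score).
    Return proteins with at least 2 distinct peptides scoring above threshold."""
    counts = Counter()
    seen = set()
    for protein, peptide, score in hits:
        if score > threshold and (protein, peptide) not in seen:
            seen.add((protein, peptide))
            counts[protein] += 1
    return {protein for protein, n in counts.items() if n >= 2}


def reversed_hits_passing(hits, threshold, decoy_prefix="REV_"):
    """Reversed-database proteins that would be accepted at this threshold."""
    accepted = proteins_with_two_peptides(hits, threshold)
    return sorted(p for p in accepted if p.startswith(decoy_prefix))


# Hypothetical search output.
hits = [
    ("P1", "PEPTIDEA", 45), ("P1", "PEPTIDEB", 38),
    ("P2", "PEPTIDEC", 52),
    ("REV_P3", "PEPTIDED", 31), ("REV_P3", "PEPTIDEE", 12),
]

for threshold in (10, 25):
    decoys = reversed_hits_passing(hits, threshold)
    verdict = "repeat with more stringent parameters" if decoys else "accept 2-peptide proteins"
    print(f"threshold {threshold}: reversed hits {decoys} -> {verdict}")
```

In this toy data, a threshold of 10 lets one reversed protein through with two peptides, while a threshold of 25 removes it.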

What about 1-hit wonders? (Proteins identified with only 1 peptide)

How to Validate the Data

Basically, the idea is to "play around" with the statistics to make your results more reliable.

How to Validate the Data

[Figure: "All peptides Mascot"; x-axis: sequence length (5 to 25), y-axis: Mascot score (0 to 160)]

Take home message

1. Data quality (mass accuracy) and a well-defined search space are key for reliable peptide identification

2. Reliable identification is an interplay between asking enough without asking too much (careful when trying to get “as many IDs as I can”!)

PTMs

October 2013

Gustavo de Souza, IMM, OUS

PTMs in biology

Complexity of Protein Samples in Eukaryotes

Modifications are specific to a group of amino acids

What difference to expect at MS level?

Larsen MR et al, 2006.

Defining the “Search Space”

PTM abundance in a cell

[Figure: number of peptides vs. abundance level, for total peptides in a sample and for modified peptides]

Differences from 10e2 to 10e4

PTM abundance in a cell

Stable vs. Labile PTMs


Neutral loss

Boersema PJ et al, 2009.

Identifying Labile PTMs


HCD fragmentation


Status of PTM coverage

Lemeer and Heck, 2009.

Status of PTM coverage

Derouiche A et al, 2012.

Take home message

- PTM identification depends on the PTM's stability under fragmentation and its abundance in the sample

- ID improvement was mostly driven by instrumentation improvements (sensitivity etc.)

- Depending on PTM, identification can be very easy or very hard
