
Reporting Protein Identifications from MS/MS Results

Brian C. Searle

Proteome Software Inc.

Portland, Oregon USA

[email protected]

Creative Commons Attribution

Outline

• Assigning Proteins from Peptide IDs

• Correcting for One-Hit-Wonders

• Protein False Discovery Rates?

• Correcting for Shared Peptides

• Publication Standards

Just to Review:

[Figure: forward (F) and reversed (R) database matches, labeled "possibly correct" and "clearly wrong".]

Elias JE, Gygi SP. Nat Methods. 2007 Mar;4(3):207-14.

Just to Review:

#    Spectrum    Accession    Peptide            Score
1    scan 3632   P35908       GFSSGSAVVSGGSR      4.6
2    scan 3609   P0AFY8       FSAASQPAAPVTK       3.7
3    scan 3629   P0A940       GFQSNTIGPK          3.0
4    scan 3635   P0A6F9       STRGEVLAVGNGR       2.2
5    scan 3636   P0A870       ELAESEGAIER         2.1
6    scan 3607   P0A799       ADLNVPVKDGK         1.9
7    scan 3626   P0ABC7       EAEAYTNEVQPR        1.6
8    scan 3602   P0A853       IRVIEPVKR           1.4
9    scan 3623   P38489       KLTPEQAEQIK         0.9
10   scan 3616   P00448       GTTLQGDLK           0.8
11   scan 3621   P09546       LLPGPTGER           0.4
12   scan 3615   P0AFG8       AFLEGR              0.2
13   scan 3624   P14565       SAADVAIMK           0.0
14   scan 3613   rev_P06864   EGSLAVNVQGDAAIR    -0.4
15   scan 3604   P36562       DPEEVVGIGANLPTDK   -0.7
16   scan 3606   P0A9C5       IPVVSSPK           -0.7
17   scan 3611   P0ABB0       ASTISNVVR          -0.7
18   scan 3614   rev_Q2EEU2   KFVALTCDTLLLGER    -0.8
19   scan 3620   rev_P0ACL5   NNESAALMKEYCR      -0.9
20   scan 3633   rev_P37309   SDGSCNQRALNR       -0.9
21   scan 3627   P32132       VEETEDADAFRVSGR    -1.0
22   scan 3618   P37342       ILTQDEIDVR         -1.0
23   scan 3610   rev_P0ADK0   IANVSDVVPR         -1.2
24   scan 3601   P0AG93       LGMKREHMLQQK       -1.3
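As a rough illustration of how a cumulative FDR could be read off the ranked list above, here is a minimal Python sketch. It assumes the "2x Decoy" convention (estimated FDR = 2 × decoy hits / all accepted hits) and treats any accession starting with `rev_` as a decoy; the function name `cumulative_fdr` is hypothetical and not taken from the slides.

```python
# (score, accession) pairs copied from the table above; "rev_" marks a decoy hit.
PSMS = [
    (4.6, "P35908"), (3.7, "P0AFY8"), (3.0, "P0A940"), (2.2, "P0A6F9"),
    (2.1, "P0A870"), (1.9, "P0A799"), (1.6, "P0ABC7"), (1.4, "P0A853"),
    (0.9, "P38489"), (0.8, "P00448"), (0.4, "P09546"), (0.2, "P0AFG8"),
    (0.0, "P14565"), (-0.4, "rev_P06864"), (-0.7, "P36562"), (-0.7, "P0A9C5"),
    (-0.7, "P0ABB0"), (-0.8, "rev_Q2EEU2"), (-0.9, "rev_P0ACL5"),
    (-0.9, "rev_P37309"), (-1.0, "P32132"), (-1.0, "P37342"),
    (-1.2, "rev_P0ADK0"), (-1.3, "P0AG93"),
]


def cumulative_fdr(psms, threshold):
    """Estimate the FDR of all matches scoring at or above `threshold`.

    Assumption: FDR = 2 * decoys / accepted, capped at 1.0.
    """
    accepted = [acc for score, acc in psms if score >= threshold]
    if not accepted:
        return 0.0
    decoys = sum(acc.startswith("rev_") for acc in accepted)
    return min(1.0, 2.0 * decoys / len(accepted))


if __name__ == "__main__":
    for cutoff in (0.0, -0.4, -1.3):
        print(f"score >= {cutoff:+.1f}: FDR ~ {cumulative_fdr(PSMS, cutoff):.3f}")
```

With only 24 matches the estimate moves in large steps, so it is best read as an illustration rather than a usable error rate.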


























?

…Well, Maybe

[Figure: peptides AEPTIR (85%), IDVCIVLLQHK (65%), and NTGDR (25%) all point to one Protein (??%).]

FDRs for Whole Datasets vs Individual Peptides

• Cumulative FDRs only estimate the validity of a data set

• Probabilities (or instantaneous FDRs) estimate the validity of a peptide of interest

One Possible Approach

• Instantaneous False Discovery Rate

Many Others:

• PeptideProphet (TPP, Scaffold)
• Percolator
• Spectral Energies
• RAId De Novo


























[Figure: matches binned by score: 4 to 5, 3 to 4, 2 to 3, 1 to 2, 0 to 1, -1 to 0, -2 to -1.]

Histogram of Decoy Matches

[Figure: x-axis: Ion Score – Identity Score (-40 to 60); y-axis: # of Matches (0 to 800); series: “Correct” and “2x Decoy”.]

Curve Fit Distributions

[Figure: same axes, with curves fit to the “Correct” and “2x Decoy” distributions.]

Choi H, Ghosh D, Nesvizhskii AI. J Proteome Res. 2008 Jan;7(1):286-92.

Instantaneous FDR Method

[Figure: x-axis: Ion Score – Identity Score (-40 to 60); y-axis: # of Matches (0 to 800); series: “Correct” and “2x Decoy”.]

$$p(+ \mid D) = \frac{p(D \mid +)\,p(+)}{p(D \mid +)\,p(+) + p(D \mid -)\,p(-)}$$

Choi H, Ghosh D, Nesvizhskii AI. J Proteome Res. 2008 Jan;7(1):286-92.
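One way to picture the instantaneous FDR method is as Bayes' rule applied to two fitted score distributions. The sketch below assumes both the "correct" and "decoy" distributions are Gaussian and uses made-up means, standard deviations, and prior; none of these values or function names come from the slides or from the cited paper.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(score, correct=(20.0, 10.0), incorrect=(-10.0, 10.0),
                      prior_correct=0.5):
    """Probability that a match with this score is correct, by Bayes' rule.

    `correct` and `incorrect` are hypothetical (mean, sd) pairs standing in
    for the curve-fit distributions; `prior_correct` is the assumed fraction
    of correct matches.
    """
    weighted_pos = normal_pdf(score, *correct) * prior_correct
    weighted_neg = normal_pdf(score, *incorrect) * (1.0 - prior_correct)
    return weighted_pos / (weighted_pos + weighted_neg)


def instantaneous_fdr(score, **kwargs):
    """Local error rate at this score: one minus the posterior."""
    return 1.0 - posterior_correct(score, **kwargs)


if __name__ == "__main__":
    for s in (-20.0, 5.0, 30.0):
        print(f"score {s:+.0f}: p(correct) = {posterior_correct(s):.4f}")
```

In practice the distribution shapes and the prior would be estimated from the data rather than fixed by hand.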

[Figure: peptide probabilities AEPTIR 85%, IDVCIVLLQHK 65%, NTGDR 25%; Protein ??%.]

[Figure: probabilities that each peptide is wrong: AEPTIR (15%), IDVCIVLLQHK (35%), NTGDR (75%); Protein (??%).]

Feng J, Naiman DQ, Cooper B. Anal Chem. 2007 May 15;79(10):3901-11.

0.15 * 0.35 * 0.75 = 0.04

[Figure: the probability that all three peptides are wrong is (4%), so the Protein probability is 96%.]
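The arithmetic on this slide can be written as a short function. This is a minimal sketch that assumes the peptide identifications are independent, which is the simplification the slide itself relies on; the function name is hypothetical.

```python
from math import prod  # available in Python 3.8+


def protein_probability(peptide_probs):
    """Probability that at least one peptide is correct.

    Multiplies the chances that each peptide is wrong, then takes the
    complement. Assumes the peptides are independent.
    """
    return 1.0 - prod(1.0 - p for p in peptide_probs)


if __name__ == "__main__":
    # AEPTIR 85%, IDVCIVLLQHK 65%, NTGDR 25%
    print(f"{protein_probability([0.85, 0.65, 0.25]):.2f}")
```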

If only it were so easy!

[Figure: ten peptide identifications (Peptide 1 to Peptide 10), 80% of them correct. The peptides map to Correct Protein A, Correct Protein B, Incorrect Protein C, and Incorrect Protein D, so 80% correct Peptides yield only 50% correct Proteins.]
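The gap between peptide-level and protein-level accuracy can be reproduced with a toy tally. The peptide-to-protein assignment below is an assumption made for illustration (the slide does not say which peptide belongs to which protein): the eight correct peptides are split between Proteins A and B, and each of the two incorrect peptides lands on its own protein.

```python
# (peptide, protein, is_correct): a hypothetical assignment for illustration.
ASSIGNMENTS = [
    ("Peptide 1", "A", True), ("Peptide 2", "A", True),
    ("Peptide 3", "A", True), ("Peptide 4", "A", True),
    ("Peptide 5", "B", True), ("Peptide 6", "B", True),
    ("Peptide 7", "B", True), ("Peptide 8", "B", True),
    ("Peptide 9", "C", False),   # wrong peptide, a one-hit wonder
    ("Peptide 10", "D", False),  # wrong peptide, a one-hit wonder
]


def peptide_accuracy(assignments):
    """Fraction of peptide identifications that are correct."""
    return sum(ok for _, _, ok in assignments) / len(assignments)


def protein_accuracy(assignments):
    """Fraction of proteins supported by at least one correct peptide."""
    supported = {}
    for _, protein, ok in assignments:
        supported[protein] = supported.get(protein, False) or ok
    return sum(supported.values()) / len(supported)


if __name__ == "__main__":
    print(peptide_accuracy(ASSIGNMENTS), protein_accuracy(ASSIGNMENTS))
```

Correct peptides tend to pile up on the same few proteins while wrong ones scatter, which is why the protein-level error is worse than the peptide-level error.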

One hit wonders are dubious at best

[Figure: Actual Probability vs. Computed Probability, with regions of UNDERestimation and OVERestimation marked.]

Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

What if we could score one-hit-wonderness?


Combining different peptides

• Quantify as a score:
– If different peptides agree: Good!
– If peptides are one-hit-wonders: Bad!

• Peptide agreement score:

$$\mathrm{NSP}_k = \sum_{k' \neq k} p(+ \mid D_{k'})$$

The NSP score for peptide (k) is the sum of the probabilities of the other agreeing peptides (not k).

Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658
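The NSP definition above translates into a few lines of code. This sketch computes only the score itself for the peptides assigned to one protein; how ProteinProphet then uses the score to adjust each peptide probability is not reproduced here, and the function name is hypothetical.

```python
def nsp_scores(peptide_probs):
    """NSP for each peptide of a protein: the summed probability of its siblings.

    For peptide k this is the total of all peptide probabilities minus its own.
    """
    total = sum(peptide_probs)
    return [total - p for p in peptide_probs]


if __name__ == "__main__":
    # Three agreeing peptides on one protein: every peptide has sibling support.
    print(nsp_scores([0.85, 0.65, 0.25]))
    # A one-hit wonder has no siblings, so its NSP is zero.
    print(nsp_scores([0.85]))
```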

Protein Prophet Distributions

[Figure: NSP distributions for Multi-hit Proteins and One-hit Wonders. One hit wonders: decrease prob. In between: keep same. Multi-hit proteins: increase prob.]

[Figure: Actual Probability vs. computed probability, with NSP and without NSP; regions of UNDERestimation and OVERestimation marked.]


Brian, I hate math. What do I do?

Option 1: Throw Out One-Hit-Wonders

Advantages: Easy, works!

Disadvantages: Loss of sensitivity!

Option 2: Use Multiple Filters

Filter 1 - Protein Mode
• ≥2 peptides/protein
• moderate spectrum threshold

Filter 2 - Peptide Mode
• 1 peptide/protein
• high spectrum threshold

Advantages: More sensitive!

Disadvantages: Pretty arbitrary!
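Option 2 can be sketched as a simple rule over the peptide scores assigned to a protein. The two thresholds below (2.0 for "moderate", 4.0 for "high") are placeholders chosen only for illustration, which is exactly the arbitrariness the slide warns about; the function name is hypothetical.

```python
def passes_filters(peptide_scores, moderate=2.0, high=4.0):
    """Accept a protein if it passes either filter.

    Filter 1 (protein mode): at least two peptides at or above `moderate`.
    Filter 2 (peptide mode): at least one peptide at or above `high`.
    Both thresholds are illustrative placeholders.
    """
    n_moderate = sum(score >= moderate for score in peptide_scores)
    if n_moderate >= 2:
        return True
    return any(score >= high for score in peptide_scores)


if __name__ == "__main__":
    print(passes_filters([2.5, 3.0]))  # two moderate peptides
    print(passes_filters([3.0]))       # one-hit wonder, only moderate
    print(passes_filters([4.5]))       # one-hit wonder, high-scoring
```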

Option 3:






#    Accession     Protein Score
1    P0ABH7        4258.08
2    P0ABJ9        2423.84
3    P0A7S3        1670.86
4    P0ACF0        1230.35
5    P0AES0         896.12
6    P21165         702.89
7    P0AG59         524.04
8    P17952         409.74
9    P08997         327.85
10   rev_P76577     276.03
11   P41407         246.88
12   P39177         219.44
13   P37689         195.37
14   P0A951         177.02
15   P0AGG4         164.52
16   P29131         153.92
17   rev_P0AEQ1     146.86
18   rev_P09155     140.07
19   P0A9S5         132.29
20   P0AE45         125.41
21   P77718         120.12
22   P76115         116.15
23   rev_P76463     111.37
24   rev_P0A6E4     107.58
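Walking down the ranked protein list above shows how coarse a protein-level FDR is when the list is short. The sketch below tracks a running estimate by rank, again assuming the "2x Decoy" convention and the `rev_` prefix for decoys; the function name is hypothetical.

```python
# Accessions in rank order, copied from the protein table above.
PROTEINS = [
    "P0ABH7", "P0ABJ9", "P0A7S3", "P0ACF0", "P0AES0", "P21165",
    "P0AG59", "P17952", "P08997", "rev_P76577", "P41407", "P39177",
    "P37689", "P0A951", "P0AGG4", "P29131", "rev_P0AEQ1", "rev_P09155",
    "P0A9S5", "P0AE45", "P77718", "P76115", "rev_P76463", "rev_P0A6E4",
]


def running_fdr(accessions):
    """Estimated FDR after accepting the top 1, 2, ..., N proteins.

    Assumption: FDR = 2 * decoys / accepted, capped at 1.0.
    """
    fdrs = []
    decoys = 0
    for rank, accession in enumerate(accessions, start=1):
        if accession.startswith("rev_"):
            decoys += 1
        fdrs.append(min(1.0, 2.0 * decoys / rank))
    return fdrs


if __name__ == "__main__":
    for rank, fdr in enumerate(running_fdr(PROTEINS), start=1):
        print(f"top {rank:2d}: FDR ~ {fdr:.3f}")
```

The first decoy at rank 10 moves the estimate from 0 straight to 0.2, so with a handful of proteins a single decoy hit dominates the number.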

Protein FDRs only accurate with >100 Proteins

[Figure: Uncertainty in Protein FDR vs. Number of Confidently IDed Proteins, with 1% Error In FDR Estimation marked.]

Histogram of Decoy PROTEIN Matches

[Figure: x-axis: Protein Score; y-axis: # Protein Identifications; series: “Correct” and “2x Decoy”.]

Instantaneous Protein FDRs…

• Estimate the likelihood that a single protein of interest is present

• Are troublesome at best due to stochastic sampling

• Shouldn’t be used with <500 likely proteins
– Better off calculating protein probabilities using a model like ProteinProphet

Proteins don’t exist in isolation

Nesvizhskii, A. I.; Aebersold, R. Mol. Cell. Proteom. 4.10, 1419-1440, 2005

[Figure: peptide YMACCLLYR (85%) is shared by Tubulin alpha 3, Tubulin alpha 4, and Tubulin alpha 6; each protein's probability is ??%.]

[Figure: the same peptide with its 85% divided among the three proteins: 85%/3 each.]

Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

[Figure: peptides YMACCLLYR and SIQFVDWCPTGFK with Tubulin alpha 3, Tubulin alpha 4, and Tubulin alpha 6; each protein's probability is ??%.]
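The simplest reading of the shared-peptide slide is an equal split of the peptide's probability across every protein that contains it. The sketch below does only that; it is an assumption-level illustration, and a fuller model such as ProteinProphet weights the shares unevenly once other peptide evidence is available. The function name is hypothetical.

```python
def apportion(peptide_prob, proteins):
    """Split one shared peptide's probability equally among its proteins."""
    share = peptide_prob / len(proteins)
    return {protein: share for protein in proteins}


if __name__ == "__main__":
    shares = apportion(0.85, ["Tubulin alpha 3", "Tubulin alpha 4", "Tubulin alpha 6"])
    for protein, share in shares.items():
        print(f"{protein}: {share:.3f}")
```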

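Distinct Proteins

[Figure: Protein A and Protein B share no peptides (Peptide 1 to Peptide 4). Peptide weights: 100% 100% for one protein, 100% 100% for the other.]

Indistinguishable Proteins

[Figure: Protein A and Protein B match the same peptides (Peptide 1 to Peptide 4). Peptide weights: 50% 50% 50% 50% for each protein.]

Differentiable Proteins

[Figure: Protein A and Protein B share some peptides, but each also has a peptide of its own. Peptide weights: 100% 50% 50% for one protein, 50% 50% 100% for the other.]

Subset Proteins

[Figure: the peptides of one protein are a subset of the other's. Peptide weights: 100% 100% 100% 100% for the larger protein, 0% 0% 0% for the subset protein.]

[Figure: summary of the Indistinguishable, Differentiable, and Subset cases for Protein A and Protein B.]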
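The four cases can be told apart with plain set operations on each protein's matched peptides. This is a minimal sketch for a single pair of proteins; the function name and the example peptide labels are hypothetical.

```python
def relationship(peptides_a, peptides_b):
    """Classify two proteins by how their matched peptide sets overlap."""
    a, b = set(peptides_a), set(peptides_b)
    if not a & b:
        return "distinct"            # no shared peptides
    if a == b:
        return "indistinguishable"   # identical peptide evidence
    if a < b or b < a:
        return "subset"              # one set lies strictly inside the other
    return "differentiable"          # shared peptides plus unique ones on each side


if __name__ == "__main__":
    print(relationship({"P1", "P2"}, {"P3", "P4"}))
    print(relationship({"P1", "P2"}, {"P1", "P2"}))
    print(relationship({"P1", "P2"}, {"P2", "P3"}))
    print(relationship({"P1", "P2", "P3", "P4"}, {"P2", "P3", "P4"}))
```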

The Quantitative Subset Complication

[Figure: Protein A and Protein B in a subset relationship: ?]

[Figure: P450 2c40 and P450 2c29. Peptides shown: EAFIDHGEEFSGR, GSFPMAEK, NLGMGK. Peptide groups: Specific to 2c29, Specific to 2c40, Common to both. Ratios shown: ≈ 1.1, ≈ 1.6, ≈ 2.2.]

The Hidden Subset Complication

[Figure: Protein A, Protein B, and Protein C with overlapping peptide sets: {Peptide 1, Peptide 2}, {Peptide 2, Peptide 3}, and {Peptide 3, Peptide 4}.]

[Figure: the proteins holding Peptide 1 and Peptide 4 each keep their peptides at 100% 100%; the protein whose peptides are all shared with the other two is left with 0% 0%.]
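A hidden subset is a protein whose peptides are all explained by other proteins taken together, even though no single other protein contains them all. The sketch below flags such proteins; which of A, B, or C plays the hidden role in the slide is an assumption made here for illustration, and the function name is hypothetical.

```python
def hidden_subsets(protein_peptides):
    """Return proteins covered by the union of the others but by no single one."""
    hidden = []
    for name, peptides in protein_peptides.items():
        own = set(peptides)
        others = [set(p) for other, p in protein_peptides.items() if other != name]
        covered_jointly = own <= set().union(*others)
        covered_by_one = any(own <= other for other in others)
        if covered_jointly and not covered_by_one:
            hidden.append(name)
    return hidden


if __name__ == "__main__":
    # Hypothetical labeling of the three overlapping peptide sets.
    example = {
        "A": {"Peptide 1", "Peptide 2"},
        "B": {"Peptide 2", "Peptide 3"},
        "C": {"Peptide 3", "Peptide 4"},
    }
    print(hidden_subsets(example))
```

A pairwise subset check would miss this case, because B is not a subset of A alone or of C alone.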

The Bold Red Complication

[Figure: Protein A and Protein B with Peptide 1 to Peptide 4. Peptide weights shown: 100% 100% 100% 100% and 0% 0% 100%. ?]

Protein Identification          Unique Peptides      Trust
Family of A and B               5 Unique, 5 Total    High
• Definitive ID of Protein A    2 Unique, 4 Total    Med
• Definitive ID of Protein B    1 Unique, 3 Total    Low

The Similar Peptide Complication

[Figure: Scan Number: 2435 matches the similar peptides AVGNLR and GLGNLR; proteins shown: TLR9_HUMAN, TRFE_HUMAN, LRFN1_HUMAN.]

No software deals with all of these issues

Publication Standards

• In 2006 MCP published guidelines for reporting peptide and protein identifications

• Other proteomics journals have adopted similar standards

• Revised “Paris 2” guidelines are forthcoming. Expected to be enforced 1/1/2010!

Guidelines remind you:

• To present a complete methods/results section
– I. Search Parameters and Acceptance Criteria
– VI. Raw Data Submission

• To follow smart criteria for choosing results to publish
– II. Protein and Peptide Identification
– IV. Protein Inference from Peptide Assignments
– V. Quantification

• To not over-report your results
– III. Post-Translational Modifications

Software Can Make Guideline Fulfillment Easier

• Peak picking software, version, altered parameters

• Database Selection
– Database name and version
– Species restriction
– Number of proteins searched

• Database search parameters
– Search engine name and version
– Enzyme specificity
– # missed cleavages
– Fixed/variable modifications
– Mass tolerances

• Peptide selection criteria
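One way software could help is by checking a methods record against the checklist above before submission. The sketch below is purely illustrative: the field names are invented to mirror the bullets and do not correspond to any journal's or tool's actual schema.

```python
# Hypothetical field names mirroring the checklist bullets above.
REQUIRED_FIELDS = [
    "peak_picking_software",
    "database_name_and_version",
    "species_restriction",
    "number_of_proteins_searched",
    "search_engine_name_and_version",
    "enzyme_specificity",
    "missed_cleavages",
    "modifications",
    "mass_tolerances",
    "peptide_selection_criteria",
]


def missing_fields(report):
    """List checklist items that are absent or empty in a methods record."""
    return [field for field in REQUIRED_FIELDS if report.get(field) in (None, "")]


if __name__ == "__main__":
    draft = {
        "search_engine_name_and_version": "ExampleEngine 1.0",  # placeholder value
        "enzyme_specificity": "trypsin",
        "missed_cleavages": 2,
    }
    print(missing_fields(draft))
```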

XML Standards Can Make Guideline Fulfillment Easier

I. Search Parameters and Acceptance Criteria

II. Protein and Peptide Identification

III. Post-Translational Modifications

IV. Protein Inference from Peptide Assignments

V. Quantification

VI. Raw Data Submission

mzIdentML

mzML

http://www.psidev.info/

Where are they?

http://www.mcponline.org/misc/ParisReport_Final.dtl

Molecular & Cellular Proteomics: Bradshaw, R. A., Burlingame, A. L., Carr, S., Aebersold, R., Reporting Protein Identification Data: The next Generation of Guidelines. Mol. Cell. Proteomics, 5:787-788, 2006.

Journal of Proteome Research: Beavis, R., Editorial: The Paris Consensus. J. Proteome Res., 2005, 4 (5), p 1475

Proteomics: Wilkins, M. R., Appel, R. D., Van Eyk, J. E., Maxey, C. M., et al., Guidelines for the next 10 years of proteomics. Proteomics. 2006, 6, 1, 4-8.


Conclusions

• We identify Proteins (not Peptides)!
– Can’t stop at Peptide FDRs and Probabilities

• One-Hit-Wonders are often wrong and need to be seriously investigated (manually or mathematically)

• You can compute Protein level FDRs
– But take them with a grain of salt!

• Occam’s Razor can simplify Shared Peptides

• Publication Standards exist to help you
