Page 1

Quantifying Proteomes Using the Open-source Trans-Proteomic Pipeline

Michael Hoopmann

Jason Winget

Luis Mendoza

Robert Moritz

Page 2

Shotgun Mass Spectrometry

Sample: Digestion / Separation (HPLC)

Instrument: Mass Spectrometer

Discovery: Spectra, Data Analysis

Page 3

More Detailed Look at Shotgun Data Analysis

1. Data Conversion

2. Spectrum Identification

3. ID Validation

4. Protein Inference

5. Quantification

6. Visualization

7. Dissemination

Page 4

The Trans-Proteomic Pipeline (aka The TPP)

Raw Mass Spec Data: msconvert

Peptide Identification: Comet, X!Tandem, SpectraST, Kojak, SEQUEST*, Mascot*

Peptide Validation: PeptideProphet, iProphet, PTMProphet

Quantitation: ASAPRatio, XPRESS, Libra, StPeter

Protein Assignment: ProteinProphet

Protein List: SBEAMS, ProteoGrapher

File formats: pepXML, protXML, mzML

Simple set of input/outputs readable by all tools.

Modular flow – can swap algorithms into and out of pipeline.

Expandable – can add or remove tools as necessary.

Simple GUI – Operates in a web browser, multi-platform.

Page 5

Versatility of The TPP

The Trans-Proteomic Pipeline Suite of Software

Data Converters

Open, Standardized Formats: mzML, mzXML, mzIdentML, pepXML, protXML, and more

TPP Applications: Search, Validate, Infer, Quantify, Visualize, Create Reports, Organize Data, Cloud, Publish

Multi-Platform Web Interface

Vendor & 3rd Party Applications

Instrument Data From Major Vendors

Page 6

The Trans-Proteomic Pipeline

Multiple user accounts.

Maintain independent projects and data storage.

All tools accessible from drop-down menu interface.

Interface has a few common pipelines pre-built.

Page 7

The Trans-Proteomic Pipeline

All stages of pipeline accessible at any time, ordered for optimal performance.

Stages allow for customization.

Applications within pipeline can be re-run with different parameters to refine analyses.

Page 8

The Trans-Proteomic Pipeline

Page 9

StPeter: An application for label-free quantitation

Page 10

StPeter – Label-free Quantitation in The TPP

Historically, quantitation in the TPP focused on labeled methods.

Label-free methods are often less work at the bench.

More recently, label-free approaches have become more robust.

StPeter is a new tool in the TPP for label-free quantitation.

Quantitation tools:

ASAPRatio, XPRESS, Libra: labeled methods, e.g. iTRAQ & SILAC

StPeter: New! Label-free!

Page 11

StPeter: MS/MS-based Quantitation

Contains multiple MS/MS counting-centric methods.

Produces relative protein quantitation within a sample.

All results are normalized to facilitate comparisons across samples.

Spectral Counting

Page 12

StPeter: The NSAF Model

Normalized Spectral Abundance Factor (NSAF)

Contains two normalizations:

Protein length

Sample-to-sample variation

Small proteins produce fewer distinct peptide molecules.

Large proteins produce many distinct peptides, appearing more abundant by total MS/MS count.

Zybailov, B.; Mosley, A. L.; Sardiu, M. E.; Coleman, M. K.; Florens, L.; Washburn, M. P. J. Proteome Res. 2006, 5, 2339–2347.
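The two normalizations can be written out in a few lines. This is a minimal sketch of the NSAF calculation, NSAF_k = (SpC_k / L_k) / sum over all proteins of (SpC_i / L_i); the function name `nsaf` and the example proteins, counts, and lengths are hypothetical and chosen only for illustration.

```python
def nsaf(spectral_counts, lengths):
    """Return NSAF per protein: (SpC / L) divided by the sum of SpC / L over all proteins."""
    # Length normalization: spectral abundance factor (SAF) per protein.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Sample normalization: SAF values are scaled so they sum to 1 within the run.
    return {p: value / total for p, value in saf.items()}


# Hypothetical example: two proteins with the same spectral count but different lengths.
example = nsaf({"small": 10, "large": 10}, {"small": 100, "large": 400})
```

With equal spectral counts, the shorter protein receives the larger NSAF, which is the length correction the slide describes.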

Page 13

StPeter: The SIN Model

Spectral counts are integral.

i.e. a scan from a low abundance peptide has equal weight as a scan from a high abundance peptide.

Normalized Spectral Index (SIN) incorporates MS/MS peak height.

More abundant peptides produce more abundant fragment ion peaks.

$$SI_N = \frac{\sum_{n=1}^{SpC} i_n}{\sum_{j=1}^{N} \left( \sum_{n=1}^{SpC} i_n \right)_j} \bigg/ L$$

SIN adds fragment ion intensity to NSAF.

Where i is the summed intensities of fragment ions.

Griffin NM, et al. Nat Biotechnol. 2010 Jan;28(1):83-9
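The SIN formula can be sketched the same way as NSAF, replacing counts with summed fragment ion intensities. The function name `normalized_spectral_index`, the input layout (one summed intensity per PSM), and the example values are assumptions made for illustration, not StPeter's actual data structures.

```python
def normalized_spectral_index(psm_intensities, lengths):
    """psm_intensities maps protein -> list of summed fragment ion intensities, one per PSM."""
    # Spectral index (SI): sum of fragment ion intensities over all PSMs of a protein.
    si = {p: sum(values) for p, values in psm_intensities.items()}
    total_si = sum(si.values())
    # Normalize by the total SI of the run, then by protein length.
    return {p: (si[p] / total_si) / lengths[p] for p in si}


# Hypothetical example: both proteins have two PSMs, but different fragment ion intensities.
sin_values = normalized_spectral_index(
    {"A": [300.0, 100.0], "B": [50.0, 50.0]},
    {"A": 200, "B": 100},
)
```

Both proteins here have a spectral count of 2, so only the intensity term and the length term separate them.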

Page 14

StPeter: The Distributed NSAF (dNSAF) Model

Modifies the NSAF model to account for shared peptides.

i.e. peptides that map to multiple protein sequences.

Utilizes the fraction of non-shared peptides to split the spectral counts of the shared peptides.

Protein A: 5 unique SpC, Distributed SpC = 6.4

Shared between Protein A and Protein B: 2 shared SpC

Protein B: 2 unique SpC, Distributed SpC = 2.6

Zhang Y, et al. Anal Chem. 2010 Mar 15;82(6):2272-81
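The Protein A / Protein B numbers on this slide can be reproduced by splitting the shared counts in proportion to each protein's unique counts. The sketch below uses a hypothetical helper, `distribute_shared_counts`, and assumes a single group of shared spectra and at least one unique count among the proteins.

```python
def distribute_shared_counts(unique_counts, shared_count):
    """Split shared spectral counts among proteins in proportion to their unique counts."""
    # Assumes the proteins have at least one unique spectral count between them.
    total_unique = sum(unique_counts.values())
    return {
        protein: unique + shared_count * unique / total_unique
        for protein, unique in unique_counts.items()
    }


# Values from the slide: Protein A has 5 unique SpC, Protein B has 2, and 2 SpC are shared.
distributed = distribute_shared_counts({"A": 5, "B": 2}, shared_count=2)
print({p: round(v, 1) for p, v in distributed.items()})  # {'A': 6.4, 'B': 2.6}
```

Protein A receives 5/7 of the two shared spectra and Protein B receives 2/7, so the total of 9 spectra is preserved.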

Page 15

StPeter: The Distributed SIN (dSIN) Model

$$dSI_N = \frac{dSI}{\sum_{j=1}^{n} dSI_j} \bigg/ L$$

Hoopmann MR, Winget JM, Mendoza L, Moritz RL. J Proteome Res. 2018 Mar 2;17(3):1314-1320.

Modifies the SIN model to account for shared peptides.

i.e. peptides that map to multiple protein sequences.

Utilizes the fraction of non-shared peptides to split the spectral counts of the shared peptides.

Modernizes spectral index analysis with the methods optimized for dNSAF.
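A rough sketch of how the dSIN formula might be evaluated is given here. The slide does not spell out how the distributed spectral index (dSI) is built, so this code ASSUMES the shared spectral index is split by each protein's fraction of unique spectral counts, by analogy with dNSAF; the published StPeter implementation may differ. The function name `dsin` and all example values are hypothetical.

```python
def dsin(unique_si, unique_spc, shared_si, lengths):
    """Distributed normalized spectral index for proteins sharing one group of spectra.

    ASSUMPTION: the shared spectral index is split by each protein's fraction of
    unique spectral counts, mirroring the dNSAF approach.
    """
    total_unique_spc = sum(unique_spc.values())
    # dSI: unique spectral index plus this protein's share of the shared spectral index.
    dsi = {
        p: unique_si[p] + shared_si * unique_spc[p] / total_unique_spc
        for p in unique_si
    }
    total_dsi = sum(dsi.values())
    # Normalize by the run total, then by protein length, as in the dSIN formula.
    return {p: (dsi[p] / total_dsi) / lengths[p] for p in dsi}


# Hypothetical example values.
values = dsin(
    unique_si={"A": 500.0, "B": 200.0},
    unique_spc={"A": 5, "B": 2},
    shared_si=70.0,
    lengths={"A": 100, "B": 100},
)
```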

Page 16

Adding Quantitation to the Pipeline

Designed to be seamlessly integrated into existing TPP pipelines.

Execution is FAST – typically seconds to a few minutes.

Cassette model ensures addition or removal without disrupting the pipeline.

Pipeline stages: Spectral Search, ID Validation, Protein Inference, Visualization, with StPeter as the added cassette.

Page 17

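How StPeter Works

Inputs: protXML, pepXML, mzML / mzXML

1. Read Protein (from protXML).

2. Passes FDR? If no, go to step 7.

3. Read Peptide List.

4. Extract PSM list from pepXML.

5. Extract Spectrum (from mzML / mzXML) and add fragment ions to spectral index.

6. More PSMs? If yes, return to step 5.

7. More Proteins? If yes, return to step 1.

8. Normalize Spectral Indexes.

9. Export Spectral Indexes.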

Output is in the same format as the input, with protein quantities appended; downstream tools operate the same.
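The control flow of this slide can be sketched with plain in-memory data in place of the protXML, pepXML, and mzML files. Everything here is hypothetical: the function name, the dictionary layouts, and the `passes_fdr` flag are stand-ins for whatever StPeter actually reads, and shared peptides are ignored to keep the loop short.

```python
def run_stpeter_like_loop(proteins, psms_by_peptide, spectra):
    """Accumulate and normalize spectral indexes following the flowchart steps.

    proteins: list of dicts with keys "name", "length", "passes_fdr", "peptides"
    psms_by_peptide: peptide sequence -> list of spectrum ids
    spectra: spectrum id -> list of matched fragment ion intensities
    """
    spectral_index = {}
    for protein in proteins:                                # Read Protein / More Proteins?
        if not protein["passes_fdr"]:                       # Passes FDR?
            continue
        total = 0.0
        for peptide in protein["peptides"]:                 # Read Peptide List
            for scan in psms_by_peptide.get(peptide, []):   # Extract PSM list / More PSMs?
                total += sum(spectra[scan])                 # Extract Spectrum, add fragment ions
        spectral_index[protein["name"]] = total
    grand_total = sum(spectral_index.values())
    lengths = {p["name"]: p["length"] for p in proteins}
    # Normalize Spectral Indexes: by the run total, then by protein length.
    return {name: (si / grand_total) / lengths[name] for name, si in spectral_index.items()}


# Hypothetical inputs: P3 fails the FDR check and is skipped.
proteins = [
    {"name": "P1", "length": 100, "passes_fdr": True, "peptides": ["PEPA", "PEPB"]},
    {"name": "P2", "length": 50, "passes_fdr": True, "peptides": ["PEPC"]},
    {"name": "P3", "length": 80, "passes_fdr": False, "peptides": ["PEPD"]},
]
psms_by_peptide = {"PEPA": [1, 2], "PEPB": [3], "PEPC": [4], "PEPD": [5]}
spectra = {1: [10.0, 20.0], 2: [30.0], 3: [40.0], 4: [100.0], 5: [999.0]}
result = run_stpeter_like_loop(proteins, psms_by_peptide, spectra)
```

The sketch returns a dictionary instead of writing results back into the input file; in the TPP the quantities are appended to the existing file, as the slide notes.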

Page 18

A look at StPeter in action.

Page 19

The Distributed Models

Amounts: 0.1 pmol, 0.25 pmol, 0.63 pmol, 1.6 pmol, 4 pmol, 10 pmol

Albumins: Cow, Rat, Human, Rabbit, Mouse, Pig

• Mix six homologous albumins in a constant yeast background.

• Acquire 12 replicate injections on an LTQ.

Zhang Y, et al. Anal Chem. 2010 Mar 15;82(6):2272-81

Page 20

The Distributed Models

[Figure, dSIN panel: Log2(dSIN) vs. Log2(Protein Quantity, pmol); R² = 0.9871]

[Figure, dNSAF panel: Log2(dNSAF) vs. Log2(Protein Quantity, pmol); R² = 0.9719]

All spectra are used, including identifications to multiple proteins.

Quantitation maintains linearity over 3 orders of magnitude.

Page 21

Quantifying Proteomes

Create two conditions:

60 µg HeLa + 10 µg E. coli

60 µg HeLa + 30 µg E. coli

24 OFFGEL fractions, analyzed in triplicate (3x) on LTQ-Orbitrap

Total of 144 data files

Approximately 1 million spectra used for quantitation

Compare ratios for every protein observed in 2 of 3 replicates in both conditions.

Over 5000 proteins

Cox J, et al. Mol Cell Proteomics. 2014 Sep;13(9):2513-26
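The arithmetic behind this design can be checked quickly. The sketch below derives the file count and the log2 ratios expected from the stated amounts; the variable names are illustrative, and the second half assumes quantities are expressed as a fraction of total protein loaded, which is an assumption about normalization and not something the slide states.

```python
import math

fractions, replicates, conditions = 24, 3, 2
data_files = fractions * replicates * conditions  # matches the 144 files on the slide

# E. coli is 10 µg in one condition and 30 µg in the other; HeLa is constant at 60 µg.
expected_ecoli_log2 = math.log2(30 / 10)
expected_human_log2 = math.log2(60 / 60)

# ASSUMPTION: if quantities are expressed as a fraction of total protein loaded
# (70 µg vs. 90 µg), both ratios shift, but their separation is unchanged.
ecoli_frac_log2 = math.log2((30 / 90) / (10 / 70))
human_frac_log2 = math.log2((60 / 90) / (60 / 70))
separation = ecoli_frac_log2 - human_frac_log2

print(data_files, round(expected_ecoli_log2, 3))  # 144 1.585
```

Either way, the human and E. coli populations should sit about log2(3) apart on a Log2(ratio) axis.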

Page 22

Artifacts of Spectral Counts

Cox J, et al. Mol Cell Proteomics. 2014 Sep;13(9):2513-26

Tracks of discrete quantities due to single MS/MS protein representation.

Decreased accuracy at low quantities due to limited number of spectra per protein.

Page 23

StPeter Analysis of Proteomes

[Figure, dSIN panel: Log2(dSIN) vs. Log2(ratio) for Human and E. coli proteins, with relative frequencies of Log2(ratio); values shown: 1.17 (σ=0.72), -0.56 (σ=0.60), 1.73]

[Figure, dNSAF panel: Log2(dNSAF) vs. Log2(ratio) for Human and E. coli proteins, with relative frequencies of Log2(ratio); values shown: 1.08 (σ=0.69), -0.60 (σ=0.55), 1.68]

E. coli proteome ratio much closer to 1:3 than spectral counting alone.

Protein distributions show lower standard deviation.

Fewer artifacts among low abundance proteins.
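The three numbers printed beside each histogram appear to be the E. coli center, the human center, and the distance between them. That reading is an assumption, as is the assignment of the second set to dNSAF, but the arithmetic is consistent with it, and the distances can be compared with the value expected for a 1:3 spike-in:

```latex
\begin{aligned}
\text{dSIN:}\quad & 1.17 - (-0.56) = 1.73 \\
\text{dNSAF:}\quad & 1.08 - (-0.60) = 1.68 \\
\text{expected:}\quad & \log_2 3 \approx 1.58
\end{aligned}
```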

Page 24

Comparison to Other Methods

[Figure, dSIN panel: Log2(dSIN) vs. Log2(ratio) for Human and E. coli proteins, with relative frequencies of Log2(ratio); values shown: 1.17 (σ=0.72), -0.56 (σ=0.60), 1.73]

Comparison to precursor ion analysis (MS signals).

Nearly identical protein ratios.

Protein distributions similar, slightly better accuracy with MS signal analysis.

Page 25

StPeter in The TPP

Page 26

Running StPeter in The TPP

Select one or more protein inference data sets. Batch Analysis!

Set parameters in a simple user interface.

Analysis typically takes seconds to a few minutes.

Page 27

Visualizing StPeter Results

Protein Descriptions, Search Results & Statistics, Quantities

Page 28

Visualizing StPeter Results

Tabbed windows allow for filtering and sorting.

Extract only quantified proteins.

Sort results.

Expand protein details to the peptide level.

Page 29

Summary

The Trans-Proteomic Pipeline is a free, open-source suite of tools for shotgun MS data analysis.

The TPP is multi-platform, modular, and supports open formats, enabling integration with major platforms and 3rd party solutions.

StPeter offers fast, label-free quantitation of entire proteomes analyzed using shotgun MS.

https://tppms.org

Page 30

Acknowledgements Moritz Lab (circa 2015)

Robert Moritz

Eric Deutsch

Luis Mendoza

David Shteynberg

Jason Winget

Funding Sources:

2P50 GM076547/Center for Systems Biology

R01 GM087221

HL133135