Quantifying Proteomes Using the Open-source Trans-Proteomic Pipeline
Michael Hoopmann
Jason Winget
Luis Mendoza
Robert Moritz
Shotgun Mass Spectrometry
Sample: Digestion / Separation (HPLC)
Instrument: Mass Spectrometer
Discovery: Spectra, Data Analysis
More Detailed Look at Shotgun Data Analysis
1. Data Conversion
2. Spectrum Identification
3. ID Validation
4. Protein Inference
5. Quantification
6. Visualization
7. Dissemination
The Trans-Proteomic Pipeline (aka The TPP)
Raw Mass Spec Data: msconvert
Peptide Identification: Comet, X!Tandem, SpectraST, Kojak, SEQUEST*, Mascot*
Peptide Validation: PeptideProphet, iProphet, PTMProphet
Quantitation: ASAPRatio, XPRESS, Libra, StPeter
Protein Assignment: ProteinProphet
Protein List: SBEAMS, ProteoGrapher
File formats: pepXML, protXML, mzML
Simple set of input/outputs readable by all tools.
Modular flow – can swap algorithms into and out of pipeline.
Expandable – can add or remove tools as necessary.
Simple GUI – Operates in a web browser, multi-platform.
Versatility of The TPP
Data Converters
Open, Standardized Formats: mzML, mzXML, mzIdentML, pepXML, protXML, and more
TPP Applications: Search, Validate, Infer, Quant., Visualize, Create Reports, Organize Data, Cloud, Publish
Multi-Platform Web Interface
Vendor & 3rd Party Applications
Instrument Data From Major Vendors
The Trans-Proteomic Pipeline Suite of Software
The Trans-Proteomic Pipeline
Multiple user accounts.
Maintain independent projects and data storage.
All tools accessible from drop-down menu interface.
Interface has a few common pipelines pre-built.
The Trans-Proteomic Pipeline
All stages of pipeline accessible at any time, ordered for optimal performance.
Stages allow for customization.
Applications within pipeline can be re-run with different parameters to refine analyses.
StPeter: An application for label-free quantitation
StPeter – Label-free Quantitation in The TPP
Historically, quantitation in the TPP focused on labeled methods.
Label-free methods are often less work at the bench.
More recently, label-free approaches have become more robust.
StPeter is a new tool in the TPP for label-free quantitation.
Quantitation
ASAPRatio, XPRESS, Libra: labeled methods, e.g. iTRAQ & SILAC
StPeter: New! Label-free!
StPeter: MS/MS-based Quantitation
Contains multiple MS/MS counting-centric methods.
Produces relative protein quantitation within a sample.
All results are normalized to facilitate comparisons across samples.
Spectral Counting
StPeter: The NSAF Model
Normalized Spectral Abundance Factor (NSAF)
Contains two normalizations:
Protein length
Sample-to-sample variation
Small proteins produce fewer distinct peptide molecules.
Large proteins produce many distinct peptides, appearing more abundant by total MS/MS count.
Zybailov, B.; Mosley, A. L.; Sardiu, M. E.; Coleman, M. K.; Florens, L.; Washburn, M. P. J. Proteome Res. 2006, 5, 2339–2347.
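The two normalizations above can be sketched in a few lines. This is a minimal illustration of the published NSAF definition (spectral count divided by protein length, then divided by the sum of that quantity over all proteins in the sample), not StPeter's own code; the function name and the example proteins are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Return NSAF per protein.

    spectral_counts: protein -> number of MS/MS spectra (SpC)
    lengths:         protein -> protein length in residues (L)
    """
    # First normalization: protein length (SpC / L).
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    # Second normalization: divide by the sample total so values sum to 1,
    # which makes runs of different depth comparable.
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


# Hypothetical example: same spectral count, different protein lengths.
example = nsaf({"small": 10, "large": 10}, {"small": 100, "large": 400})
print(example)
```

With equal counts, the shorter protein receives the larger NSAF, which is the length correction described on the slide.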
StPeter: The SIN Model
Spectral counts are integers, i.e. a scan from a low-abundance peptide carries the same weight as a scan from a high-abundance peptide.
Normalized Spectral Index (SIN) incorporates MS/MS peak height.
More abundant peptides produce more abundant fragment ion peaks.
SIN adds fragment ion intensity to NSAF.
$$SI_N = \frac{\sum_{n=1}^{SpC} i_n}{\sum_{j=1}^{N} \left( \sum_{n=1}^{SpC} i_n \right)_j} \Big/ L$$
Where i_n is the summed intensities of fragment ions in spectrum n, SpC is the spectral count of the protein, N is the number of proteins, and L is the protein length.
Griffin NM, et al. Nat Biotechnol. 2010 Jan;28(1):83-9
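A rough sketch of the SIN calculation, assuming the matched fragment ion intensities for each spectrum are already available as plain lists. The function names and the toy intensities are hypothetical and only illustrate the formula; they are not taken from StPeter.

```python
def spectral_index(psm_intensities):
    """Sum matched fragment ion intensities over all spectra of one protein."""
    return sum(sum(fragments) for fragments in psm_intensities)


def normalized_spectral_index(protein_psms, lengths):
    """Return SIN per protein: SI / (sum of SI over all proteins) / L."""
    si = {p: spectral_index(psms) for p, psms in protein_psms.items()}
    total = sum(si.values())
    return {p: si[p] / total / lengths[p] for p in si}


# Hypothetical data: each inner list holds the fragment intensities of one spectrum.
psms = {"A": [[100.0, 200.0], [300.0]], "B": [[400.0]]}
sin_values = normalized_spectral_index(psms, {"A": 200, "B": 100})
print(sin_values)
```

Protein A has two spectra and protein B only one, yet B ends up with the larger SIN because its single spectrum is intense and the protein is short.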
StPeter: The Distributed NSAF (dNSAF) Model
Modifies the NSAF model to account for shared peptides, i.e. peptides that map to multiple protein sequences.
Utilizes the fraction of non-shared peptides to split the spectral counts of the shared peptides.
Protein A: 5 unique SpC, Distributed SpC = 6.4
Protein B: 2 unique SpC, Distributed SpC = 2.6
Shared between A and B: 2 shared SpC
Zhang Y, et al. Anal Chem. 2010 Mar 15;82(6):2272-81
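The Protein A / Protein B numbers on this slide can be reproduced with a short sketch, assuming the shared counts are split in proportion to each protein's unique spectral counts. The function name is hypothetical, and the case where no protein in the group has unique counts is not handled.

```python
def distribute_shared_counts(unique_spc, shared_spc):
    """Split shared spectral counts among proteins by their share of unique counts.

    unique_spc: protein -> unique (non-shared) spectral counts
    shared_spc: spectral counts from peptides shared by all proteins in unique_spc
    """
    total_unique = sum(unique_spc.values())  # assumed to be non-zero
    return {
        protein: unique + shared_spc * unique / total_unique
        for protein, unique in unique_spc.items()
    }


# Slide example: A has 5 unique SpC, B has 2 unique SpC, and 2 SpC are shared.
distributed = distribute_shared_counts({"A": 5, "B": 2}, 2)
print({p: round(v, 1) for p, v in distributed.items()})
```

Protein A receives 5/7 of the shared counts and Protein B receives 2/7, which rounds to the 6.4 and 2.6 shown on the slide. The total of 9 spectral counts is preserved.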
StPeter: The Distributed SIN (dSIN) Model
$$dSI_N = \frac{dSI}{\sum_{j=1}^{n} dSI_j} \Big/ L$$
Hoopmann MR, Winget JM, Mendoza L, Moritz RL. J Proteome Res. 2018 Mar 2;17(3):1314-1320.
Modifies the SIN model to account for shared peptides.
i.e. peptides that map to multiple protein sequences.
Utilizes the fraction of non-shared peptides to split the spectral counts of the shared peptides.
Modernizes spectral index analysis with the methods optimized for dNSAF.
Adding Quantitation to the Pipeline
Designed to be seamlessly integrated into existing TPP pipelines.
Execution is FAST – typically seconds to a few minutes.
Cassette model ensures addition or removal without disrupting the pipeline.
Spectral Search
ID Validation
Protein Inference
Visualization
StPeter
How StPeter Works
[Flowchart] Inputs: protXML, pepXML, mzML / mzXML
1. Read Protein (from protXML).
2. Passes FDR? If no, skip to step 7.
3. Read Peptide List.
4. Extract PSM list from pepXML.
5. Extract Spectrum (from mzML / mzXML) and add fragment ions to spectral index.
6. More PSMs? If yes, repeat step 5.
7. More Proteins? If yes, return to step 1.
8. Normalize Spectral Indexes.
9. Export Spectral Indexes.
Output is the same format as input, with protein quantities appended; downstream tools operate the same.
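The control flow of this slide can be sketched with in-memory stand-ins for the three input files. Everything here is an assumption for illustration: the dictionary layouts, the `fdr_threshold` parameter, and the function name are hypothetical, no real protXML, pepXML, or mzML parsing is performed, and shared peptides are ignored.

```python
def quantify(proteins, psms_by_peptide, spectra, fdr_threshold=0.01):
    """Toy version of the protein -> peptide -> PSM -> spectrum loop.

    proteins:        list of dicts with 'name', 'fdr', 'length', 'peptides'
                     (stand-in for protXML)
    psms_by_peptide: peptide -> list of spectrum ids (stand-in for pepXML)
    spectra:         spectrum id -> matched fragment ion intensities
                     (stand-in for mzML / mzXML)
    """
    indexes = {}
    lengths = {}
    for protein in proteins:                               # Read Protein / More Proteins?
        if protein["fdr"] > fdr_threshold:                 # Passes FDR?
            continue
        index = 0.0
        for peptide in protein["peptides"]:                # Read Peptide List
            for scan in psms_by_peptide.get(peptide, []):  # Extract PSM list / More PSMs?
                index += sum(spectra[scan])                # Extract Spectrum, add fragment ions
        indexes[protein["name"]] = index
        lengths[protein["name"]] = protein["length"]

    total = sum(indexes.values())                          # Normalize Spectral Indexes
    if total == 0:
        return {}
    return {name: si / total / lengths[name] for name, si in indexes.items()}


# Hypothetical inputs: P2 fails the FDR filter and is skipped.
proteins = [
    {"name": "P1", "fdr": 0.001, "length": 100, "peptides": ["AAK", "CCR"]},
    {"name": "P2", "fdr": 0.2, "length": 50, "peptides": ["DDK"]},
    {"name": "P3", "fdr": 0.005, "length": 300, "peptides": ["EEK"]},
]
psms_by_peptide = {"AAK": ["s1", "s2"], "CCR": ["s3"], "DDK": ["s4"], "EEK": ["s5"]}
spectra = {"s1": [10.0, 20.0], "s2": [30.0], "s3": [40.0], "s4": [500.0], "s5": [100.0]}
result = quantify(proteins, psms_by_peptide, spectra)
print(result)
```

In the real tool the final step appends the quantities to the protXML file rather than returning a dictionary.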
A look at StPeter in action.
The Distributed Models
Protein quantities: 0.1, 0.25, 0.63, 1.6, 4, and 10 pmol
Albumin species: Cow, Rat, Human, Rabbit, Mouse, Pig
• Mix six homologous albumins in a constant yeast background.
• Acquire 12 replicate injections on an LTQ.
Zhang Y, et al. Anal Chem. 2010 Mar 15;82(6):2272-81
The Distributed Models
[Figure: two scatter plots with linear fits. dSIN panel: Log2(dSIN) vs. Log2(Protein Quantity, pmol), R² = 0.9871. dNSAF panel: Log2(dNSAF) vs. Log2(Protein Quantity, pmol), R² = 0.9719.]
All spectra are used, including identifications to multiple proteins.
Quantitation maintains linearity over 3 orders of magnitude.
Quantifying Proteomes
Create two conditions:
60 µg HeLa + 10 µg E. coli
60 µg HeLa + 30 µg E. coli
24 OFFGEL fractions, analyzed in triplicate (3x) on LTQ-Orbitrap
Total of 144 data files
Approximately 1 million spectra used for quantitation
Compare ratios for every protein observed in 2 of 3 replicates in both conditions.
Over 5000 proteins
Cox J, et al. Mol Cell Proteomics. 2014 Sep;13(9):2513-26
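The file count follows from the design: 24 fractions x 3 replicates x 2 conditions = 144. The ratio comparison can be sketched as below, assuming "observed in 2 of 3 replicates" means at least two non-missing values per condition and that replicates are averaged before taking the ratio. Both are assumptions, and the function name and toy quantities are hypothetical.

```python
import math


def log2_ratios(condition_a, condition_b, min_observations=2):
    """Return log2(mean B / mean A) for proteins seen often enough in both conditions.

    condition_a, condition_b: protein -> list of replicate quantities,
    with None for a replicate where the protein was not observed.
    Quantities are assumed to be positive.
    """
    ratios = {}
    for protein in condition_a.keys() & condition_b.keys():
        a = [v for v in condition_a[protein] if v is not None]
        b = [v for v in condition_b[protein] if v is not None]
        if len(a) >= min_observations and len(b) >= min_observations:
            ratios[protein] = math.log2((sum(b) / len(b)) / (sum(a) / len(a)))
    return ratios


n_files = 24 * 3 * 2  # fractions x replicates x conditions

# Hypothetical quantities: P2 is seen in only one replicate of condition A.
cond_a = {"P1": [1.0, 1.0, None], "P2": [2.0, None, None]}
cond_b = {"P1": [3.0, 3.0, 3.0], "P2": [2.0, 2.0, 2.0]}
ratios = log2_ratios(cond_a, cond_b)
print(n_files, ratios)
```

Here P1 passes the filter with a three-fold increase and P2 is dropped.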
Artifacts of Spectral Counts
Cox J, et al. Mol Cell Proteomics. 2014 Sep;13(9):2513-26
Tracks of discrete quantities due to single MS/MS protein representation.
Decreased accuracy at low quantities due to limited number of spectra per protein.
StPeter Analysis of Proteomes
[Figure: for each model, a scatter plot of Log2(dSIN) or Log2(dNSAF) vs. Log2(ratio) for Human and E. coli proteins, and a histogram of relative frequencies vs. Log2(ratio).]
dSIN: distribution centers at 1.17 (σ=0.72) and -0.56 (σ=0.60), separated by 1.73.
dNSAF: distribution centers at 1.08 (σ=0.69) and -0.60 (σ=0.55), separated by 1.68.
E. coli proteome ratio much closer to 1:3 than spectral counting alone.
Protein distributions show lower standard deviation.
Fewer artifacts among low abundance proteins.
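As a rough check of the 1:3 claim, the separations reported on this slide (1.73 and 1.68 on the log2 scale) can be converted back to fold changes and compared with log2(3). This assumes the separation between the two distribution centers is the right quantity to compare against the spike-in ratio, and that 1.73 belongs to dSIN and 1.68 to dNSAF.

```python
import math

# Expected separation for a 1:3 change in E. coli, on the log2 scale.
expected_log2 = math.log2(3)

# Separations between the two distribution centers, read from the slide.
separations = {"dSIN": 1.73, "dNSAF": 1.68}

# Convert each log2 separation back to a linear fold change.
fold_changes = {model: 2 ** sep for model, sep in separations.items()}

for model, fold in fold_changes.items():
    print(f"{model}: separation {separations[model]:.2f} "
          f"(expected {expected_log2:.2f}), about {fold:.1f}-fold")
```

Both models land at roughly 3.2 to 3.3-fold against an expected 3-fold.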
Comparison to Other Methods
[Figure: dSIN scatter plot of Log2(dSIN) vs. Log2(ratio) for Human and E. coli proteins, with a histogram of relative frequencies vs. Log2(ratio); distribution centers at 1.17 (σ=0.72) and -0.56 (σ=0.60), separated by 1.73.]
Comparison to precursor ion analysis (MS signals).
Nearly identical protein ratios.
Protein distributions similar, slightly better accuracy with MS signal analysis.
StPeter in The TPP
Running StPeter in The TPP
Select one or more protein inference data sets. Batch Analysis!
Set parameters in a simple user interface.
Analysis typically takes seconds to a few minutes.
Visualizing StPeter Results
Protein Descriptions | Search Results & Statistics | Quantities
Visualizing StPeter Results
Tabbed windows allow for filtering and sorting.
Extract only quantified proteins.
Sort results.
Expand protein details to the peptide level.
Summary
The Trans-Proteomic Pipeline is a free, open-source suite of tools for shotgun MS data analysis.
The TPP is multi-platform, modular, and supports open formats, enabling integration with major platforms and 3rd party solutions.
StPeter offers fast, label-free quantitation of entire proteomes analyzed using shotgun MS.
https://tppms.org
Acknowledgements
Moritz Lab (circa 2015)
Robert Moritz
Eric Deutsch
Luis Mendoza
David Shteynberg
Jason Winget
Funding Sources:
2P50 GM076547/Center for Systems Biology
R01 GM087221
HL133135