Computational Strategies for Improving the Quantitative Analysis of Peptides by Tandem Mass Spectrometry
Michael MacCoss
Department of Genome Sciences
University of Washington
Acquisition methods in shotgun LC-MS/MS
Data Dependent Acquisition (DDA)
Liquid chromatography
MS/MS analysis
Isolation & fragmentation
MS analysis
• Irregular sampling across replicates
• Limited to MS1 quant
• 3-10x less sensitive than MS2 quant
• Poor interference relative to MS2
• Need an MS1 signal to trigger MS2
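The requirement for an MS1 signal to trigger MS2 can be illustrated with a toy Top-N precursor selection. This is only a minimal sketch, not any vendor's acquisition logic: the function name `select_top_n`, the intensity threshold, and the example peak list are all hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

def select_top_n(ms1_peaks: List[Peak], n: int, min_intensity: float) -> List[float]:
    """Return the m/z values of the N most intense MS1 peaks above a trigger threshold."""
    eligible = [p for p in ms1_peaks if p[1] >= min_intensity]
    eligible.sort(key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in eligible[:n]]

# Hypothetical MS1 scan: the peak at 790.40 is below the trigger threshold.
ms1_scan = [
    (445.12, 9.0e6),
    (512.77, 3.5e5),
    (623.31, 2.0e6),
    (790.40, 4.0e3),
    (801.95, 7.5e5),
]

selected = select_top_n(ms1_scan, n=3, min_intensity=1.0e4)
print(selected)
```

In this toy scan the low-abundance precursor at 790.40 m/z never receives an MS/MS scan, and which precursors make the cut can change from replicate to replicate as intensities fluctuate.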
Acquisition methods in shotgun LC-MS/MS
Data Dependent Acquisition (DDA)
Parallel Reaction Monitoring (PRM)
MS/MS analysis
Isolation & fragmentation
MS analysis
Liquid chromatography
The Challenge:
• In 2006, software for building targeted proteomics methods didn’t exist.
• Neither did software for the analysis of complex multiplexed SRM methods.
• No software was capable of cross-vendor data analysis.
Needs:
Protein and peptide centric
Intuitive document editor graphical interface
Full undo/redo support
Direct analysis of all vendor RAW MS data without conversion
Ability to choose peptides and transitions based on peptide spectrum libraries
Free and open source
Developed and supported professionally
http://skyline.maccosslab.org
Skyline: Software for the Development of Targeted Proteomics Methods and Analysis of the Data
Skyline is Vendor Neutral
• Paulovich Lab: Applied Biosystems QTrap 4000
• MacCoss Lab: Thermo TSQ Ultra
Skyline Project Overview
• 10th year in development
• 6 developers
• over 500,000 lines of code – open source
• 60,000 lines of code for testing
• over 85,000 installations
• over 10,000 registered users
• over 7500 instances started each week
Skyline is Not Just for SRM
Support Targeted Quantitation from Extracted Ion Chromatograms:
• Selected Reaction Monitoring (aka SRM and MRM)
• Targeted MS/MS (aka Parallel Reaction Monitoring or HMRM)
• MS1 Filtering (chromatogram extraction from the MS1 data)
• Data Independent Acquisition and SWATH
Immediate Visual Interaction with the Data
129 peptides, 42 replicates
External Tools for Skyline
External Tools – 13 Total
MSstats (Vitek): grouped study statistics
QuaSAR (Carr): response curve statistics
Population Variation: mutation frequency
Protter (Wollscheid): transmembrane topology
SProCoP (Bereman): system suitability
MS1 Probe (Gibson): MS1 filtering statistics
Broudy et al., Bioinformatics
Tools in Skyline
Skyline Tester
Critical tool for professional translation companies
Automated access to any of 140 forms in Skyline
Automated access to 14 tutorials and 400 screenshots
Runs English, Chinese and Japanese UI
Runs French and Turkish number formats
Nightly runs tracking failures and memory leaks
Leveraging already extensive Skyline test suite
530 tests
60,000 lines of code
Custom Test Framework -- Skyline Tester
Memory Leak
Skyline Nightly – Test Web Log
Aggregating and Publishing
• Store and manage Skyline documents
• Publish fully annotated Skyline documents
o Satisfies the newest MCP Guidelines for dissemination of targeted proteomics assays (Abbatiello et al, MCP 2017)
• Build chromatogram libraries
• Aggregate lab QC data
• Free hosted version (http://panoramaweb.org)
o 331 labs so far (including large projects from LINCS, CPTAC, & ABRF sPRG)
o >9589 data sets uploaded
o User controlled security
• Supported by a grant from the NIH and the Panorama Partners Program
o Roche (2013), Genentech (2013), Merck (2014), CONFIDENTIAL (2014), NCI (2017), and Sanofi (2018)
• Panorama Public fulfills the MCP publication requirements for release of quantitative data.
• Locally installable server application
• Free and open source (Apache 2.0)
Sharma et al., J. Proteome Res. 2014
Skyline/Panorama Workflow
Publishing to Panorama
Growth of PanoramaWeb
Lab Growth to 331
Skyline docs to 14568
Panorama Growth High
Viewing Runs - Peptides
Viewing Proteins
Viewing Peptides
Customizing Views, Sorts, Filters
Growth Partially Enabled by AutoQC
Vendor Neutral AutoQC
Data Uploaded Automatically to Panorama
Track Performance Over Time
Backend to NIH Branded Sites
NCI CPTAC Assay Portal
LINCS pilincs.org
http://passport.maccosslab.org
Alternatives to SRM or Targeted MS/MS?
• We want the selectivity of targeted SRM with the breadth of data dependent acquisition.
• If we have a new peptide we want to target, we don’t want to have to go back and rerun the experiment.
o We want a digital molecular archive of all samples.
Acquisition methods in shotgun LC-MS/MS
Data Independent Acquisition (DIA)
Data Dependent Acquisition (DDA)
Parallel Reaction Monitoring (PRM)
MS/MS analysis
Isolation & fragmentation
MS analysis
Liquid chromatography
Data Independent Acquisition
20 × 20 m/z-wide windows = 400 m/z
[Figure: DIA isolation windows plotted as time vs. m/z (500 to 900), cycle time ~2 seconds (Q-Exactive); inset spanning 780 m/z to 800 m/z with time ticks 51.2 to 51.7; MS/MS spectrum, Intensity x 10^-5 vs. m/z (400 to 1400)]
Targeted Chromatogram Extraction
VLENTFEIGSDSIFDK++ (790.4 m/z)
Extraction of Fragment Ions from DIA Data
Venable et al Nat Methods 2004
Dong et al Science 2007
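As a rough illustration of the extraction described above, the sketch below builds twenty 20 m/z-wide windows over 500 to 900 m/z, finds the window that isolates a 790.4 m/z precursor, and sums fragment-ion intensity within a ppm tolerance across scans. It is a minimal sketch rather than the method used by Skyline or any other tool: the function names, the 20 ppm tolerance, the fragment m/z of 904.47, and the toy scans are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Window = Tuple[float, float]
Scan = Tuple[float, List[Tuple[float, float]]]  # (retention time, [(m/z, intensity), ...])

def build_windows(start: float, end: float, width: float) -> List[Window]:
    """Contiguous, non-overlapping isolation windows covering [start, end)."""
    n = int(round((end - start) / width))
    return [(start + i * width, start + (i + 1) * width) for i in range(n)]

def window_for(precursor_mz: float, windows: List[Window]) -> Optional[Window]:
    """Return the window whose range contains the precursor m/z, if any."""
    for lo, hi in windows:
        if lo <= precursor_mz < hi:
            return (lo, hi)
    return None

def extract_chromatogram(scans: List[Scan], target_mz: float, tol_ppm: float) -> List[Tuple[float, float]]:
    """Sum the intensity within tol_ppm of target_mz in each MS/MS scan."""
    tol = target_mz * tol_ppm / 1e6
    chromatogram = []
    for rt, peaks in scans:
        total = sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol)
        chromatogram.append((rt, total))
    return chromatogram

windows = build_windows(500.0, 900.0, 20.0)
win = window_for(790.4, windows)

# Hypothetical MS/MS scans from that window; the values are invented for illustration.
scans = [
    (51.2, [(650.30, 100.0), (904.47, 50.0)]),
    (51.3, [(650.31, 400.0), (904.47, 900.0)]),
    (51.4, [(904.48, 2500.0)]),
    (51.5, [(904.47, 800.0), (700.00, 30.0)]),
]
xic = extract_chromatogram(scans, target_mz=904.47, tol_ppm=20.0)
print(win, xic)
```

Because every precursor in the window is fragmented on every cycle, the same stored scans can be queried again later for a different peptide or a different set of fragment ions.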
DIA
Data Acquisition
Peptide Selection
Chromatogram Extraction
Peak Integration
Transition Refinement
Hypothesis Generation
Hypothesis Test
SRM
Data Acquisition
Peptide Selection
Hypothesis Generation
Hypothesis Test
Transition Selection or Refinement
RT Scheduling
Peak Integration
The Promise of DIA
▪ Acquire a “molecular image” or “digital archive” of the sample
– Mine it over and over
– No scheduling
▪ Direct queries (and p-values) for peptides of interest
▪ Better quantitation than DDA
Source: http://edition.cnn.com/interactive/2017/01/politics/trump-inauguration-gigapixel/
PRM: High Fidelity; Targeted
Source: http://edition.cnn.com/interactive/2017/01/politics/trump-inauguration-gigapixel/
There Can be Value in Keeping the Full Picture
Source: https://www.nytimes.com/interactive/2017/01/20/us/politics/trump-inauguration-crowd.html
2009 Obama Inauguration
2017 Trump Inauguration
There Can be Value in Keeping the Full Picture
Things you can do post data acquisition
• Choose different peptides for a protein
• Choose different transitions
• Add additional proteins / proteoforms
Peptide Detections: Typical DIA library search workflow
Spectrum Library
Compute Match Features
Machine Learning Classifier
Wide Window DIA File
Typical DIA library search
Peptide Detections: Typical DIA library search workflow
Compute Match Features
Machine Learning Classifier
Wide Window DIA File
Typical DIA library search
Innovations
Protein/Peptide Sequences
Pecan: Detecting Peptides Directly from DIA Data
Percolator, Käll et al. 2007
Decoy peptides
CVITNLAPTK
p-value
q-value
Percolator
APTNVTCILK
Peptides of interest
Fragment ions
- raw score
- background subtracted score
- spectrum similarity
- number of contributing ions
- mass error mean
- mass error variance
- rank
Precursor ions
- idotp (isotope dot product)
- mass error mean
- mass error variance
Peptide sequence
- charge state
- peptide length
- SSR hydrophobicity index
Score
Ting et al, Nature Methods 2017
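The step from scored target and decoy peptides to q-values can be sketched as follows. This is not Percolator's implementation: Percolator trains a semi-supervised classifier on features like those listed above. The sketch covers only the final step and uses a simple decoys/targets estimate of the false discovery rate. The function name and the example scores are hypothetical.

```python
from typing import List, Tuple

def target_decoy_qvalues(scored: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Assign q-values to (score, is_decoy) pairs; higher scores are better.

    FDR at a score threshold is estimated as (#decoys) / (#targets) at or above
    that threshold. The q-value is the minimum FDR over all thresholds that
    would still accept the peptide.
    """
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Walk from the worst score upward, keeping the running minimum FDR.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvals)]

# Hypothetical classifier scores: five targets and two decoys.
scored = [(9.1, False), (8.4, False), (7.9, False), (6.5, True),
          (6.0, False), (5.2, False), (4.8, True)]
results = target_decoy_qvalues(scored)
accepted = [s for s, is_decoy, q in results if not is_decoy and q <= 0.01]
print(accepted)
```

With these toy scores only the three targets ranked above the first decoy pass a 1% q-value threshold.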
XcorDIA is a Pecan-Like Algorithm that Uses Xcorr
Brian Searle ASMS 2018
DIA is a powerful tool for detecting peptides in HeLa
192
Experimental Details:
90 min linear gradient
QE-HF single injection
DDA:
400-1600 m/z
Top-12
DIA:
400-1000 m/z
24 m/z overlapping windows (12 m/z effective)
Peptide Detections: EncyclopeDIA workflow
Spectrum Library
Deconvolute Overlapping Windows
Compute Match Features
Percolator
Wide Window DIA File
Retention Time Filtering
Automated Transition Refinement
Quantitation
Typical DIA library search
EncyclopeDIA innovations
Searle et al. https://www.biorxiv.org/content/early/2018/03/07/277822.article-info
The Promise of DIA
Precursor Sampling in DIA is Low Res
Spectrum Library Approach
Query Data DIA Data
DIA is a powerful tool for detecting peptides in HeLa
HeLa Spectrum Library
• 39 Injections
• ~166.4k Peptides
Pan-Human Spectrum Library
• 331 Injections
• ~139k Peptides
Separating Detection from Quantitation
24 m/z overlapping, 12 m/z effective
4 m/z overlapping, 2 m/z effective
Searle et al., bioRxiv 2018
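One way to see why a 24 m/z overlapping scheme is 12 m/z effective is to stagger alternate cycles by half a window and intersect them. The sketch below assumes that half-window staggering, which the slides do not spell out, and uses hypothetical function names. The 400 to 500 m/z range for the narrow scheme is likewise an assumption chosen only to keep the example small.

```python
from typing import List, Tuple

Window = Tuple[float, float]

def staggered_windows(start: float, end: float, width: float) -> Tuple[List[Window], List[Window]]:
    """Two alternating cycles of windows; the second is shifted down by half a window."""
    half = width / 2.0
    n = int(round((end - start) / width))
    cycle_a = [(start + i * width, start + (i + 1) * width) for i in range(n)]
    cycle_b = [(lo - half, hi - half) for lo, hi in cycle_a]
    cycle_b.append((end - half, end + half))  # extra window to cover the top edge
    return cycle_a, cycle_b

def effective_regions(cycle_a: List[Window], cycle_b: List[Window]) -> List[Window]:
    """Each region is the intersection of one window from each cycle."""
    regions = []
    for a_lo, a_hi in cycle_a:
        for b_lo, b_hi in cycle_b:
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if hi > lo:
                regions.append((lo, hi))
    return sorted(regions)

# Wide scheme from the slides: 24 m/z windows over 400-1000 m/z.
wide = effective_regions(*staggered_windows(400.0, 1000.0, 24.0))
# Narrow scheme: 4 m/z windows; the 400-500 m/z range is an illustrative assumption.
narrow = effective_regions(*staggered_windows(400.0, 500.0, 4.0))

print(len(wide), wide[0], wide[1])
print(len(narrow), narrow[0], narrow[1])
```

A signal that appears in one cycle's window but not in the neighbouring staggered window can be assigned to the half-width region the two windows do not share. That is the idea behind deconvoluting overlapping windows.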
EncyclopeDIA workflow
Chromatogram Library
Deconvolute Overlapping Windows
Compute Match Features
Percolator
Wide Window DIA File
Retention Time Filtering
Automated Transition Refinement
Quantitation
Typical DIA library search
EncyclopeDIA innovations
Chromatogram libraries
FASTA or Library
Narrow Window DIA File
PECAN/EncyclopeDIA
DIA Chromatogram Library Approach
Query Data
DIA Data
Chromatogram Library
DIA Chromatogram Library Approach
Query Spectrum
Chromatogram Library
Not Found
DIA Chromatogram Library Approach
Chromatogram Library
DIA Data
On-column Chromatogram Library
Normalized Retention Time Library
[Figure: two histograms of Count vs. Delta RT (negative to positive, -10 to 10); annotations: +/- 5 min, +/- 10 sec]
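The practical effect of a tighter retention-time tolerance can be sketched as a filter on candidate peaks. This is a minimal illustration rather than EncyclopeDIA's implementation. The function name, the library retention time of 51.4 min, and the candidate peaks are hypothetical. Only the two tolerances (+/- 5 min and +/- 10 sec) come from the slide.

```python
from typing import Dict, List

def filter_by_rt(candidates: List[Dict[str, float]], library_rt: float, tolerance: float) -> List[Dict[str, float]]:
    """Keep candidates whose |observed RT - library RT| is within tolerance (minutes)."""
    return [c for c in candidates if abs(c["rt"] - library_rt) <= tolerance]

# Hypothetical candidate peaks for one peptide (retention times in minutes).
candidates = [
    {"rt": 48.90, "score": 0.71},
    {"rt": 51.45, "score": 0.88},
    {"rt": 51.90, "score": 0.64},
    {"rt": 55.20, "score": 0.80},
]
library_rt = 51.4

wide = filter_by_rt(candidates, library_rt, tolerance=5.0)           # +/- 5 min
narrow = filter_by_rt(candidates, library_rt, tolerance=10.0 / 60.0)  # +/- 10 sec

print(len(wide), len(narrow))
```

With the wider tolerance all four candidates remain and must be separated by score alone. With the tighter tolerance only one remains.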
Chromatogram Library Intensities are Better Correlated with DIA Data
DDA Library vs Wide DIA Data
DIA Chromatogram Library vs Wide DIA Data
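A comparison of this kind can be expressed as a correlation between library fragment intensities and the intensities observed in the wide-window DIA data. The sketch below uses a plain Pearson correlation, which is an assumption about the metric. The intensity values are invented to show the calculation and are not data from the talk.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)

# Invented relative intensities for five fragment ions of one peptide.
observed_wide_dia = [100.0, 80.0, 40.0, 20.0, 10.0]
dda_library = [30.0, 100.0, 20.0, 60.0, 50.0]
chromatogram_library = [95.0, 85.0, 35.0, 25.0, 12.0]

r_dda = pearson(dda_library, observed_wide_dia)
r_chrom = pearson(chromatogram_library, observed_wide_dia)
print(round(r_dda, 3), round(r_chrom, 3))
```

The toy numbers are chosen so that the chromatogram-library pattern tracks the observed pattern more closely, mirroring the claim in the slide title.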
DIA is a powerful tool for detecting peptides in HeLa
Chromatogram DIA Library • 6 Injections
HeLa Chromatogram Library from FASTA • 53.2k Peptides
HeLa Chromatogram Library from HeLa Spectrum Library • 99.6k Peptides
HeLa Chromatogram Library from Pan-Human Spectrum Library • 91.1k Peptides
Advantages of On-Column Chromatogram Library
• If you can’t detect a peptide with a narrow precursor window, then you will never detect it with a wide window.
• On-column RT calibration overcomes selectivity limitations of poor precursor isolation in DIA.
• Simplifies workflow. No need to perform extensive fractionation to generate sample-specific library data. Just 4-6 runs with narrow windows.
• Can make use of large assay spectrum libraries if available.
• Reduces multiple testing. Only look for peptides you know are in the sample in the wide window data.
http://chorusproject.org
Managed by Stratus Biosciences – A Non-Profit Company
New Paradigm for Data Processing
• Proteomics data needs to be “un-siloed”
– A single place for all data
• Volume of high value data is increasing
• Crucial for peer review and critical analysis
• Economics are driving demand for data reuse
• Cultural shift: Public access to data collected with public funds
• We need to bring algorithms to the data
Chorus Project
http://chorusproject.org
Sharing and Dissemination of RAW Data
Vendor Neutral Chromatogram and Spectrum Viewer
Instruments and Operators
Automatic Upload / Desktop Client
Link
Each instrument has an automatic client uploader
Organizing Data
Support Wiki Pages for Projects
Sharing Data and Projects
Conclusions
• Software for mass spectrometry analysis must be easy to use, well supported, well documented, well tested, and robust.
• We feel strongly about the power of being able to visualize data.
• Data sharing continues to be a problem and it will continue to get worse as the information content of mass spectrometry data gets larger.
Financial Support
NCI-IMAT
NIGMS BTRR P41 Program
NIA Nathan Shock Center
NIA Investigator Initiated R01 and P01
NIGMS Investigator Initiated R01s
Stratus Biosciences
Andrey Bonderenko
Christine Wu
Nathan Yates
University of Washington Department of Genome Sciences
Bill Noble
John Stamatoyannopoulos
Judit Villen
KTH Stockholm
Lukas Käll
UW Lab Medicine
Andy Hoofnagle
Geoff Baird
Clark Henderson
Former Lab Members
Jason Gilmore
Sandi Spencer
University of Washington
MacCoss Lab
Josh Aldrich
Nat Brace
Yuval Boss
Brian Connolly
Danielle Faivre
Austin Keller
Eric Huang
Rich Johnson
Brendan MacLean
Gennifer Merrihew
Brook Nunn
Lindsay Pino
Deanna Plubell
Brian Pratt
Tobias Rohde
Paul Rudnick
Brian Searle
Vagisha Sharma
Nick Shulman
Emma Timmins-Schiffman
Stanford
Tom Montine
ThermoFisher
Mary Blackburn
Romain Huguet
Andreas Kuehn
Claudia Martins
Phil Remes
Mike Senko
Yue Xuan
Vlad Zabrouskov
Acknowledgement
Software Vendor Support
Agilent
Shimadzu
Bruker
ThermoFisher
Sciex
Waters
MacCoss Lab UW