OpenMS: Quantitative proteomics at large scale

Yasset Perez-Riverol, Ph.D.
GitHub: github.com/ypriverol
Twitter: @ypriverol

Post on 16-Jan-2017

TRANSCRIPT

Page 1: OpenMS: Quantitative proteomics at large scale

Yasset Perez-Riverol, Ph.D.
GitHub: github.com/ypriverol
Twitter: @ypriverol

Page 2: OpenMS: Quantitative proteomics at large scale

Proteomics Bioinformatics, EMBL-EBI, December 2016

Outline
• Introduction to OpenMS
    • Modularity & Workflows
    • Visualization
    • Integration with other tools
• Two example workflows
    • Protein identification
    • Label-free quantification

Page 3: OpenMS: Quantitative proteomics at large scale

Modularity is the degree to which a system's components may be separated and recombined.

Page 4: OpenMS: Quantitative proteomics at large scale

Page 5: OpenMS: Quantitative proteomics at large scale

Modularity

Tools for identification
DecoyDatabase, MascotAdapter, XTandemAdapter, MSGFPlusAdapter, PeptideIndexer, FalseDiscoveryRate, IDPosteriorErrorProbability, ConsensusID, LuciphorAdapter, HighResPrecursorMassCorrector, FidoAdapter

Tools for quantification
PeakPickerHiRes, FeatureFinderMultiplex, FeatureFinderCentroided, SpectraMerger, NoiseFilterSGolay, ITRAQAnalyzer, IDMapper, IDConflictResolver, MapAlignerPoseClustering, MapRTTransformer, FeatureLinkerUnlabeledQT, ProteinQuantifier

Tools for file handling
FileConverter, FileMerger, FileFilter, IDFileConverter, IDMerger, IDFilter, MzTabExporter, FileInfo

OpenMS ⇨ a collection of 180 software tools; ≈ 30 tools are sufficient for standard workflows

Page 6: OpenMS: Quantitative proteomics at large scale

OpenMS

OpenMS – an open-source framework for computational mass spectrometry

Portable: available on Windows, OSX, Linux

OpenMS TOPP tools – The OpenMS Proteomics Pipeline tools

• > 180 Building blocks: One application for each analysis step

• Vendor independent: Uses PSI standard formats

Can be integrated in various workflow systems

• TOPPAS – TOPP Pipeline Assistant

• Galaxy

• KNIME

Page 7: OpenMS: Quantitative proteomics at large scale

KNIME and TOPPView

KNIME – KoNstanz Information MinEr
• Enables building customized workflows from OpenMS components.

TOPPView: an OpenMS data viewer
• Based on standard file formats.
• Shows MS/MS information, peptides/proteins, and quantitative information.

Page 8: OpenMS: Quantitative proteomics at large scale

KNIME – Workflow System

KNIME – KoNstanz Information MinEr
• Industrial-strength general-purpose workflow system
• Convenient and easy-to-use graphical user interface
• Available for Windows, OS X, Linux at http://KNIME.org

[Figure: KNIME workbench screenshot (KNIME, CC BY-SA 4.0) with labelled areas: Workflows, Plots, Tables, Console, Nodes]

Page 9: OpenMS: Quantitative proteomics at large scale

Workflow Builder: Data Flow

• KNIME-OpenMS workflows consist of distinct nodes that are assembled into workflows
• Either tables or files are exchanged between nodes along the edges of the workflow
• Configuration dialogs are used to set node parameters
• Loops allow iterating sequentially over lists of data
• Switches allow executing nodes or subworkflows depending on a condition
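The data-flow idea on this slide can be mimicked in a few lines of plain Python. This is only an illustrative sketch and not KNIME or OpenMS code: the two "nodes" (`pick_peaks`, `search`) and the file-naming scheme are hypothetical stand-ins that merely rename files, so that the loop and the switch are easy to follow.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-ins for workflow nodes: each takes a file name and
# returns the name of the file it would produce. No real processing happens.
def pick_peaks(path: str) -> str:
    return path.replace(".mzML", ".picked.mzML")

def search(path: str) -> str:
    return path.replace(".mzML", ".idXML")

def run_chain(nodes: List[Callable[[str], str]], path: str) -> str:
    """Pass one file along the edges of a linear workflow, node by node."""
    for node in nodes:
        path = node(path)
    return path

def run_workflow(inputs: Iterable[str], centroided: bool) -> List[str]:
    # Switch: skip the peak-picking node when the data is already centroided.
    nodes = [search] if centroided else [pick_peaks, search]
    # Loop: iterate sequentially over the list of input files.
    return [run_chain(nodes, path) for path in inputs]

results = run_workflow(["a.mzML", "b.mzML"], centroided=False)
print(results)
```

In KNIME the same structure is drawn graphically, and node parameters are set in configuration dialogs rather than as function arguments.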

Page 10: OpenMS: Quantitative proteomics at large scale

Scripting

• KNIME permits the embedding of R code for advanced statistics
• Embedding of R scripts using the R Snippet node
• All plotting capabilities of R can be used as well

Page 11: OpenMS: Quantitative proteomics at large scale

Peptide/Protein Identification

Task: Identify peptides in multiple samples
• Mass spectra enter the workflow on the left
• Loop nodes permit repeated execution of parts of the workflow
• Identified proteins end up in result files (right side)

Page 12: OpenMS: Quantitative proteomics at large scale

TOPPView: Visualization of the results

[Figure: TOPPView displaying mzML and idXML files]

Page 13: OpenMS: Quantitative proteomics at large scale

Workflow – Plug-In System

Task: Identify peptides in multiple samples
• Mass spectra enter the workflow on the left
• Loop nodes permit repeated execution of parts of the workflow
• Identified proteins end up in result files (right side)

Page 14: OpenMS: Quantitative proteomics at large scale

Workflow – Plug-In System

Task: Identify peptides in multiple samples
• Combination of X!Tandem + OMSSA
• Definition of QC parameters such as FDR, q-values, and p-values

Page 15: OpenMS: Quantitative proteomics at large scale

Complex and customized Workflows

Algorithm        X!Tandem           Mascot             MS-GF+             Merged
PIA              1214  64 (5.3%)    1442  74 (5.1%)    1631  93 (5.7%)    1615  101 (6.2%)
Fido              996  67 (6.7%)    1439  80 (5.6%)    1679  96 (5.7%)    1619  105 (6.5%)
ProteinLP         989  64 (6.5%)    1229  77 (2.3%)    1651  93 (5.6%)    1295  104 (8.0%)
MSBayesPro        749  24 (3.2%)     958  26 (2.7%)    1303  31 (2.4%)     963   36 (3.7%)
ProteinProphet   1027  64 (6.2%)    1282  73 (5.7%)    1629  91 (5.6%)    1629   99 (6.7%)

Audain E. & Uszkoreit J. et al., Journal of Proteomics, 2017

Best protein inference algorithm:
• 3 datasets
• 4 search engines
• 5 protein inference algorithms
• > 140 combinations

Page 16: OpenMS: Quantitative proteomics at large scale

Some of the Identification nodes

IDPosteriorErrorProbability
• Computes the posterior error probability for each PSM
• Generates a new file with the corresponding values

ConsensusID
• Combines PSM identifications from multiple search engines
• Generates a combined posterior error probability for each PSM
• For each peptide ID, uses the best score of any search engine as the consensus score

FalseDiscoveryRate
• Estimates the false discovery rate on peptide and protein level using decoy searches
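To make the FDR and q-value terminology concrete, here is a minimal target-decoy sketch. It is an illustration under stated assumptions, not the code of the FalseDiscoveryRate node: it assumes that higher scores are better, that each PSM is flagged as target or decoy, and it uses the simple decoys/targets estimator, which is only one of several in common use. The function name `q_values` and the example scores are made up.

```python
from typing import List, Tuple

def q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """psms: (score, is_decoy) pairs, higher score = better match.

    Returns one q-value per PSM, in the same order as the input.
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # FDR estimate at each rank: decoys / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or at any lower-scoring rank.
    q = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q

# Invented example: five PSMs, two of them decoy hits.
example = [(9.0, False), (8.0, False), (7.0, True), (6.0, False), (5.0, True)]
print(q_values(example))
```

Filtering at a q-value threshold (for example 0.01) then keeps only the PSMs whose q-value is at or below that threshold.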

Page 17: OpenMS: Quantitative proteomics at large scale

Adapters and Complementary Nodes

FileMerger
• This node takes two files (or file lists) as input and outputs a merged list of both inputs. The order corresponds to the order of the input lists and ports.

IDMerger
• Merges several protein/peptide identification files into one file.

PeptideIndexer
• Refreshes the protein references for all peptide hits.

IDFilter
• Filters results from protein or peptide identification engines based on different criteria.

Page 18: OpenMS: Quantitative proteomics at large scale

Quantitative Proteomics

Relative Quantification
    Labeled
        In vivo: 14N/15N, SILAC
        In vitro: iTRAQ, TMT, 16O/18O
    Label-Free: Spectral Counting, MRM, Feature-Based

Absolute Quantification: AQUA, SISCAPA

And many more…

Page 19: OpenMS: Quantitative proteomics at large scale

Label-Free Quantification (LFQ)

Label-free quantification is probably the most natural way of quantifying
• No labeling required, removing further sources of error; cheap
• Different samples acquired in different measurements – higher reproducibility needed
• Manual analysis difficult
• Scales very well with the number of samples: basically no limit, no difference in the analysis between 2 or 100 samples

Page 20: OpenMS: Quantitative proteomics at large scale

Feature-based LFQ - LC-MS Maps

• Spectra are acquired at rates of up to dozens per second
• Stacking the spectra yields peak maps
• Resolution:
    • Up to millions of points per spectrum
    • Tens of thousands of spectra per LC run
• Huge 2D datasets of up to hundreds of GB per sample

[Figure: LC-MS peak map with labels "Feature (eluting peptide)" and "Quantification (3x over-expressed, …)"]
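The notion of a peak map can be sketched as follows. This is a toy illustration, not how OpenMS stores or processes maps: spectra are represented as plain Python tuples, and a "feature" is approximated by summing all intensities inside a hand-picked retention time and m/z window. The function names and the example values are invented.

```python
from typing import List, Tuple

Spectrum = Tuple[float, List[Tuple[float, float]]]  # (rt, [(mz, intensity), ...])
Point = Tuple[float, float, float]                   # (rt, mz, intensity)

def stack_spectra(spectra: List[Spectrum]) -> List[Point]:
    """Stack spectra along the retention time axis into one 2D peak map."""
    return [(rt, mz, inten) for rt, peaks in spectra for mz, inten in peaks]

def feature_intensity(peak_map: List[Point],
                      rt_range: Tuple[float, float],
                      mz_range: Tuple[float, float]) -> float:
    """Sum the intensity of all points inside an RT/m/z window (a crude 'feature')."""
    return sum(inten for rt, mz, inten in peak_map
               if rt_range[0] <= rt <= rt_range[1]
               and mz_range[0] <= mz <= mz_range[1])

# Invented example: a peptide at m/z 500 eluting over three consecutive spectra.
spectra = [
    (10.0, [(500.0, 1.0), (600.0, 2.0)]),
    (11.0, [(500.0, 3.0)]),
    (12.0, [(500.0, 2.0)]),
]
peak_map = stack_spectra(spectra)
print(feature_intensity(peak_map, (10.0, 12.0), (499.9, 500.1)))
```

Real feature finders detect these windows automatically and model the isotope pattern and elution profile, which this sketch does not attempt.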

Page 21: OpenMS: Quantitative proteomics at large scale

Feature-based LFQ

1. Find features in all maps
2. Align maps
3. Link corresponding features
4. Identify features
5. Quantify features
6. Quantify proteins based on their peptides

Protein        Sample 1 : Sample 2 : Sample 3
NPC2_HUMAN     1.0 : 5.2 : 0.3
CD177_HUMAN    1.0 : 0.2 : 0.4
⋮
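Step 6 and the ratio table can be illustrated with a short sketch. It assumes that peptide intensities are simply summed per protein and then expressed relative to sample 1; this is one simple aggregation choice and not necessarily what the OpenMS ProteinQuantifier tool does. The peptide intensities are invented so that the ratios match the two proteins shown on the slide.

```python
from typing import Dict, List, Tuple

def protein_ratios(peptides: List[Tuple[str, List[float]]]) -> Dict[str, List[float]]:
    """peptides: (protein accession, intensity per sample) for each quantified peptide.

    Sums peptide intensities per protein, then divides by the value in sample 1.
    Assumes every protein has a non-zero total intensity in sample 1.
    """
    totals: Dict[str, List[float]] = {}
    for protein, intensities in peptides:
        acc = totals.setdefault(protein, [0.0] * len(intensities))
        for k, value in enumerate(intensities):
            acc[k] += value
    return {protein: [round(v / total[0], 2) for v in total]
            for protein, total in totals.items()}

# Invented peptide-level intensities for three samples.
peptides = [
    ("NPC2_HUMAN", [100.0, 500.0, 40.0]),
    ("NPC2_HUMAN", [100.0, 540.0, 20.0]),
    ("CD177_HUMAN", [1000.0, 200.0, 400.0]),
]
ratios = protein_ratios(peptides)
for protein, values in ratios.items():
    print(protein, " : ".join(str(v) for v in values))
```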

Page 22: OpenMS: Quantitative proteomics at large scale

Label-Free Workflow

Different algorithms have been proposed by the OpenMS community for label-free quantification:
• Weisser H., Journal of Proteome Research (2013)
• Bo Zhang, Molecular & Cellular Proteomics (2016)
• Veit J., Journal of Proteome Research (2016)
• Ranninger C., Analytica Chimica Acta (2016)

Page 23: OpenMS: Quantitative proteomics at large scale

DeMix-Q Algorithm and Workflow

Bo Zhang, Lukas Käll & Roman A. Zubarev, MCP (2016)

Page 24: OpenMS: Quantitative proteomics at large scale

Reliable and reproducible Quantitation

Page 25: OpenMS: Quantitative proteomics at large scale

LFQ Relevant nodes

FeatureFinderCentroided
• Detects two-dimensional features in LC-MS data.

MapAlignerPoseClustering
• Corrects retention time distortions between maps using a pose clustering approach.

FeatureLinkerUnlabeledQT
• Groups corresponding features from multiple maps.

ConsensusMapNormalizer
• Normalizes the maps of one consensusXML file.
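As a rough idea of what normalizing maps means, the sketch below rescales each map so that its median feature intensity matches that of the first map. It is a simplified illustration, not the ConsensusMapNormalizer implementation; the choice of median scaling, the use of the first map as reference, and the example numbers are assumptions made for this example.

```python
from statistics import median
from typing import List

def median_normalize(maps: List[List[float]]) -> List[List[float]]:
    """maps: one list of feature intensities per LC-MS map.

    Scales every map so that its median equals the median of the first map.
    Assumes every map is non-empty and has a non-zero median.
    """
    reference = median(maps[0])
    return [[x * reference / median(m) for x in m] for m in maps]

# Invented example: the second run was measured at twice the overall intensity.
maps = [[10.0, 20.0, 30.0], [20.0, 40.0, 60.0]]
normalized = median_normalize(maps)
print(normalized)
```

After scaling, remaining intensity differences between corresponding features are more likely to reflect the samples than the overall signal level of each run.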

Page 26: OpenMS: Quantitative proteomics at large scale

OpenMS at Large Scale

• Galaxy
• WS-PGRADE/gUSE
• KNIME

Each individual tool can be run from the command line, which makes it possible to distribute the work across large HPC environments.

$> FileFilter -in myinfile.mzML -levels 2 -rt 100:1500 -out myoutfile.mzML

$> OpenSwathDecoyGenerator.exe -in OpenSWATH_SGS_AssayLibrary.TraML -out OpenSWATH_SGS_AssayLibrary_with_Decoys.TraML -method shuffle -append -exclude_similar -remove_unannotated
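Because every tool is a command-line program, a batch of runs can be prepared by generating one command per input file. The sketch below only builds and prints the command lines; it does not execute anything. The FileFilter flags are copied from the example above, while the helper name, the output directory, and the `_filtered` naming scheme are hypothetical.

```python
import shlex
from pathlib import PurePosixPath
from typing import List

def file_filter_commands(inputs: List[str], out_dir: str) -> List[List[str]]:
    """Build one FileFilter argument list per input mzML file."""
    commands = []
    for path in inputs:
        # Hypothetical naming scheme: <out_dir>/<input stem>_filtered.mzML
        out = str(PurePosixPath(out_dir) / (PurePosixPath(path).stem + "_filtered.mzML"))
        commands.append(["FileFilter", "-in", path,
                         "-levels", "2", "-rt", "100:1500",
                         "-out", out])
    return commands

commands = file_filter_commands(["run1.mzML", "run2.mzML"], "filtered")
for argv in commands:
    # Each line could be written into a job script for a cluster scheduler.
    print(shlex.join(argv))
```

Since the runs are independent of each other, each generated command can be submitted as a separate job.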

Page 27: OpenMS: Quantitative proteomics at large scale

Conclusions

• OpenMS: a modular workflow system
• Standard workflows: SILAC, iTRAQ/TMT, label-free, SWATH, quality control
• Strong collaboration with other projects: ProteoWizard, Thermo PD, KNIME, Fido, Percolator, search engines, HUPO-PSI formats

Page 28: OpenMS: Quantitative proteomics at large scale

How to run OpenMS workflows

• OpenMS, local installation (Windows, OS X, Linux)
    http://bit.ly/1J6lz6h
    http://openms.de/workflows
• OpenMS in Proteome Discoverer (LFQProfiler and RNPxl for PD 2.1)
    http://openms.de/PD
• OpenMS in Galaxy
    http://galaxy.uni-freiburg.de
• OpenMS in KNIME
    https://tech.knime.org/community/bioinf/openms