Post on 06-Apr-2017

Experiences to learn from the mass spectrometry proteomics field

Dr. Juan Antonio Vizcaíno

Proteomics Team Leader, EMBL-EBI, Hinxton, Cambridge, UK

Juan A. Vizcaíno, juan@ebi.ac.uk

12th Conference of the Metabolomics Society, Dublin, 27 June 2016

HUPO Proteomics Standards Initiative

• Develops data format standards for proteomics.

• Both data representation and annotation standards.

• Involves data producers, database providers, software producers, publishers, …

• Active Workgroups: MI, MS, PI and now a new QC group.

• Inter-group activities: MIAPE and Controlled Vocabularies.

• Started in 2002, so some experience already…

• One annual meeting in March-April, regular phone calls.

• Peer Review for standards: PSI document process.

http://www.psidev.info

Current PSI Proteomics Standard File Formats for Mass Spectrometry

• mzML: MS data

• mzIdentML: Identification

• mzQuantML: Quantitation

• mzTab: Final Results

• TraML: SRM

Reuse of data standards in metabolomics

• mzML is already actively used to store MS data (very flexible format).

• mzTab is a tab-delimited format that is being extended to better support MS metabolomics data. It can be used for both identification and quantification results.

• Meeting next week in Liverpool organised by A. Jones.

• mzQuantML and TraML could be used with small molecule data, but this has not been tested.
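
To give a feel for why a tab-delimited format like mzTab is easy to reuse, here is a minimal sketch that splits mzTab-style lines into sections by their line prefix. The prefixes (COM, MTD, SMH, SML) follow mzTab 1.0; the sample content, the reduced column set, the `TOY:` identifiers and the function name `read_mztab_sections` are assumptions made up for illustration, and the sample is not a complete, valid mzTab file.

```python
import csv
import io

# Hypothetical, heavily abbreviated mzTab-style content (illustration only).
SAMPLE = "\n".join([
    "COM\tIllustrative example only",
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\tToy small molecule results",
    "SMH\tidentifier\tchemical_formula\texp_mass_to_charge\tsmallmolecule_abundance_assay[1]",
    "SML\tTOY:0001\tC6H12O6\t181.0707\t15230.5",
    "SML\tTOY:0002\tC3H7NO2\t90.0550\t8421.0",
])


def read_mztab_sections(text):
    """Return (metadata dict, list of small-molecule rows as dicts)."""
    metadata = {}
    header = None
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0] == "COM":
            continue  # skip blank lines and comments
        prefix, values = fields[0], fields[1:]
        if prefix == "MTD":
            metadata[values[0]] = values[1]  # key / value pair
        elif prefix == "SMH":
            header = values  # column names of the small molecule table
        elif prefix == "SML":
            if header is None:
                raise ValueError("SML row found before SMH header")
            rows.append(dict(zip(header, values)))
    return metadata, rows


metadata, molecules = read_mztab_sections(SAMPLE)
for m in molecules:
    print(m["identifier"], m["chemical_formula"], float(m["exp_mass_to_charge"]))
```

The same prefix-based dispatch would extend to the protein, peptide and PSM sections; a real reader should also check the mandatory metadata and column rules of the specification, which this sketch ignores.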

Current Standard File Formats that are or could be used in metabolomics

• mzML: MS data

• mzIdentML: Identification

• mzQuantML *: Quantitation

• mzTab: Final Results

• TraML *: SRM

Current vision for data exchange standards in MS

Neumann (IPB-Halle), Proteomics and HUPO-PSI community

imzML: data standard for mass imaging data

http://www.imzml.org

Not a PSI format, but based on mzML

qcML files to be generated after submission

• XML format that captures output from QC pipelines

Opportunity to reuse and extend existing software

• Don’t reinvent the wheel! There is no need…

• Software libraries (APIs) to handle the standards.

• Data converters.

• Data visualisation tools.

• Data analysis tools and workflows.

• A big proportion of the available software is open source.

mzML: more software available

The most popular search engines support mzML

Many parser libraries available

Conversion from raw files into mzML

http://www.psidev.info/mzml_1_0_0
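
As a rough illustration of what the parser libraries do under the hood, the sketch below reads spectra from a tiny mzML-like fragment using only the Python standard library. The namespace and the controlled vocabulary accessions (MS:1000511 ms level, MS:1000514 m/z array, MS:1000515 intensity array, MS:1000523 64-bit float, MS:1000576 no compression) come from the PSI-MS vocabulary; the fragment itself, its values and the helper names are assumptions for illustration, and the fragment omits many elements and attributes a schema-valid mzML file requires.

```python
import base64
import struct
import xml.etree.ElementTree as ET

NS = {"mz": "http://psi.hupo.org/ms/mzml"}  # mzML XML namespace


def encode_doubles(values):
    """Pack floats as little-endian 64-bit values and base64-encode them."""
    packed = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(packed).decode("ascii")


def array_xml(accession, name, values):
    """Build one uncompressed 64-bit binaryDataArray element (toy helper)."""
    return (
        '<binaryDataArray>'
        '<cvParam cvRef="MS" accession="MS:1000523" name="64-bit float"/>'
        '<cvParam cvRef="MS" accession="MS:1000576" name="no compression"/>'
        '<cvParam cvRef="MS" accession="%s" name="%s"/>'
        '<binary>%s</binary>'
        '</binaryDataArray>' % (accession, name, encode_doubles(values))
    )


# Hypothetical fragment: one MS1 spectrum with three peaks.
SAMPLE = (
    '<mzML xmlns="http://psi.hupo.org/ms/mzml"><run id="toy_run">'
    '<spectrumList count="1">'
    '<spectrum id="scan=1" index="0" defaultArrayLength="3">'
    '<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>'
    '<binaryDataArrayList count="2">'
    + array_xml("MS:1000514", "m/z array", [100.5, 200.25, 300.125])
    + array_xml("MS:1000515", "intensity array", [10.0, 20.0, 5.0])
    + '</binaryDataArrayList></spectrum></spectrumList></run></mzML>'
)


def decode_array(bda):
    """Decode one binaryDataArray; only uncompressed 64-bit floats are handled."""
    accessions = {p.get("accession") for p in bda.findall("mz:cvParam", NS)}
    if not {"MS:1000523", "MS:1000576"} <= accessions:
        raise ValueError("sketch only handles uncompressed 64-bit floats")
    raw = base64.b64decode(bda.find("mz:binary", NS).text)
    values = list(struct.unpack("<%dd" % (len(raw) // 8), raw))
    if "MS:1000514" in accessions:
        return "mz", values
    if "MS:1000515" in accessions:
        return "intensity", values
    return "other", values


def read_spectra(xml_text):
    """Return one dict per spectrum with id, MS level and decoded arrays."""
    root = ET.fromstring(xml_text)
    spectra = []
    for spec in root.iter("{%s}spectrum" % NS["mz"]):
        record = {"id": spec.get("id"), "ms_level": None}
        for param in spec.findall("mz:cvParam", NS):
            if param.get("accession") == "MS:1000511":
                record["ms_level"] = int(param.get("value"))
        for bda in spec.iter("{%s}binaryDataArray" % NS["mz"]):
            kind, values = decode_array(bda)
            record[kind] = values
        spectra.append(record)
    return spectra


spectra = read_spectra(SAMPLE)
print(spectra[0]["id"], spectra[0]["ms_level"], spectra[0]["mz"])
```

In practice one of the existing parser libraries is the better choice, since real files also use zlib compression, 32-bit arrays, referenceable parameter groups and indexed wrappers that this sketch does not cover.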

Data visualisation: PRIDE Inspector Toolsuite

Wang et al., Nat. Biotechnology, 2012

Perez-Riverol et al., MCP, 2016

PRIDE Inspector Toolsuite supports:

- PRIDE XML

- mzIdentML

- mzML & all types of spectra files

- mzTab identification and quantification

https://github.com/PRIDE-Toolsuite/

OpenMS/TOPP

• OpenMS – an open-source C++ framework for computational mass spectrometry

• Jointly developed at ETH Zürich, FU Berlin, University of Tübingen

• Open source: BSD 3-clause license

• Portable: available on Windows, OSX, and Linux

• TOPP – The OpenMS Proteomics Pipeline

  • Building blocks: one application for each analysis step

  • All applications share identical user interfaces

  • Uses PSI standard formats and integrates seamlessly with other applications supporting these formats

  • Can be integrated in various workflow systems

    • TOPPAS – TOPP Pipeline Assistant

    • Galaxy

    • WS-PGRADE

    • KNIME

Kohlbacher et al., Bioinformatics (2007), 23:e191

ProteomeXchange Consortium

• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and MassIVE (UCSD, San Diego); jPOST (Japan) will be integrated in July 2016.

• EU FP7 CA (01/2011 to 06/2014).

• Common identifier space (PXD identifiers).

• Two supported data workflows: MS/MS and SRM.

• Main objective: Make life easier for researchers.

http://www.proteomexchange.org

Vizcaíno et al., Nat Biotechnol, 2014

PRIDE Archive: datasets submitted up to 1 April 2016

• In the last complete year: on average, >150 submitted datasets per month

• Size of PRIDE Archive: ~ 220TB

Vendor support for mzIdentML has grown in parallel with the number of submitted datasets

[Diagram labels: Search Engine; Results + MS files; Search engines; mzIdentML]

An increasing number of tools support export to mzIdentML 1.1:

- Mascot

- MSGF+

- Myrimatch and related tools from D. Tabb’s lab

- OpenMS

- PEAKS

- PeptideShaker (several open source tools)

- ProCon (ProteomeDiscoverer, Sequest)

- Scaffold

- TPP via the idConvert tool (ProteoWizard)

- ProteinPilot (from version 5.0)

- X!Tandem (from PILEDRIVER version)

- Crux

- Others: library for X!Tandem conversion, lab internal pipelines, …

Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
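
Because all of these tools export the same format, one small reader can consume results from any of them. The sketch below is a minimal example under stated assumptions: the namespace and the element and attribute names (`SpectrumIdentificationResult`, `SpectrumIdentificationItem`, `rank`, `passThreshold`, `chargeState`, `peptide_ref`) follow mzIdentML 1.1, while the fragment, its identifiers and values, and the function name `top_hits` are invented for illustration. The fragment leaves out most of the sections a schema-valid file must contain.

```python
import xml.etree.ElementTree as ET

MZID_NS = "http://psidev.info/psi/pi/mzIdentML/1.1"

# Hypothetical, heavily abbreviated mzIdentML 1.1 fragment (illustration only).
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" id="toy" version="1.1.0">
  <DataCollection>
    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1">
        <SpectrumIdentificationResult id="SIR_1" spectrumID="index=0" spectraData_ref="SD_1">
          <SpectrumIdentificationItem id="SII_1_1" rank="1" chargeState="2"
              experimentalMassToCharge="500.25" calculatedMassToCharge="500.26"
              peptide_ref="PEP_1" passThreshold="true"/>
          <SpectrumIdentificationItem id="SII_1_2" rank="2" chargeState="2"
              experimentalMassToCharge="500.25" calculatedMassToCharge="500.75"
              peptide_ref="PEP_2" passThreshold="false"/>
        </SpectrumIdentificationResult>
      </SpectrumIdentificationList>
    </AnalysisData>
  </DataCollection>
</MzIdentML>"""


def top_hits(xml_text):
    """Return rank-1 identifications that passed the search engine threshold."""
    root = ET.fromstring(xml_text)
    hits = []
    for result in root.iter("{%s}SpectrumIdentificationResult" % MZID_NS):
        for item in result.findall("{%s}SpectrumIdentificationItem" % MZID_NS):
            if item.get("rank") != "1" or item.get("passThreshold") != "true":
                continue
            hits.append({
                "spectrum_id": result.get("spectrumID"),
                "peptide_ref": item.get("peptide_ref"),
                "charge": int(item.get("chargeState")),
                "exp_mz": float(item.get("experimentalMassToCharge")),
                "calc_mz": float(item.get("calculatedMassToCharge")),
            })
    return hits


hits = top_hits(SAMPLE)
for hit in hits:
    print(hit["spectrum_id"], hit["peptide_ref"], hit["charge"])
```

Resolving `peptide_ref` to a sequence would require reading the `Peptide` elements of the `SequenceCollection` section, which the toy fragment leaves out.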

Conclusions

• Develop tools in parallel with the data standards.

• Don’t reinvent the wheel! Many ideas and software already there.

• Ideally, get vendors involved as soon as possible.

• Data repositories and data standards are a perfect match.

Acknowledgements and further reading…

http://www.psidev.info

Poster P18

Questions?
