The mzTab data standard format for reporting MS-based peptide, protein and small molecule...
DESCRIPTION
This is the talk I gave at HUPO 2014 on behalf of Johannes Griss about the mzTab data standard format.
TRANSCRIPT
mzTab - Reporting MS-based Proteomics and Metabolomics Results
Dr. Johannes Griss
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
Division of Immunology, Allergy and Infectious Diseases
Department of Dermatology
Medical University of Vienna, Austria
Dr. Juan A. Vizcaíno on behalf of
Johannes [email protected]
HUPO 2014
Overview
• Need for mzTab
• Details about the data format (mzTab 1.0)
• Existing software implementations
• Extension of mzTab 1.0 for metabolomics
HUPO Proteomics Standards Initiative
• Develops data format standards for proteomics.
• Both data representation and annotation standards.
• Involves data producers, database providers, software producers, publishers, …
• Active Workgroups: MI, MS, PI, Mod, (Protein Separation).
• Inter-group activities: MIAPE and Controlled Vocabularies.
• Started in 2002, so there is already some experience…
www.psidev.info
PSI-MS/PI Standard File Formats before mzTab
• TraML: SRM
• mzQuantML: Quantitation
• mzIdentML: Identification
• mzML: MS data
Reasons for an additional file format (mzTab)
• mzIdentML and mzQuantML (necessarily) focus on complete representation of proteomics results
• Complex XML-based file formats
• Specialised software required for visualisation
• In-depth bioinformatics understanding required to create and use files
• No simple method to communicate final results to non-proteomics experts
• No simple method to utilise files through scripting languages and standard statistical software
mzTab – Aims
• Store the final results of an MS-based experiment in a single file
• Quantitation data
• Identification data
• Small Molecule data
• Reduce complexity to make data accessible to non-proteomics / bioinformatics experts
• Be easily accessible using “standard” software
mzTab – Aims
• What the format does NOT aim at:
• Replace mzIdentML or mzQuantML for proteomics approaches
• Contain the complete data of an MS-based experiment
• Provide fully detailed evidence for the data
• Allow a researcher to recreate the process which led to the results
Why a tab-delimited file?
• Using XML-based formats requires sophisticated bioinformatics expertise
• Many researchers are still used to using MS Excel to “look” at or exchange their data.
• Standard tab-delimited file formats for transcriptomics (MAGE-TAB) and molecular interactions (MI-TAB) data had already proven successful
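To illustrate the point about “standard” software, here is a minimal Python sketch that reads a protein table with nothing but the standard `csv` module. It assumes the mzTab 1.0 convention that the protein header line starts with `PRH` and each protein row with `PRT`; the accessions, descriptions and scores are invented placeholders, and `read_protein_table` is a hypothetical helper, not part of any mzTab library.

```python
import csv

# Invented excerpt of an mzTab protein section (placeholder values, not real data)
SAMPLE = (
    "PRH\taccession\tdescription\tbest_search_engine_score[1]\n"
    "PRT\tACC001\tExample protein A\t0.99\n"
    "PRT\tACC002\tExample protein B\t0.87\n"
)


def read_protein_table(text):
    """Return protein rows as dicts keyed by the column names on the PRH line."""
    # Keep only the protein header and protein rows; other sections are skipped
    lines = [ln for ln in text.splitlines() if ln.startswith(("PRH\t", "PRT\t"))]
    reader = csv.DictReader(lines, delimiter="\t")
    rows = []
    for row in reader:
        row.pop("PRH")  # drop the line-prefix column (its value is "PRT")
        rows.append(row)
    return rows


proteins = read_protein_table(SAMPLE)
for p in proteins:
    print(p["accession"], p["best_search_engine_score[1]"])
```

The same rows could just as well be opened in a spreadsheet or loaded by a statistics package, which is the practical benefit of a plain tab-delimited layout over XML.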
mzTab - Sections
• Metadata: basic information about experiment and sample (key-value pairs)
• Protein: basic information about protein identifications (table-based)
• Peptide: information about quantified peptides (table-based)
• PSM: information about identified spectra (table-based)
• Small Molecule: basic information about identified small molecules (table-based)
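Because every line of an mzTab file starts with a prefix naming its section, the five sections can be separated with a few lines of code. The sketch below assumes the mzTab 1.0 prefixes (`MTD` for metadata; `PRH`/`PRT`, `PEH`/`PEP`, `PSH`/`PSM` and `SMH`/`SML` for the header and rows of the four tables). `split_sections` is a hypothetical helper, and the sample is deliberately abbreviated: a valid file needs more mandatory metadata than shown.

```python
from collections import defaultdict

# Header prefix -> row prefix for the four table-based sections (mzTab 1.0)
TABLE_PREFIXES = {
    "PRH": "PRT",  # Protein
    "PEH": "PEP",  # Peptide
    "PSH": "PSM",  # PSM
    "SMH": "SML",  # Small Molecule
}
ROW_TO_HEADER = {row: header for header, row in TABLE_PREFIXES.items()}


def split_sections(text):
    """Split mzTab text into metadata key-value pairs and per-section tables."""
    metadata = {}
    tables = defaultdict(lambda: {"columns": [], "rows": []})
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) > 1:
            # Metadata line: MTD <key> <value>; a repeated key overwrites the earlier one
            metadata[fields[1]] = fields[2] if len(fields) > 2 else ""
        elif prefix in TABLE_PREFIXES:
            # Header line: remember the column names for this section
            tables[prefix]["columns"] = fields[1:]
        elif prefix in ROW_TO_HEADER:
            # Data row: pair its values with the column names of its header
            section = tables[ROW_TO_HEADER[prefix]]
            section["rows"].append(dict(zip(section["columns"], fields[1:])))
        # COM (comment) lines and unknown prefixes are ignored in this sketch
    return metadata, dict(tables)


# Abbreviated, invented example file
SAMPLE = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "MTD\tmzTab-type\tIdentification",
    "PRH\taccession\tdescription",
    "PRT\tACC001\tExample protein A",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDEK\t1\tACC001",
])

metadata, tables = split_sections(SAMPLE)
print(metadata["mzTab-mode"])                 # Summary
print(tables["PSH"]["rows"][0]["sequence"])   # PEPTIDEK
```

Keying tables by their header prefix keeps the sketch generic: a file that contains only some of the four tables simply produces fewer entries.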
mzTab – Modes and Types
• Modes (depending on the level of detail):
• ‘Summary’: only the ‘final results’ are reported.
• ‘Complete’: detailed information for each individual assay or replicate is provided.
• Types:
• ‘Identification’: only identification results.
• ‘Quantification’: quantification results, which can also include identification results.
• Overall, four different file “flavors” are possible, making the design very flexible.
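A file declares its mode and type in the metadata section. The sketch below assumes the mzTab 1.0 metadata keys `mzTab-mode` and `mzTab-type`; it enumerates the four flavors and reads the flavor of a file from already-parsed metadata. `file_flavor` is a hypothetical name used only for illustration.

```python
from itertools import product

MODES = ("Summary", "Complete")
TYPES = ("Identification", "Quantification")

# Every combination of mode and type: the four file "flavors"
FLAVORS = list(product(MODES, TYPES))


def file_flavor(metadata):
    """Return (mode, type) from parsed MTD key-value pairs, rejecting unknown values."""
    mode = metadata.get("mzTab-mode")
    mztab_type = metadata.get("mzTab-type")
    if mode not in MODES:
        raise ValueError(f"unexpected mzTab-mode: {mode!r}")
    if mztab_type not in TYPES:
        raise ValueError(f"unexpected mzTab-type: {mztab_type!r}")
    return mode, mztab_type


print(len(FLAVORS))  # 4
print(file_flavor({"mzTab-mode": "Complete", "mzTab-type": "Quantification"}))
# ('Complete', 'Quantification')
```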
Peptide Section (label-free)
• Only used in “Quantification” files.
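The rule on this slide can be expressed as a simple consistency check. This is a hypothetical sketch, not the PRIDE team's mzTab Validator: it only inspects the `mzTab-type` metadata value and any `PEH`/`PEP` lines, which are assumed to be the peptide section prefixes, and `check_peptide_section` is an invented name.

```python
def check_peptide_section(text):
    """Return a list of problems if peptide lines appear outside a Quantification file.

    Implements only the single rule from the slide; not a full mzTab validator.
    """
    mztab_type = None
    has_peptides = False
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "MTD" and len(fields) > 2 and fields[1] == "mzTab-type":
            mztab_type = fields[2]
        elif fields[0] in ("PEH", "PEP"):
            has_peptides = True
    problems = []
    if has_peptides and mztab_type != "Quantification":
        problems.append(f"peptide section found in a file of type {mztab_type!r}")
    return problems


# Invented minimal inputs
ID_FILE = "MTD\tmzTab-type\tIdentification\nPEH\tsequence\nPEP\tPEPTIDEK"
QUANT_FILE = "MTD\tmzTab-type\tQuantification\nPEH\tsequence\nPEP\tPEPTIDEK"

print(check_peptide_section(ID_FILE))
print(check_peptide_section(QUANT_FILE))  # []
```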
mzTab – Current implementations
• jmzTab (Java API): Version 3.0 is now a stable version. Manuscript published in the journal Proteomics.
• mzTab Validator, PRIDE XML to mzTab converter (PRIDE team).
• mzIdentML and mzQuantML to mzTab converters (Andy Jones group).
• MaxQuant: a beta exporter is available.
• OpenMS (version 1.10).
• R/Bioconductor package MSnbase (L. Gatto, Cambridge University).
• LipidDataAnalyzer (J. Hartler, University of Graz, see next talk).
• MetaboLights (EBI).
mzTab – ongoing development
• More detailed modelling of MS metabolomics data
• Led by S. Neumann (COSMOS EU FP7 project).
• Extension from one to three sections.
An example file is available at:
https://github.com/sneumann/mtbls2/faahKO.mzTab
http://www.cosmos-fp7.eu/
mzTab format related publications
http://code.google.com/p/mztab/
J. Griss et al., MCP, 2014
Q.W. Xu et al., Proteomics, 2014
Current PSI-MS/PI Standard File Formats
• mzTab: Final results
• TraML: SRM
• mzQuantML: Quantitation
• mzIdentML: Identification
• mzML: MS data
Acknowledgements
Johannes Griss, Qing-Wei Xu, Henning Hermjakob
Timo Sachsenberg, Mathias Walzer, Oliver Kohlbacher
http://mztab.googlecode.com
Andy Jones
S. Neumann and other COSMOS partners
PSI editor and reviewers
… and many others have also contributed
BBSRC PROCESS grant
BBSRC ProteoSuite grant