embl-ebi now and in the future · proteomexchange consortium •goal: development of a framework to...
TRANSCRIPT
Sandra Orchard [email protected]
Compton & Kelleher, Nat. Methods, 2012
MS/MS matching identifies
peptides, not proteins.
Proteins are inferred from the
peptide sequences.
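A minimal sketch of why inference is needed: the toy example below maps identified peptides back to candidate proteins by substring matching. The protein names and sequences are invented for illustration, and `map_peptides` and `split_unique_shared` are hypothetical helpers, not part of any search engine.

```python
from collections import defaultdict

# Toy protein database: names and sequences are invented for illustration.
PROTEINS = {
    "PROT_A": "MKTAYIAKQRLLSDEGK",
    "PROT_B": "MSTAYIAKQRVVNDEGK",
    "PROT_C": "MGGWPLLSDEGKAA",
}


def map_peptides(peptides, proteins):
    """Return {peptide: sorted list of proteins whose sequence contains it}."""
    hits = defaultdict(list)
    for pep in peptides:
        for name, seq in proteins.items():
            if pep in seq:
                hits[pep].append(name)
        hits[pep].sort()
    return dict(hits)


def split_unique_shared(mapping):
    """Separate peptides that point to one protein from those that point to several."""
    unique = {p: v[0] for p, v in mapping.items() if len(v) == 1}
    shared = {p: v for p, v in mapping.items() if len(v) > 1}
    return unique, shared


identified = ["TAYIAK", "LLSDEGK", "VVNDEGK"]
mapping = map_peptides(identified, PROTEINS)
unique, shared = split_unique_shared(mapping)
print("unique:", unique)
print("shared:", shared)
```

Only the unique peptide identifies a protein unambiguously. The shared peptides are compatible with more than one protein, which is why the protein list is an inference rather than a direct observation.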
• Proteomics data is potentially very complex and its interpretation is
often troublesome and/or controversial.
• In other ‘omics’ fields, a data sharing ‘culture’ is well established.
Generally, it is considered good scientific practice.
• In proteomics, the ‘culture’ is evolving in that direction.
• Public availability of data enables:
• reinterpretation
• validation of the experimental results reported.
• reuse of the data (e.g. for meta-analysis studies).
• Specific use cases for proteomics: spectral libraries, fragmentation
models, SRM transitions,…
Olsen & Mann, Sci. Signal., 2011
• Main public MS-based proteomics repositories:
- PROteomics IDEntifications database (PRIDE, EBI)
- Global Proteome Machine (GPMDB)
- PeptideAtlas (ISB, Seattle)
• Many others are more specialized, for example Human Proteinpedia and the
Genome Annotation Proteomics Pipeline (GAPP),…
• New in 2013-2014: ProteomicsDB, CHORUS, MassIVE, iProX.
• Very diverse: different aims, functionalities,… but also complementary.
• Main focus is MS/MS data.
Sandra Orchard [email protected] 02/06/2014
• Many different workflows need to be supported. They provide
complementary ‘views’.
• No data reprocessing. Data is stored as ‘published’ or
originally analysed:
• PRIDE (MS/MS data)
• PASSEL (SRM data)
• Data reprocessing (MS/MS data):
• PeptideAtlas
• GPMDB
• Resources that try to represent the authors’ analysis view on the
data.
• Various workflows are allowed and they can provide
complementary results.
• Data are not ‘updated’ over time. However, meta-analysis on top is
possible.
• False discovery rates (FDRs) accumulate when datasets are combined.
• Main representatives: PRIDE (MS/MS data) and
PeptideAtlas/PASSEL (SRM data).
• Data standards are essential.
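The FDR accumulation point can be illustrated with a short worked calculation. This is a sketch under a deliberately simple assumption that is not from the slides: correct identifications overlap between datasets (the same proteins are seen again), while false identifications are random and do not overlap. The function name and all numbers are hypothetical.

```python
def combined_fdr(n_datasets, ids_per_dataset=1000, fdr=0.01, true_overlap=1.0):
    """Estimate the FDR of the union of several independently filtered datasets.

    Assumptions (illustrative only):
    - every dataset reports `ids_per_dataset` identifications at the same `fdr`;
    - a fraction `true_overlap` of the correct identifications in each further
      dataset was already present in the first one;
    - false identifications never overlap, so they simply add up.
    """
    false_per = ids_per_dataset * fdr
    true_per = ids_per_dataset - false_per
    true_total = true_per + (n_datasets - 1) * true_per * (1.0 - true_overlap)
    false_total = false_per * n_datasets
    return false_total / (true_total + false_total)


for n in (1, 5, 10):
    print(n, "datasets ->", round(100 * combined_fdr(n), 2), "% FDR")
```

With complete overlap of the correct identifications, ten datasets each filtered at 1% give a combined list with roughly 9% false entries. With no overlap at all (`true_overlap=0.0`) the combined FDR stays at 1%. Real data sit somewhere in between.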
http://www.ebi.ac.uk/pride
• Focused on MS/MS approaches
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2013
• These resources collect MS raw data and reprocess them using
a single analysis pipeline and an up-to-date protein
sequence database.
• Advantage: They provide a ‘standardized’ and updated view
on the experimental data available.
• Disadvantage: Only one common analysis method is used, so
information can be lost.
• The result differs from the authors’ view of the data.
• Main resources: GPMDB and PeptideAtlas (ISB, Seattle).
ProteomeXchange Consortium
• Goal: Development of a framework to allow
standard data submission and dissemination
pipelines between the main existing proteomics
repositories.
• Includes PeptideAtlas (ISB, Seattle) and
PRIDE.
• ProteomeXchange is primarily user-oriented:
the idea is to make things easier for the users
(submission and access to the data).
• Two supported data workflows: MS/MS and
SRM data.
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
[Figure: ProteomeXchange data workflow. Labelled elements: researcher’s results, raw data* and metadata/manuscript; receiving repositories PRIDE (MS/MS data) and PASSEL (SRM data); ProteomeCentral; journals, UniProt/neXtProt, PeptideAtlas, GPMDB and other DBs; reprocessed results.]
Taken from: Vizcaíno et al., Nat Biotechnol, 2014
http://proteomecentral.proteomexchange.org/cgi/GetDataset
• Re-analysis of data at a later time can potentially achieve more
comprehensive results and yield new biological knowledge, thanks to:
• Improved analysis software.
• Better reference protein sequence databases (which are
always evolving).
• Identification of new post-translational modifications.
• Individual authors can reprocess raw data with new
hypotheses in mind (not taken into account by the original
authors).
• Recent examples (using phosphoproteomics data sets):
• O-GlcNAc-6-phosphate (1)
• Phosphoglyceryl (2)
• ADP-ribosylation (3)
(1) Hahne & Kuster, Mol Cell Proteomics (2012) 11(10): 1063-9
(2) Moellering & Cravatt, Science (2013) 341: 549-553
(3) Matic et al., Nat Methods (2012) 9: 771-2
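One way to picture reprocessing with a new modification in mind is to compare the mass difference between an observed precursor and the unmodified peptide against a table of candidate mass shifts. The sketch below is an illustration, not the method used in the cited papers. The monoisotopic shifts are commonly tabulated values and should be checked against a curated source such as Unimod before real use. `explain_delta` and the example masses are hypothetical.

```python
# Monoisotopic mass shifts in Da (commonly tabulated values; verify before use).
PHOSPHO = 79.96633       # HPO3
HEXNAC = 203.07937       # N-acetylhexosamine, e.g. O-GlcNAc
ADP_RIBOSE = 541.06111   # ADP-ribosylation

MOD_SHIFTS = {
    "phospho": PHOSPHO,
    "O-GlcNAc": HEXNAC,
    # Assumed here to be the sum of the HexNAc and phosphate shifts.
    "O-GlcNAc-6-phosphate": HEXNAC + PHOSPHO,
    "ADP-ribosylation": ADP_RIBOSE,
}


def explain_delta(observed_mass, unmodified_mass, tol=0.01):
    """Return the names of modifications whose shift matches the mass difference."""
    delta = observed_mass - unmodified_mass
    return sorted(
        name for name, shift in MOD_SHIFTS.items() if abs(delta - shift) <= tol
    )


# Hypothetical peptide of neutral mass 1000.0 Da observed 283.0457 Da heavier.
print(explain_delta(1283.0457, 1000.0))
```

A search that only allows phosphorylation would leave such a spectrum unexplained. Adding the new shift to the table is what lets archived raw data answer a question the original authors did not ask.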
• Reprocessing of MS raw data with this idea in mind.
• Falls into the broad “ProteoGenomics” field. Many papers
have been published on this topic.
• Validation of existing genes.
• New splice isoforms, pseudogenes, etc.
Brosch et al. (2011) Genome Res 21:756-767
• In this particular paper:
• 53 genes alternatively transcribed
• 10 new protein coding genes
• Pipeline to integrate gene annotations in the mouse
genome.
• Public availability of data enables:
• reinterpretation.
• validation of the experimental results reported.
• Specific use cases for proteomics: spectral libraries,
fragmentation models, SRM transitions,…
• Analysis of Tyrannosaurus rex fossils: controversial presence of
collagen (is it a contamination of the sample?)
Asara et al. (2007) Science 316: 280-5.
Asara et al. (2007) Science 316: 1324-5.
Bern et al. (2009) JPR 9: 4328-32
Experimental Protocol
1. Collected samples from healthy, collapsing and collapsed bee colonies.
2. Homogenised bees.
3. Digested with trypsin.
4. Analysed by LC-MS/MS on an LTQ.
5. Searched using SEQUEST.
6. Filtered results using PeptideProphet and ProteinProphet.
7. Performed further analysis to determine species statistically more
commonly found in collapsing/collapsed colony samples.
Info from R. Chalkley
Bromenshenk et al. (2011) PLOS One 5: e13181
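The digestion step (step 3) determines which peptides a search engine later expects to see. A minimal in-silico sketch of the usual trypsin rule (cleave C-terminal to K or R, but not when the next residue is P) is shown below. The sequence is invented and `tryptic_digest` is a hypothetical helper. Real search engines apply additional rules and mass filters.

```python
def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return peptides from an in-silico tryptic digest.

    Cleaves after K or R unless the following residue is P, then joins up to
    `missed_cleavages` neighbouring fragments to model incomplete digestion.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


toy_protein = "MKTAYIAKQRPLLSDEGKAA"  # invented sequence
print(tryptic_digest(toy_protein))
print(tryptic_digest(toy_protein, missed_cleavages=1, min_length=6))
```

Note that the R in `QRP` is not cleaved because it is followed by proline.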
• Big pitfall: The protein database searched was composed only of
viral proteins. No honey bee proteins at all!
• After re-searching the data, there is no evidence for viral
peptides/proteins in any of it. Matches were instead to honey bee,
fruit fly, wasp, moth, human keratin, bacteria that like sugary
environments, …
• “We believe that there is currently insufficient evidence to
conclude that bees are a natural host for IIV-6, let alone that
the virus is linked to CCD”.
Info from R. Chalkley
Knudsen & Chalkley (2011) PLOS One 6: e20873
Foster (2011), MCP 10: M110.006387
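The database pitfall can be sketched with a toy search. A search engine reports the best-scoring candidate it was given, so if the true source of a spectrum is missing from the database, something else is reported in its place. Everything below is invented for illustration: the peptide sequences, the two databases and the crude `toy_score`, which only stands in for a real spectrum-matching score.

```python
def toy_score(true_peptide, candidate):
    """Crude stand-in for a match score: number of positions with the same residue."""
    return sum(a == b for a, b in zip(true_peptide, candidate))


def best_hit(true_peptide, database):
    """Return (source, candidate, score) for the best-scoring candidate."""
    hits = (
        (source, peptide, toy_score(true_peptide, peptide))
        for source, peptides in database.items()
        for peptide in peptides
    )
    return max(hits, key=lambda hit: hit[2])


# Invented candidate peptides.
VIRAL_ONLY = {"virus": ["LLSAEGK", "TVYLAAR"]}
WITH_HOST = {"virus": ["LLSAEGK", "TVYLAAR"], "honey bee": ["LLSDEGK", "AAYIAQK"]}

spectrum_source = "LLSDEGK"  # the (bee) peptide that produced the spectrum

print(best_hit(spectrum_source, VIRAL_ONLY))
print(best_hit(spectrum_source, WITH_HOST))
```

Against the viral-only database the spectrum is assigned to a viral peptide that merely resembles the true one. Once host sequences are included, the correct peptide outscores it.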
Protein Databank in Europe (PDBe) group
www.ebi.ac.uk/pdbe
• Is one of the four sites around the world where 3D structures may be deposited.
• Provides a stable and clean repository of macromolecular structure data.
• Has services that allow users to access, search and retrieve structural data from a single web access point.
[Figure labels: protein of unknown function; metallopeptidase; carbohydrate biosynthesis]
• Proteins are the workhorses of the cell: enzymes, structural proteins, and players in signal transduction, transport, transcription, translation and degradation, traversing membranes … all as a functional/regulatory network.
• By mapping these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation.
• One way to predict protein function is through identification of binding partners – Guilt by Association
• If the function of at least one of the components with which the protein interacts is known, that should let us assign its function(s) and the pathway(s)
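Guilt by association can be sketched as a vote among annotated interaction partners. The interaction list, protein names and annotations below are invented, and `predict_function` is a hypothetical helper. Real predictions would also weigh the confidence of each interaction.

```python
from collections import Counter

# Invented binary interactions and functional annotations.
INTERACTIONS = [
    ("UNK1", "A"),
    ("UNK1", "B"),
    ("C", "UNK1"),
    ("A", "B"),
    ("UNK2", "D"),
]
ANNOTATIONS = {"A": "DNA repair", "B": "DNA repair", "C": "transport"}


def predict_function(protein, interactions, annotations):
    """Return the most frequent function(s) among annotated binding partners."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    votes = Counter(annotations[p] for p in partners if p in annotations)
    if not votes:
        return []  # no annotated partner, so nothing can be inferred
    top = max(votes.values())
    return sorted(func for func, count in votes.items() if count == top)


print(predict_function("UNK1", INTERACTIONS, ANNOTATIONS))
print(predict_function("UNK2", INTERACTIONS, ANNOTATIONS))
```

The second call returns an empty list because the only partner of `UNK2` is itself unannotated, which matches the condition that at least one interacting component must have a known function.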