EMBL-EBI now and in the future

Sandra Orchard [email protected]

02/06/2014


Compton & Kelleher, Nat. Methods, 2012


MS/MS matching identifies peptides, not proteins. Proteins are inferred from the peptide sequences.
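The inference step can be sketched with a toy example. This is a minimal illustration, not the method of any particular search engine: the protein IDs, sequences and peptides are hypothetical, and real inference tools weigh scores and probabilities rather than plain substring matches.

```python
def map_peptides_to_proteins(peptides, proteins):
    """Return {peptide: sorted list of protein IDs whose sequence contains it}."""
    return {
        pep: sorted(pid for pid, seq in proteins.items() if pep in seq)
        for pep in peptides
    }


def split_unique_and_shared(hits):
    """Separate peptides that point to one protein from ambiguous ones."""
    unique = {pep: ids[0] for pep, ids in hits.items() if len(ids) == 1}
    shared = {pep: ids for pep, ids in hits.items() if len(ids) > 1}
    return unique, shared


# Hypothetical toy sequences: P1 and P2 share their N-terminal region.
proteins = {
    "P1": "MKTAYIAKQRLSDEK",
    "P2": "MKTAYIAKQRGGHWK",
}
peptides = ["TAYIAK", "LSDEK"]  # peptides identified by MS/MS (assumed)

hits = map_peptides_to_proteins(peptides, proteins)
unique, shared = split_unique_and_shared(hits)
print("unique evidence:", unique)
print("ambiguous evidence:", shared)
```

Only the peptide that maps to a single protein is unambiguous evidence for it. The shared peptide is compatible with either protein, which is why protein-level results are an inference rather than a direct observation.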


• Proteomics data is potentially very complex and its interpretation is often troublesome and/or controversial.
• In other ‘omics’ fields, a data sharing ‘culture’ is well established. Generally, it is considered to be good scientific practice.
• In proteomics, the ‘culture’ is evolving in that direction.
• Public availability of data enables:
- reinterpretation.
- validation of the experimental results reported.
- reuse of the data (e.g. for meta-analysis studies).
• Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions,…


Olsen & Mann, Science Sign, 2011


• Main public MS-based proteomics repositories:
- PROteomics IDEntifications database (PRIDE, EBI)
- Global Proteome Machine (GPMDB)
- PeptideAtlas (ISB, Seattle)
• Many others, more specialized, among them: Human Proteinpedia, Genome Annotation Proteomics Pipeline (GAPP),…
• New in 2013-2014: ProteomicsDB, CHORUS, MassIVE, iProX.
• Very diverse: different aims, functionalities,… but also complementary.
• Main focus is MS/MS data.


• Many different workflows need to be supported. They provide complementary ‘views’.
• No data reprocessing. Data is stored as ‘published’ or originally analysed:
- PRIDE (MS/MS data)
- PASSEL (SRM data)
• Data reprocessing (MS/MS data):
- PeptideAtlas
- GPMDB


• Resources that try to represent the authors’ analysis view on the data.
• Various workflows are allowed and they can provide complementary results.
• Data are not ‘updated’ over time. However, meta-analysis on top is possible.
• Accumulation of FDRs when datasets are combined.
• Main representatives: PRIDE (MS/MS data) and PeptideAtlas/PASSEL (SRM data).
• Data standards are essential.
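The FDR accumulation point can be illustrated with a small calculation. This is a sketch under a deliberately simple worst-case assumption that is not stated in the slides: every dataset identifies the same set of true proteins, while the false identifications in each dataset are random and do not overlap between datasets. The function name and the numbers are hypothetical.

```python
def combined_fdr(n_datasets, ids_per_dataset, fdr):
    """FDR of the union of datasets under the toy assumption that true
    identifications fully overlap and false identifications never do."""
    true_ids = ids_per_dataset * (1 - fdr)          # counted once in the union
    false_ids = n_datasets * ids_per_dataset * fdr  # accumulate across datasets
    return false_ids / (true_ids + false_ids)


for n in (1, 5, 10):
    print(n, "dataset(s):", round(combined_fdr(n, 1000, 0.01), 4))
```

Under these assumptions a single dataset keeps its nominal 1% FDR, but the union of ten such datasets has an FDR of roughly 9%, because the true identifications are counted once while the false ones add up.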


http://www.ebi.ac.uk/pride

• Focused on MS/MS approaches

Martens et al., Proteomics, 2005

Vizcaíno et al., NAR, 2013





• These resources collect MS raw data and reprocess it using one given analysis pipeline and an up-to-date protein sequence database.
• Advantage: They provide a ‘standardized’ and updated view on the experimental data available.
• Disadvantage: Only one common analysis method is used and there can be information loss.
• Different from the authors’ view on the data.
• Main resources: GPMDB and PeptideAtlas (ISB, Seattle).


ProteomeXchange Consortium

• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Includes PeptideAtlas (ISB, Seattle) and PRIDE.
• ProteomeXchange is primarily user-oriented: the idea is to make things easier for the users (submission and access to the data).
• Two supported data workflows: MS/MS and SRM data.

http://www.proteomexchange.org

Vizcaíno et al., Nat Biotechnol, 2014




Figure: ProteomeXchange data workflow. Labels: researcher’s results, raw data* and metadata/manuscript; receiving repositories PRIDE (MS/MS data), PASSEL (SRM data) and other DBs; ProteomeCentral; journals, UniProt/neXtProt, PeptideAtlas, GPMDB and other DBs; reprocessed results.

Taken from: Vizcaíno et al., Nat Biotechnol, 2014


http://proteomecentral.proteomexchange.org/cgi/GetDataset
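A dataset page can be addressed from a ProteomeXchange accession. The sketch below only builds the link and makes no network request. The `ID` query parameter, the `PXD` plus six digits accession pattern and the example accession are assumptions not given in the slides, and `dataset_url` is a hypothetical helper name.

```python
import re
from urllib.parse import urlencode

# Base URL as given on the slide.
BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"
# Assumed accession format: "PXD" followed by six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def dataset_url(accession):
    """Build a ProteomeCentral link for one accession (assumed 'ID' parameter)."""
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    return f"{BASE_URL}?{urlencode({'ID': accession})}"


print(dataset_url("PXD000001"))  # example accession, for illustration only
```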




• Re-analysis of data at a later time can potentially achieve more comprehensive results and yield new biological knowledge:
- Improved analysis software.
- Better reference protein sequence databases (which are always evolving).
- Identification of new post-translational modifications.


• Individual authors can reprocess raw data with new hypotheses in mind (not taken into account by the original authors).
• Recent examples (using phosphoproteomics data sets):
- O-GlcNAc-6-phosphate [1]
- Phosphoglyceryl [2]
- ADP-ribosylation [3]

[1] Hahne & Kuster, Mol Cell Proteomics (2012) 11 10 1063-9
[2] Moellering & Cravatt, Science (2013) 341 549-553
[3] Matic et al., Nat Methods (2012) 9 771-2


• Reprocessing of MS raw data with this idea in mind.
• Falls into the broad “ProteoGenomics” field. Many papers have been published on this topic.
• Validation of existing genes.
• New splice isoforms, pseudogenes, etc.


Brosch et al. (2011) Genome Res 21:756-767

• In this particular paper:
- 53 genes alternatively transcribed
- 10 new protein coding genes
- Pipeline to integrate gene annotations in the mouse genome.


• Public availability of data enables:
- reinterpretation.
- validation of the experimental results reported.
• Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions,…


• Analysis of Tyrannosaurus rex fossils: controversial presence of collagen (is it a contamination of the sample?)

Asara et al. (2007) Science 316: 280-5.

Asara et al. (2007) Science 316: 1324-5.

Bern et al. (2009) JPR 8: 4328-32


Info from R. Chalkley

Bromenshenk et al. (2011) PLOS One 5: e13181


Experimental Protocol

1. Collected samples from healthy, collapsing and collapsed bee colonies.
2. Homogenised the bees.
3. Digested with trypsin.
4. Analysed by LC-MS/MS on an LTQ.
5. Searched using SEQUEST.
6. Filtered results using PeptideProphet and ProteinProphet.
7. Performed further analysis to determine species statistically more commonly found in collapsing/collapsed colony samples.


• Big pitfall: the protein database searched was composed only of viral proteins. No honey bee proteins at all!
• After re-searching the data, there is no evidence for viral peptides/proteins in any of their data. The spectra match instead: honey bee, fruit fly, wasp, moth, human keratin, bacteria that like sugary environments, …
• “We believe that there is currently insufficient evidence to conclude that bees are a natural host for IIV-6, let alone that the virus is linked to CCD”.
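The pitfall can be reproduced with a toy database search. This is a sketch, not SEQUEST: the scoring (counting matched prefix-ion masses), the peptide sequences and the database contents are all hypothetical, and the residue masses are approximate monoisotopic values. The one property it shares with a real search engine is that it reports the best-scoring candidate from whatever database it is given.

```python
# Approximate monoisotopic residue masses (Da) for the residues used here.
RESIDUE_MASS = {"G": 57.021, "A": 71.037, "S": 87.032, "V": 99.068,
                "L": 113.084, "K": 128.095, "E": 129.043}


def prefix_masses(peptide):
    """Cumulative residue masses of every proper prefix (toy fragment ions)."""
    total, masses = 0.0, []
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def score(observed, peptide, tol=0.01):
    """Number of observed fragment masses explained by the candidate."""
    theoretical = prefix_masses(peptide)
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def best_hit(observed, database):
    """Return the top-scoring (source, peptide) entry, however poor its score."""
    return max(database, key=lambda entry: score(observed, entry[1]))


# The spectrum really comes from a (hypothetical) honey bee peptide.
observed = prefix_masses("GASVLEK")
viral_db = [("virus", "GAVSLKE"), ("virus", "LLKEGAS")]
bee_db = [("honey bee", "GASVLEK")]

print(best_hit(observed, viral_db))           # a viral peptide is still reported
print(best_hit(observed, viral_db + bee_db))  # the true source wins once included
```

With only viral sequences to choose from, the search returns a viral peptide that explains some of the fragments by chance. Once the sample organism is in the database, the correct peptide explains all of them and outranks it.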

Info from R. Chalkley

Knudsen & Chalkley (2011) PLOS One 6: e20873

Foster (2011), MCP 10: M110.006387


Protein Databank in Europe (PDBe) group

www.ebi.ac.uk/pdbe

• Is one of the four sites around the world where 3D structures may be deposited.

• Provides a stable and clean repository of macromolecular structure data.

• Has services that allow users to access, search and retrieve structural data from a single web access point.




PDBePISA: What assembly can my structure have?

PDBeMotif: What binds ASP ASP HIS LYS?


Figure labels: protein of unknown function; metallopeptidase; carbohydrate biosynthesis.



• Proteins are the workhorses of the cell: enzymes, structural proteins, signal transduction, transport, transcription, translation and degradation, traversing membranes … all acting as a functional/regulatory network.

• By mapping these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation.

• One way to predict protein function is through identification of binding partners: Guilt by Association.

• If the function of at least one of the components with which the protein interacts is known, that should let us assign its function(s) and the pathway(s).
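Guilt by association can be sketched as a vote among annotated interaction partners. This is a minimal illustration under stated assumptions: the protein names, interactions and annotations are hypothetical, and a simple neighbour count stands in for the more careful scoring a real prediction method would use.

```python
from collections import Counter, defaultdict


def build_neighbours(interactions):
    """Turn a list of binary interactions into an undirected adjacency map."""
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def predict_functions(protein, interactions, annotations):
    """Rank candidate functions by how many annotated partners carry them."""
    neighbours = build_neighbours(interactions)
    votes = Counter(
        annotations[partner]
        for partner in neighbours[protein]
        if partner in annotations
    )
    return votes.most_common()


# Hypothetical interaction data: X is the protein of unknown function.
interactions = [("X", "A"), ("X", "B"), ("X", "C"), ("A", "B")]
annotations = {"A": "DNA repair", "B": "DNA repair", "C": "transport"}

print(predict_functions("X", interactions, annotations))
```

The ranking is only a hypothesis to test: a protein with no annotated partners gets an empty list, and a well-connected protein may collect votes for several unrelated functions.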


