Page 1: EMBL-EBI Now and in the Future · ProteomeXchange Consortium •Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing
Sandra Orchard [email protected]

02/06/2014

Compton & Kelleher, Nat. Methods, 2012

MS/MS matching identifies peptides, not proteins.

Proteins are inferred from the peptide sequences.
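
The inference step can be sketched in a few lines of Python. This is a toy illustration only: the protein IDs and sequences are invented, and real inference tools also weigh scores and apply parsimony rules that are not modelled here. It shows why a peptide shared by several proteins cannot, on its own, say which protein was in the sample.

```python
# Toy protein inference: peptides are identified, proteins are inferred.
# All protein IDs and sequences below are invented for illustration.
PROTEINS = {
    "protA": "MKTAYIAKQRLSDEGHK",
    "protB": "MSSAYIAKQRGGTPLVK",
    "protC": "MALWTRLSDEGHKPPQR",
}


def proteins_containing(peptide, proteins):
    """Return the sorted IDs of all proteins whose sequence contains the peptide."""
    return sorted(pid for pid, seq in proteins.items() if peptide in seq)


def infer(peptides, proteins):
    """Split peptides into unique evidence and shared (ambiguous) evidence."""
    unique, shared = {}, {}
    for peptide in peptides:
        hits = proteins_containing(peptide, proteins)
        if len(hits) == 1:
            unique[peptide] = hits[0]      # points to exactly one protein
        elif len(hits) > 1:
            shared[peptide] = hits         # cannot distinguish these proteins
    return unique, shared


identified = ["TAYIAK", "AYIAK", "LSDEGHK", "GGTPLVK"]
unique, shared = infer(identified, PROTEINS)
print("unique:", unique)
print("shared:", shared)
```

Only the unique peptides support a single protein unambiguously; the shared ones are compatible with more than one protein.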

• Proteomics data is potentially very complex and its interpretation is often troublesome and/or controversial.
• In other ‘omics’ fields, a data sharing ‘culture’ is well established. It is generally considered good scientific practice.
• In proteomics, the ‘culture’ is evolving in that direction.
• Public availability of data enables:
  - reinterpretation.
  - validation of the experimental results reported.
  - reuse of the data (e.g. for meta-analysis studies).
• Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions, …

Olsen & Mann, Science Sign, 2011

• Main public MS-based proteomics repositories:
  - PROteomics IDEntifications database (PRIDE, EBI)
  - Global Proteome Machine (GPMDB)
  - PeptideAtlas (ISB, Seattle)
• Many others are more specialized, e.g. Human Proteinpedia, Genome Annotation Proteomics Pipeline (GAPP), …
• New in 2013-2014: ProteomicsDB, CHORUS, MassIVE, iProX.
• Very diverse: different aims, functionalities, … but also complementary.
• Main focus is MS/MS data.

• Many different workflows need to be supported. They provide complementary ‘views’.
• No data reprocessing. Data is stored as ‘published’ or as originally analysed:
  - PRIDE (MS/MS data)
  - PASSEL (SRM data)
• Data reprocessing (MS/MS data):
  - PeptideAtlas
  - GPMDB

• Resources that try to represent the authors’ analysis view on the data.
• Various workflows are allowed and they can provide complementary results.
• Data are not ‘updated’ over time. However, meta-analysis on top is possible.
• False discovery rates (FDRs) accumulate when datasets are combined.
• Main representatives: PRIDE (MS/MS data) and PeptideAtlas/PASSEL (SRM data).
• Data standards are essential.
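
The accumulation of FDRs can be made concrete with a deliberately simplified model, sketched below. The assumptions are mine and are extreme on purpose: every dataset re-identifies the same set of true identifications, while the false identifications are random and never coincide between datasets. Real datasets overlap only partially, so the actual effect lies somewhere between the single-dataset FDR and this worst case.

```python
def combined_fdr(n_datasets, true_ids, false_ids):
    """FDR of the union of n datasets under the simplifying assumptions:
    true identifications fully overlap, false ones never do."""
    false_total = n_datasets * false_ids       # false hits pile up
    return false_total / (true_ids + false_total)


# Each hypothetical dataset: 990 true + 10 false identifications (1% FDR).
single = combined_fdr(1, 990, 10)
merged = combined_fdr(10, 990, 10)
print(f"one dataset:  {single:.3f}")
print(f"ten datasets: {merged:.3f}")
```

Under these assumptions, ten datasets that each pass a 1% threshold give a merged list with roughly 9% false identifications, which is why a combined dataset needs its own FDR control.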

http://www.ebi.ac.uk/pride

• Focused on MS/MS approaches

Martens et al., Proteomics, 2005

Vizcaíno et al., NAR, 2013

• These resources collect MS raw data and reprocess it using one given analysis pipeline and an up-to-date protein sequence database.
• Advantage: They provide a ‘standardized’ and updated view on the experimental data available.
• Drawback: Only one common analysis method is used and there can be information loss.
• Different from the author’s view on the data.
• Main resources: GPMDB and PeptideAtlas (ISB, Seattle).

ProteomeXchange Consortium

• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Includes PeptideAtlas (ISB, Seattle) and PRIDE.
• ProteomeXchange is primarily user-oriented: the idea is to make things easier for the users (submission and access to the data).
• Two supported data workflows: MS/MS and SRM data.

http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014

Figure: ProteomeXchange data workflow. Diagram labels: Researcher’s Results; Raw Data*; Results; Metadata / Manuscript; Receiving repositories: PRIDE (MS/MS data), PASSEL (SRM data), Other DBs; Proteome Central; Reprocessed Results; PeptideAtlas; GPMDB; UniProt/neXtProt; Other DBs; Journals.

Taken from: Vizcaíno et al., Nat Biotechnol, 2014

• Re-analysis of data at a later time can potentially achieve more comprehensive results and yield new biological knowledge, thanks to:
  - Improved analysis software.
  - Better reference protein sequence databases (which are always evolving).
  - Identification of new post-translational modifications.

• Individual authors can reprocess raw data with new hypotheses in mind (not taken into account by the original authors).
• Recent examples (using phosphoproteomics data sets):
  - O-GlcNAc-6-phosphate [1]
  - Phosphoglyceryl [2]
  - ADP-ribosylation [3]

[1] Hahne & Kuster, Mol Cell Proteomics (2012) 11(10): 1063-9
[2] Moellering & Cravatt, Science (2013) 341: 549-553
[3] Matic et al., Nat Methods (2012) 9: 771-2

• Reprocessing of MS raw data with this idea in mind.
• Falls into the broad “ProteoGenomics” field. Many papers have been published on this topic.
• Validation of existing genes.
• New splice isoforms, pseudogenes, etc.

Brosch et al. (2011) Genome Res 21:756-767

• In this particular paper:
  - 53 genes alternatively transcribed
  - 10 new protein coding genes
• Pipeline to integrate gene annotations in the mouse genome.

• Public availability of data enables:
  - reinterpretation.
  - validation of the experimental results reported.
• Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions, …

• Analysis of Tyrannosaurus rex fossils: controversial presence of collagen (is it contamination of the sample?)

Asara et al. (2007) Science 316: 280-5.

Asara et al. (2007) Science 316: 1324-5.

Bern et al. (2009) JPR 8: 4328-32

Info from R. Chalkley

Bromenshenk et al. (2010) PLOS One 5: e13181

Experimental Protocol

1. Collected samples from healthy, collapsing and collapsed bee colonies.
2. Homogenised bees.
3. Digested with trypsin.
4. Analysed by LC-MS/MS on an LTQ.
5. Searched using Sequest.
6. Filtered results using PeptideProphet and ProteinProphet.
7. Performed further analysis to determine species statistically more commonly found in collapsing/collapsed colony samples.

Info from R. Chalkley

Bromenshenk et al. (2010) PLOS One 5: e13181
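
Steps 3 and 5 are linked: a database search compares the measured spectra against peptides predicted by digesting the database sequences in silico. A minimal sketch of such a digestion follows, assuming the commonly used rule that trypsin cleaves after K or R unless the next residue is P. The sequence is invented, and missed cleavages and modifications are ignored.

```python
def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P.
    Simplified rule: no missed cleavages, no modifications."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):          # trailing peptide without K/R at its end
        peptides.append(sequence[start:])
    return peptides


# Invented example sequence; the R before P is not cleaved.
print(tryptic_digest("MKTAYIAKQRPLSDEGHK"))
```

Only peptides that can be generated from the sequences in the search database can ever be reported, so the choice of database bounds what the search is able to find.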

• Big pitfall: The protein database searched was composed only of viral proteins. No honey bee proteins at all!
• After re-searching the data, there is no evidence for viral peptides/proteins in any of their data. What is found instead: honey bee, fruit fly, wasp, moth, human keratin, bacteria that like sugary environments, …
• “We believe that there is currently insufficient evidence to conclude that bees are a natural host for IIV-6, let alone that the virus is linked to CCD”.

Info from R. Chalkley
Knudsen & Chalkley (2011) PLOS One 6: e20873
Foster (2011), MCP 10: M110.006387

Protein Data Bank in Europe (PDBe) group

www.ebi.ac.uk/pdbe

• Is one of the four sites around the world where 3D structures may be deposited.
• Provides a stable and clean repository of macromolecular structure data.
• Has services that allow users to access, search and retrieve structural data from a single web access point.

PDBePISA: What assembly can my structure have?

PDBeMotif: What binds ASP ASP HIS LYS?

Protein of unknown function

metallopeptidase

carbohydrate biosynthesis

• Proteins are the workhorses of the cell: enzymes, structural proteins, signal transduction, transport, transcription, translation and degradation, traversing membranes … all acting as a functional/regulatory network.
• By mapping these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation.
• One way to predict protein function is through identification of binding partners: Guilt by Association.
• If the function of at least one of the components with which the protein interacts is known, that should let us assign its function(s) and pathway(s).
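
Guilt by association can be sketched as a vote among annotated interaction partners. The example below is a minimal illustration, not a method used by any particular resource: the protein names, interactions and annotations are invented, and a plain count stands in for the statistical scoring a real predictor would apply.

```python
from collections import Counter

# Invented binary interactions and functional annotations.
INTERACTIONS = [
    ("unknownX", "kinA"),
    ("unknownX", "kinB"),
    ("unknownX", "transT"),
    ("kinA", "kinB"),
]
ANNOTATIONS = {
    "kinA": ["signal transduction"],
    "kinB": ["signal transduction"],
    "transT": ["transport"],
}


def neighbours(protein, interactions):
    """Return the sorted binding partners of a protein."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    return sorted(partners)


def predict_function(protein, interactions, annotations):
    """Count the annotations of all binding partners, most frequent first."""
    votes = Counter(
        function
        for partner in neighbours(protein, interactions)
        for function in annotations.get(partner, [])
    )
    return votes.most_common()


print(predict_function("unknownX", INTERACTIONS, ANNOTATIONS))
```

Here the unannotated protein inherits "signal transduction" as its most likely function because two of its three partners carry that annotation.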
