
EBI is an Outstation of the European Molecular Biology Laboratory.

17.10.2010

EMBL-EBI Proteomics data resources and services

Rafael JIMENEZ (EBI, Hinxton, UK)

4th Annual Forum for SMEs, Munich, October 18th-19th 2010

Context

Integration, standards and dissemination

• UniProt: protein sequences
• Reactome: pathways
• IntAct: interactions
• PRIDE: mass spec
• DAS, PSICQUIC

EnCore

Annotation

Archive

• Sequence databases (INSDC): EMBL, DDBJ, NCBI
• Interactions (IMEx): IntAct, BIND, DIP, MINT
• Mass spec (ProteomeXchange): PRIDE, PeptideAtlas, GPMDB, Tranche

Sharing infrastructures

• Multiple repositories in a particular field
• Collaboration and data exchange
• More data coverage

PSI

• Proteomics Standards Initiative
• Work group of the Human Proteome Organization
• Defines community standards for data in proteomics, facilitating data comparison, exchange and verification

http://www.psidev.info/


• MIAPE: The Minimum Information About a Proteomics Experiment
  • Data and metadata from proteomics experiments
  • Data: results
  • Metadata: data about the data
    • Where the samples came from
    • How the analyses were performed


PSI-MI

• Standard format: PSI-MI XML, PSI-MITAB
• Control vocabulary: PSI-MI CV
• Reporting guideline: MIMIx
• Data distribution: PSICQUIC (data servers, registry, clients)
• Data submission: PSI-MI XML files, PSI Excel sheet, PSI web form
• Tools: XML Java API, MITAB Java API, XMLMakerFlattener, XML Validator, MIF25_view.xsl, MIF25_compact.xsl, MIF25_expand.xsl
• Website

PSI - Molecular Interactions

• Work group of the Proteomics Standards Initiative
• Community coordination effort to ensure deposition of data in public repositories
• Concentrating on …
  • Annotation and representation of published MI data
  • Accessibility of MI data to the user community

• Data format: PSI-MI XML, PSI-MITAB
• Data distribution: PSICQUIC
• Control vocabulary: PSI-MI CV
• Reporting guideline (MIAPE): MIMIx
• Scoring: PSISCORE

http://www.psidev.info/MI

PSI-MI format

• Community standard for molecular interactions
• Jointly developed by major data providers: BIND, Cellzome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• Collecting and combining data from different sources has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained, e.g. IntAct data in Cytoscape

Formats: psi-mi/xml25, psi-mi/tab25
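PSI-MITAB 2.5 (`psi-mi/tab25`) is a tab-delimited format with one interaction per line, spread over 15 columns. In these columns, several values are separated by `|` and `-` marks an empty cell. The following is a minimal reader sketch. It makes three assumptions: the input is MITAB 2.5 rather than later versions that add columns, header and comment lines have already been removed, and no value contains a quoted `|`. The column labels are our own and do not come from any PSI library.

```python
from __future__ import annotations

# Our own labels for the 15 PSI-MITAB 2.5 columns, in file order.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_author", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence",
]


def parse_mitab25_line(line: str) -> dict[str, list[str]]:
    """Split one MITAB 2.5 line into named columns.

    Each cell may hold several values separated by '|'; a lone '-'
    means 'no value' and becomes an empty list.
    """
    cells = line.rstrip("\n").split("\t")
    if len(cells) != len(MITAB25_COLUMNS):
        raise ValueError(f"expected 15 columns, got {len(cells)}")
    record = {}
    for name, cell in zip(MITAB25_COLUMNS, cells):
        record[name] = [] if cell == "-" else cell.split("|")
    return record
```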

PSI-MI Control vocabulary

• Ontology browser: http://www.ebi.ac.uk/ontology-lookup

MIMIx

• MIAPE document guideline for molecular interactions:
  1. Manuscript information
  2. Experiment
  3. Interaction
  4. Confidence

Data distribution: PSICQUIC

• Proteomics Standards Initiative Common QUery InterfaCe
• Community effort to standardise the way to access and retrieve data from molecular interaction databases
• Widely implemented by independent interaction data resources
• Based on the PSI standard formats (PSI-MI XML and MITAB)
• Not limited to protein-protein interactions; also, e.g.:
  • Drug-target interactions
  • Simplified pathway data
• A registry lists the resources implementing PSICQUIC
• Documentation: http://psicquic.googlecode.com

PSICQUIC implementation

[Diagram labels: Sample, Publications, Interaction databases, PSICQUIC services, PSICQUIC Registry, PSICQUIC client, User; error sources marked: observation error, annotation error]

PSICQUIC Registry
• 13 sources
• 14,665,530 interactions
http://www.ebi.ac.uk/Tools/webservices/psicquic/registry/registry?action=STATUS

PSICQUIC example: REST queries

Bruno Aranda (baranda@ebi.ac.uk)

1. http://mint.bio.uniroma2.it/mint/psicquic/webservices/current/search/query/p53
2. http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/webservices/current/search/query/p53
3. http://www.ebi.ac.uk/Tools/webservices/psicquic/chembl/webservices/current/search/interactor/p53
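All three example URLs follow one pattern: a service base URL, then `/webservices/current/search/`, then either `query/<term>` or `interactor/<identifier>`. The sketch below builds such URLs and downloads the results. It assumes that pattern (taken from these 2010 examples, which may no longer resolve) and assumes the service answers with PSI-MITAB text, one interaction per line. The helper names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import quote
from urllib.request import urlopen

# Base URLs taken from the example queries; they may no longer be online.
MINT_BASE = "http://mint.bio.uniroma2.it/mint/psicquic"
INTACT_BASE = "http://www.ebi.ac.uk/Tools/webservices/psicquic/intact"
CHEMBL_BASE = "http://www.ebi.ac.uk/Tools/webservices/psicquic/chembl"


def psicquic_url(base: str, term: str, by_interactor: bool = False) -> str:
    """Build a PSICQUIC REST URL for a free query or an interactor lookup."""
    method = "interactor" if by_interactor else "query"
    # Percent-encode the term so spaces, quotes and ':' survive in the URL path.
    return f"{base}/webservices/current/search/{method}/{quote(term, safe='')}"


def fetch_mitab(url: str, timeout: float = 30.0) -> list[str]:
    """Download a result and return its non-empty lines (one interaction each)."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]
```

For example, `fetch_mitab(psicquic_url(INTACT_BASE, "p53"))` would reproduce the second query, provided the service is still online.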

PSICQUIC example: MIQL

Bruno Aranda (baranda@ebi.ac.uk)

• Molecular Interaction Query Language
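MIQL queries combine `field:value` terms with Boolean operators, using a Lucene-like syntax. The resulting string is then placed, URL-encoded, after `.../search/query/` in a PSICQUIC REST URL. Below is a small sketch of helpers that quote values and join terms. It assumes field names such as `identifier`, `species` and `detmethod`; check the exact field list against the MIQL documentation of the service you query. The helper names are hypothetical.

```python
def miql_term(field: str, value: str) -> str:
    """Return field:value, quoting the value if it has spaces or special characters."""
    if any(ch in value for ch in ' ():"'):
        # Escape embedded double quotes, then wrap the value in quotes.
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


def miql_and(*terms: str) -> str:
    """Combine terms so that all of them must match."""
    return " AND ".join(terms)


query = miql_and(miql_term("identifier", "P04637"), miql_term("species", "human"))
# query == 'identifier:P04637 AND species:human'
```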


PSICQUIC client


PSICQUIC clustering
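The same interaction is often returned by several PSICQUIC services, so a client may want to group the results. One simple approach is to key each result on its unordered pair of interactor identifiers and collect the source databases reporting that pair. This is a sketch, not necessarily the algorithm the PSICQUIC clustering tool uses. It assumes the records are `(id_a, id_b, source)` tuples already extracted from MITAB lines, and that all identifiers share one namespace. Mapping identifiers across namespaces (for example with PICR) is not handled here.

```python
from collections import defaultdict


def cluster_interactions(records):
    """Group (id_a, id_b, source) records by unordered interactor pair.

    Returns a dict mapping (smaller_id, larger_id) to the sorted list of
    sources reporting that pair, so A-B and B-A fall into one cluster.
    """
    clusters = defaultdict(set)
    for id_a, id_b, source in records:
        key = tuple(sorted((id_a, id_b)))
        clusters[key].add(source)
    return {pair: sorted(sources) for pair, sources in clusters.items()}
```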


PSISCORE

[Screenshot callouts: scoring algorithms offered by PSISCORE servers; scoring algorithm description, provided by the scoring server / registry; exemplary visualization of a scoring algorithm with a 0-1 range]

IMEx: The International Molecular Exchange Consortium

• Group of major public interaction data providers sharing curation effort: DIP, IntAct, MINT, MPact, MatrixDB, MPIDB and BioGRID
• Independent molecular interaction resources
• Common curation standards for detailed curation
• Common data formats (PSI-MI XML, PSI-MITAB, PSICQUIC)
• Common accession number space
• Coordinated & non-redundant curation
• In production mode since February 2010
• Since 3/2009 supported by the European Commission under PSIMEx, contract number FP7-HEALTH-2007-223411, with additional partners Vital-IT, Nature, Wiley, BiaCore (GE), U. Maryland, CSIC, TU Munich, MIPS, SCBIT (Shanghai)

IMEx website: http://www.imexconsortium.org/
Imex.sf.net

IntAct

• Freely available, open-source database system
• Public repository of molecular interactions
• Interactions manually curated and reviewed by experts
• Interactions derived from literature or direct user submissions
• Topic-centric datasets (e.g. Cancer, Chromatin, MSD…)
• Analysis tools for interaction data
• EBI database (part of the IMEx consortium and the PSI-MI)
• Data updated every week: ftp://ftp.ebi.ac.uk/pub/databases/intact
• Data formats available:

http://www.ebi.ac.uk/intact

IntAct statistics

• Interactions by identification method:
  • ~70% Y2H (yeast two-hybrid)
  • ~25% affinity purification
  • ~3% physical data
  • ~2% other methods

IntAct: Search and results

[Screenshot callouts: Export, Custom columns, Filters, More results (PSICQUIC)]

PSI-MS

PSI-MSS
• Standard format: mzML, TraML
• Reporting guideline: MIAPE-MS
• Control vocabulary: PSI-MS
• Tools (validation, analysis, exporters, viewers, ...): ProDaC, OpenMS/TOPP, ProteoWizard, Proteios, TPP, X!Tandem, Myrimatch, InSilicoSpectro, NCBI C++ toolkit, Mascot, Phenyx, PEAKS, mzML_Exporter, CompassXport, Insilicos Viewer, jmzML, PRIDE Inspector, PRIDE Converter

PSI-PI
• Standard format: mzIdentML, mzQuantML
• Reporting guideline: MIAPE-MSI
• Tools (validation, analysis, exporters, viewers, ...): mzIdentML validator, Mascot, OMSSA, Peaks, Phenyx, PLGS, ProCon, ProteinPilot, ProteinScape, SEQUEST, SpectraST, Spectrum Mill, X!Tandem, OpenMS/TOPP, Scaffold, TPP, Mascot Integra, MIAPE MSI exporter, CSV exporter

Data, website and tools: PRIDE Inspector, PRIDE Converter, PRIDE BioMart, PRIDE Q, Projects, PICR, OLS

PSI - Mass Spectrometry Standards

• Work group of the Proteomics Standards Initiative
• Community coordination to ensure deposition of data in public repositories
• Concentrating on …
  • Annotation and representation of published MS data
  • Accessibility of MS data to the user community

[Workflow diagram: protein mixture → separation (2D-SDS-PAGE) → spot cutting → individual proteins → digestion (trypsin) → peptides → mass spectrometry (MALDI-TOF) → peptide mass → database search → protein identification; quantification → protein quantification. Data formats along the workflow: mzML (successor to mzXML and mzData), mzIdentML (formerly analysisXML), mzQuantML.]

PSI-MS Controlled vocabulary

• Shared by PSI-MSS and PSI-PI
• Ontology browser: http://www.ebi.ac.uk/ontology-lookup

MIAPE

[Figure labels: PSI-MS, PSI-PI]

ProteomeXchange: Enhancing Cooperation of Proteomics Data Repositories

• Group of major public mass spec data providers
• Single point of submission to proteomics repositories
• Encourage data exchange
• Common data formats (mzML, mzIdentML, mzQuantML)
• Common accession number space
• Coordinated & non-redundant data
• Since 2010 supported by the European Commission

ProteomeXchange website: http://www.proteomeexchange.com

Proposal structure

[Diagram labels:
• Local data management systems: ProHITS, MS-Lims, ProCon, Phenyx, OmicsHub, other LIMS; PRIDE Converter
• Standards: mzML, mzIdentML, mzQuantML; Release 1, Release 2, Release 3
• Data submission to repositories: PRIDE (metadata, results), Tranche (raw data), Peptidome (metadata, results), linked by xrefs
• Central Dataset Look-up Service: MIAPE validation, accession number / reviewer login, notification, RSS feed
• Journals: Wiley Proteomics, NBT, JPR, MCP; data release / publications
• Secondary resources: PeptideAtlas, UniProt, NIST spectrum libraries, ……; data reprocessing and reprocessing notification]

PRIDE: The Proteomics Identifications Database

• Centralized, standards-compliant, public data repository for proteomics identifications
• Open source
• Open data
• > 100,000,000 spectra
• ~4,000,000 protein identifications
• Detailed annotation of metadata
• Vizcaíno JA, Côté R, Reisinger F, Foster JM, Mueller M, Rameseder J, Hermjakob H, Martens L. A guide to the Proteomics Identifications Database proteomics data repository. Proteomics. 2009 Sep;9(18):4276-83. PMID: 19662629

http://www.ebi.ac.uk/pride

PRIDE data content

[Chart labels: Protein IDs, Peptide IDs; annotation: release of PRIDE Converter]

PRIDE Website

• Search by:
  • Experiment
  • Protein ID
  • Ontology [screenshot label: PART_OF]
• Results:
  • Peptide IDs
  • Protein IDs
  • Mass spectra as peak lists
• Metadata:
  • Experiment
  • Analysis


BioMart – System Overview

[Diagram: source data (MySQL, Oracle, Postgres) → DB → Mart]
Bert Overduin

PRIDE BioMart

1. Filter
2. Attributes
3. Results

Web interface: http://www.ebi.ac.uk/pride/prideMart.do
Programmatic access: http://www.ebi.ac.uk/pride/biomart/martservice?query=XML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
    <Dataset name = "pride" interface = "default" >
        <Filter name = "experiment_ac" value = "1632"/>
        <Attribute name = "submitted_accession" />
    </Dataset>
</Query>

Easy programmatic access!
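The martservice endpoint takes the query XML as the URL-encoded value of the `query` parameter. The following is a minimal Python sketch that sends the PRIDE query and splits the answer into rows. It assumes the 2010 PRIDE endpoint used in the example above; PRIDE BioMart has since been retired, so the request itself may fail. It also assumes header-less TSV output, as requested by `formatter = "TSV"` and `header = "0"`. The helper names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import urlopen

MARTSERVICE = "http://www.ebi.ac.uk/pride/biomart/martservice"

QUERY_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6">
  <Dataset name="pride" interface="default">
    <Filter name="experiment_ac" value="1632"/>
    <Attribute name="submitted_accession"/>
  </Dataset>
</Query>"""


def martservice_url(query_xml: str, base: str = MARTSERVICE) -> str:
    """Return the GET URL with the query XML URL-encoded into ?query=."""
    return f"{base}?{urlencode({'query': query_xml})}"


def parse_tsv(text: str) -> list[list[str]]:
    """Split header-less TSV output into rows of column values."""
    return [line.split("\t") for line in text.splitlines() if line]


def run_query(query_xml: str, timeout: float = 60.0) -> list[list[str]]:
    """Send the query and return the parsed rows (requires network access)."""
    with urlopen(martservice_url(query_xml), timeout=timeout) as response:
        return parse_tsv(response.read().decode("utf-8"))
```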


Ontology Lookup Service

• A unified, single point of query for over 69 ontologies (updated daily) and upwards of 850,000 terms
• Web services: REST and SOAP

http://www.ebi.ac.uk/ontology-lookup/

Protein Identifier Cross-Reference Service

[Screenshot callouts: logical xref (hyperlinked), active xref (hyperlinked), inactive xref, secondary identifier]
Richard Cote

• Common protein identifier space
• Aliases/synonyms for an identifier
• Maps secondary IDs to recent primary IDs
• Web services: REST and SOAP

http://www.ebi.ac.uk/Tools/picr/
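Mapping secondary IDs to recent primary IDs amounts to following a replacement table until no further replacement exists. The toy sketch below uses an in-memory table with hypothetical identifiers. Real mappings come from the PICR web services, whose request format is not reproduced here. A single secondary ID may lead to several primaries (for example after an entry is split), so the function returns a list.

```python
def resolve_to_primary(identifier, secondary_to_primary, max_hops=10):
    """Return the sorted primary IDs reachable from `identifier`.

    `secondary_to_primary` maps a retired/secondary ID to the IDs that
    replaced it; an ID absent from the map is treated as already primary.
    Resolution stops after `max_hops` steps.
    """
    result, frontier, seen = set(), [identifier], set()
    for _ in range(max_hops):
        next_frontier = []
        for current in frontier:
            if current in seen:
                continue  # guard against cycles in the mapping table
            seen.add(current)
            successors = secondary_to_primary.get(current)
            if successors:
                next_frontier.extend(successors)
            else:
                result.add(current)
        if not next_frontier:
            break
        frontier = next_frontier
    return sorted(result)


# Hypothetical table: OLD1 was replaced by MID1, which was later split in two.
table = {"OLD1": ["MID1"], "MID1": ["NEW1", "NEW2"]}
```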

PRIDE Converter
• Wizard-like graphical user interface
• Converts data formats into valid PRIDE XML
• Efficient access to the OLS
• FTP submissions

PRIDE Inspector
• Views mzML and PRIDE XML files
• Browses PRIDE database content locally
• Facilitates publication reviews

DAS & Dasty3

• Uniform access to multiple repositories of biological data distributed in different geographical locations
• 74 protein DAS sources!
• PRIDE: DAS 1.6

PRIDE-Q

• New resource of high-quality data
• Determines which data from PRIDE is good
• Supporting evidence for protein existence in UniProt
• Curation: automated rules, curator override
• Data exports:
  • Links, DAS track for all PRIDE data
  • Quality controlled, e.g. "Protein Existence", Expression Atlas from PRIDE-Q

Reactome

• Human pathway knowledgebase
• Manually curated
• Open source, open data
• Collaboration between EBI, OICR and NYU
• Online since 2003
• Matthews L, et al. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 2008 Nov 3.

[Figure: Reactome statistics]
http://reactome.oicr.on.ca

[Website screenshot callouts: sidebar, main text, navigation bar; "New site! Coming soon …"]

Pathway description
• Authors
• Summary
• Species, GO term
• Other species
• Molecules: UniProt, Ensembl, MIM, KEGG, ChEBI, Compound, Entrez Gene, HapMap, UCSC, RefSeq, PubChem

The Pathway Browser

[Screenshot callouts: species selector; Search & Analyze bar; sidebar; pathway diagram panel; details panel (hidden); zoom/move toolbar; thumbnail; pathway, reaction, black-box]

Pathway Analysis – Overrepresentation

[Screenshot callouts: "Top-level"; reveal next level; p-value, in set / in pathway]
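The p-value next to "in set / in pathway" asks how surprising the overlap is. Suppose there are N molecules in total, K of them in the pathway, and a submitted list of n molecules of which k fall in the pathway. A common way to compute the p-value is a one-sided hypergeometric test, P(X ≥ k). The sketch below assumes that test; the exact statistic Reactome applies, and any multiple-testing correction, may differ.

```python
from math import comb


def hypergeometric_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K in pathway, n drawn).

    k: submitted molecules found in the pathway ("in set")
    n: size of the submitted list
    K: molecules in the pathway ("in pathway")
    N: molecules in the whole background
    """
    # Sum P(X = i) = C(K, i) * C(N - K, n - i) / C(N, n) for i = k .. min(n, K).
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)
```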

Species Comparison II
• Yellow = human/rat
• Blue = human only
• Grey = not relevant
• Black = complex

Expression Analysis II
• "Hot" = high
• "Cold" = low

Molecular Interaction Overlay

BioMart

1. Filter
2. Attributes
3. Results

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
    <Dataset name = "pathway" interface = "default" >
        <Filter name = "referencepeptidesequence_uniprot_id_list" value = "P25205"/>
        <Attribute name = "stableidentifier_identifier" />
        <Attribute name = "pathway_db_id" />
    </Dataset>
</Query>

http://www.reactome.org:5555/biomart/martservice?query=XML
Easy programmatic access!

http://www.reactome.org:5555/biomart/martview
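Writing BioMart query XML by hand becomes error-prone once several filters or attributes are involved. The sketch below uses Python's standard `xml.etree.ElementTree` to rebuild the Reactome query from a dataset name, a dict of filters and a list of attributes. The helper name `biomart_query` is hypothetical. The dataset, filter and attribute names are the ones in the Reactome example, and the fixed `Query` attributes copy it as well.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def biomart_query(dataset: str, filters: dict[str, str], attributes: list[str]) -> str:
    """Return a BioMart query XML string (TSV output, no header row)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
        "count": "",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n' + body


# Rebuilds the Reactome pathway query for UniProt accession P25205.
xml_text = biomart_query(
    "pathway",
    {"referencepeptidesequence_uniprot_id_list": "P25205"},
    ["stableidentifier_identifier", "pathway_db_id"],
)
```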

Acknowledgments …

The Funding
• EU:
  • ProDaC (to 03/2009)
  • ProteomeBinders
  • BioSapiens
  • Felics
  • LipidomicNet
  • APO-SYS
  • PSIMEx (since 03/2009)
• EMBL
• Wellcome Trust
• NIH

PRIDE private mode

[Diagram labels: Lab A, Lab B, Lab C; private data in PRIDE; "Collaboration"; comparison; reviewer; publicly available data]

• Private mode allows data analysis within a collaboration
• PRIDE tools are already accessible in private mode, in particular experiment comparison (alpha)
• On manuscript submission, reviewers can access the data in standard format
• On manuscript publication, the data becomes public
