
IntAct

Janna Hastings and James Watson

EBI Bioinformatics Roadshow

ILRI, Nairobi (2-3 March 2011); UCT, Cape Town (7-8 March 2011)

A database of Molecular Interactions


What data are we dealing with?

What are protein-protein interactions?


What data are we dealing with?

Example technique: yeast two hybrid


What data are we dealing with?

Why are we interested in interactions?

1. As a means of precisely understanding a protein's role inside a specific cell type

2. Guilt by Association – it may be the only means of predicting a protein’s function

3. As building blocks for Systems Biology


What data are we dealing with?

The scope of IntAct data

Nucleic acids Proteins

Transcriptomics Small compounds


IntAct goals & achievements

1. Define a standard for the representation and annotation of molecular interaction data

2. Provide a public repository

3. Populate the repository with experimental data from project partners and curated literature data

4. Provide modular analysis tools

5. Provide portable versions of the software to allow installation of local IntAct nodes

http://www.ebi.ac.uk/intact

ftp://ftp.ebi.ac.uk/pub/databases/intact

4200+ distinct publications, 228,000+ binary interactions, 68,000+ proteins imported from UniProt

Search & advanced search, hierarchView, pay-as-you-go, MiNe…

Known installations: AstraZeneca, GSK, MERCK, MINT, Proteome Center of Shanghai


“Lifecycle of an Interaction”

[Flow diagram. Recoverable labels: Publication (full text, abstract); IntAct curation (curator, curation manual, CVs; annotate exp, I, p1, p2); super curator (check; accept or reject); sanity checks (nightly) with reports; public web site and FTP site; IMEx (MatrixDB, MINT, DIP).]


Public data

• All data is manually curated by expert curators

• Curation manual rigorously followed

• All curated data is reviewed by a senior curator

• All data is made available on the FTP site: ftp://ftp.ebi.ac.uk/pub/databases/intact

(!) data updated every week

(!) format available:


Christian Kohler


Interaction space

• Realistically, one publication per working day per curator

• Only a fraction of all published interactions is captured in interaction databases

• An end is not in sight: the interaction space is still vastly under-sampled


A very detailed data model

• Support for detailed features, i.e. definition of the interacting interface

Overlay of Ranges on sequence:

Interacting domains


Controlled vocabularies

• Why do we use them?

e.g. far too many ways to write: yeast two hybrid, Y2H, 2H, two-hybrid, …

• Full integration of PSI-MI ontology

• Over 1,500 terms, fully defined and cross-referenced
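
The benefit of a controlled vocabulary can be sketched in a few lines of Python. The synonym table below is a hypothetical, hand-written illustration and not part of IntAct; MI:0018 is the PSI-MI identifier for "two hybrid", and the function name `normalise_method` is made up for this example.

```python
# Hypothetical synonym table: free-text spellings mapped to one controlled term.
# MI:0018 is the PSI-MI identifier for "two hybrid"; the synonym list itself
# is illustrative and far from complete.
SYNONYMS = {
    "yeast two hybrid": "MI:0018",
    "yeast two-hybrid": "MI:0018",
    "y2h": "MI:0018",
    "2h": "MI:0018",
    "two hybrid": "MI:0018",
    "two-hybrid": "MI:0018",
}
TERM_NAMES = {"MI:0018": "two hybrid"}


def normalise_method(free_text):
    """Return (term_id, term_name) for a free-text method, or None if unknown."""
    key = " ".join(free_text.lower().split())  # lower-case, collapse whitespace
    term_id = SYNONYMS.get(key)
    if term_id is None:
        return None
    return term_id, TERM_NAMES[term_id]


for spelling in ["Yeast two hybrid", "Y2H", "2H", "two-hybrid"]:
    print(spelling, "->", normalise_method(spelling))
```

Once every spelling resolves to the same identifier, a search for that identifier finds all matching experiments regardless of how the authors wrote the method.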


How to deal with Complexes

• Some experimental protocols generate complex data:

e.g. tandem affinity purification (TAP)

• One may want to convert these complexes into sets of binary interactions; two algorithms are available:
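
The transcript does not name the two algorithms. A minimal sketch, assuming they are the two commonly described expansion models (spoke, which the answers to Exercise 1 mention, and matrix), might look like this in Python. The function names and the example complex are hypothetical.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Pair the bait with each prey; preys are not paired with each other."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(members):
    """Pair every member of the complex with every other member."""
    return list(combinations(members, 2))


# Hypothetical TAP result: one tagged bait pulling down three preys.
bait, preys = "A", ["B", "C", "D"]
print(spoke_expansion(bait, preys))      # 3 bait-prey pairs
print(matrix_expansion([bait] + preys))  # 6 all-against-all pairs
```

Both models produce inferred pairs rather than directly observed ones, which is why it can be useful to filter expanded interactions out of a result set.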


Data distribution: PSICQUIC

• Proteomics Standards Initiative Common QUery InterfaCe.

• Community effort to standardise the way to access and retrieve data from Molecular Interaction databases.

• Widely implemented by independent interaction data resources.

• Based on the PSI standard formats (PSI-MI XML and MITAB)

• Not limited to protein-protein interactions, also e.g.

  • Drug-target interactions

  • Simplified pathway data

• A registry listing resources implementing PSICQUIC

• Documentation: http://psicquic.googlecode.com
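
As a rough illustration of what a common query interface means in practice, the Python sketch below builds a REST request URL for a PSICQUIC service. The base URL is an assumption about where the IntAct service lives and should be checked against the PSICQUIC registry; the helper name `build_query_url` is hypothetical. The sketch only builds the URL and does not contact any server.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the IntAct PSICQUIC REST service; check the PSICQUIC
# registry for the current address before relying on it.
INTACT_BASE = (
    "http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/"
    "webservices/current/search/"
)


def build_query_url(miql, base=INTACT_BASE, fmt="tab25", first=0, max_results=50):
    """Build a PSICQUIC REST URL for a MIQL query (no network access here)."""
    params = urlencode(
        {"format": fmt, "firstResult": first, "maxResults": max_results}
    )
    return f"{base}query/{quote(miql, safe='')}?{params}"


print(build_query_url("CDK8"))
```

Because every PSICQUIC service exposes the same interface, the same function can target another resource by changing only the `base` argument.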


PSICQUIC: distributing data over multiple sources


PSI-MI XML format

• Community standard for Molecular Interactions

• XML schema and detailed controlled vocabularies

• Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others

• Version 1.0 published in February 2004: "The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data." Henning Hermjakob et al., Nature Biotechnology 2004.

• Version 2.5 published in October 2007: "Broadening the horizon - Level 2.5 of the HUPO-PSI format for molecular interactions." Samuel Kerrien et al., BMC Biology 2007.
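
To give a feel for the XML structure, here is a minimal Python sketch that reads a simplified, hand-written fragment modelled on PSI-MI XML 2.5. The fragment omits the XML namespace and most elements that a real file carries, the protein labels are made up, and `list_interactions` is a hypothetical helper, so treat it as an illustration rather than a parser for real IntAct downloads.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment modelled on PSI-MI XML 2.5. A real file
# declares an XML namespace and carries many more elements (experiments,
# cross-references, controlled-vocabulary terms); the proteins are made up.
SAMPLE = """
<entrySet>
  <entry>
    <interactorList>
      <interactor id="1"><names><shortLabel>protA</shortLabel></names></interactor>
      <interactor id="2"><names><shortLabel>protB</shortLabel></names></interactor>
    </interactorList>
    <interactionList>
      <interaction id="10">
        <participantList>
          <participant id="100"><interactorRef>1</interactorRef></participant>
          <participant id="101"><interactorRef>2</interactorRef></participant>
        </participantList>
      </interaction>
    </interactionList>
  </entry>
</entrySet>
"""


def list_interactions(xml_text):
    """Return one list of interactor labels per <interaction> element."""
    root = ET.fromstring(xml_text)
    # Map each interactor id to its short label.
    labels = {
        node.get("id"): node.findtext("names/shortLabel")
        for node in root.iter("interactor")
    }
    # Resolve the interactorRef of every participant in every interaction.
    return [
        [labels[ref.text] for ref in interaction.iter("interactorRef")]
        for interaction in root.iter("interaction")
    ]


print(list_interactions(SAMPLE))
```

Interactors are declared once and referenced by id from each participant, which keeps large files compact when the same protein takes part in many interactions.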


MIMIx

• Experiments

  • Interaction detection method (e.g. yeast two hybrid)

  • Participant detection method (e.g. mass spectrometry)

  • Host organism

• Interactions

  • Interactors

  • Identifiers from public databases

  • Species of origin

  • Biological/experimental roles (e.g. enzyme, target / bait, prey)

  • Confidence
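
A checklist like this lends itself to a simple completeness check. The Python sketch below is only an illustration: the field names are hypothetical labels for the items listed on the slide, not an official schema, and the example record uses placeholder values.

```python
# Field names are hypothetical labels for the items listed on the slide;
# they are not taken from any official schema.
CHECKLIST = [
    "interaction_detection_method",
    "participant_detection_method",
    "host_organism",
    "interactor_identifiers",
    "species_of_origin",
    "roles",
    "confidence",
]


def missing_items(record):
    """Return the checklist items that are absent or empty in a record."""
    return [name for name in CHECKLIST if not record.get(name)]


# Placeholder record: identifiers and values are made up for illustration.
record = {
    "interaction_detection_method": "two hybrid",
    "participant_detection_method": "",
    "host_organism": "Saccharomyces cerevisiae",
    "interactor_identifiers": ["ID_A", "ID_B"],
    "roles": {"ID_A": "bait", "ID_B": "prey"},
}

print(missing_items(record))
```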


IMEx: The International Molecular Exchange Consortium

• Group of major public interaction data providers sharing curation effort: DIP, IntAct, MINT, MPact, MatrixDB, Molecular Connections, InnateDB, MPIDB and BioGRID

• Independent molecular interaction resources

• Common curation standards for detailed curation

• Common data formats (PSI-MI XML, PSICQUIC)

• Common accession number space

• Coordinated & non-redundant curation

• In production mode since February 2010

• Since 3/2009 supported by the European Commission under PSIMEx, contract number FP7-HEALTH-2007-223411, with additional partners Vital-IT, Nature, Wiley, BiaCore (GE), U. Maryland, CSIC, TU Munich, MIPS, SCBIT (Shanghai)

www.imexconsortium.org


Tutorial


IntAct: Home page

http://www.ebi.ac.uk/intact


IntAct: Search and results

[Screenshot callouts: UniProt, Taxonomy, PubMed, Method (PSI-MI CV), Interaction details, Complex?, Interactors, IMEx data, Other PSICQUIC services]


IntAct: Search and results

Export

Custom columns

Filters


Exercise 1

• In the search panel, type the query: CDK8. How many binary interactions are returned?

• Which species are present in the results? (hint: look at Browse by taxonomy)

• How would you filter these results so that only experimentally determined pairwise interactions are displayed?

• Type the query: “transcription factor”. What types of interactor does it find? (hint: click on the Lists tab)

• In the search panel, type the query: chlorophyll. Click on “Change Columns Displayed”, deselect the two Aliases columns, select the First Author column, then click the Update button. What changes occur in the interactions table? Who is the first author for the “Photosystem I subunit VII-ps1a1” interaction?


Interaction details


Exercise 2

• In the search panel, type: ERK AND species:3702. Click on the details symbol for interaction 1. What is the host organism for this experiment?

• Which journal was it published in, and in what year?

• How many interactions in total does IntAct have from this publication? (hint: look to the right of the Publications section)


The Browse tab


Exercise 3

• In the search panel, type: Phosphopentokinase, click on the Browse tab, then click the By UniProt taxonomy link. How many interactions are there involving only Arabidopsis proteins?

• Select the human interactions. Which interaction detection method is used for the manually curated entry?

• What is the title of the publication for this entry?

• Click on the Browse tab again, then click Back to Browse Options. Click By Gene Ontology. Where do these interactions occur? (i.e. in which compartment)


Advanced search: Fields

Filtering options

Add more filtering options
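
Field-based filtering can also be written directly as a query string, in the same style as the `species:3702` query used in Exercise 2. The Python sketch below composes such strings; the helper `miql` is hypothetical, and `detmethod` is assumed here to be the field name for the detection method.

```python
def miql(free_text="", **fields):
    """Combine free text and field:value filters into one query string."""
    terms = [free_text] if free_text else []
    for name, value in fields.items():
        value = str(value)
        if " " in value:
            value = f'"{value}"'  # quote multi-word values
        terms.append(f"{name}:{value}")
    return " AND ".join(terms)


# Free-text search narrowed by organism, then by detection method.
print(miql("starch", species=3702))
print(miql("starch", species=3702, detmethod="two hybrid"))
```

Each added field narrows the result set, which mirrors adding one more row in the advanced search form.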


Exercise 4

• In the search panel, type: starch. How many entries are returned?

• Click on the “Show Advanced Fields” button to the right of the Quick Search box. Select the field Organism from the pull-down menu, type in 3702 as your organism, then click Add and search. How many entries are returned?

• Further refine the search by adding Detection method as two hybrid. Does this make a difference in the number of interactions found?


The List tab - Proteins


List tab - Compounds


Exercise 5

• In the search panel, type: mitosis and click on the Lists tab.

• How many proteins are found? How many small compounds?

• Click on the DASTY links for various proteins. Notice how it shows features such as mutation sites, post-translational modifications and binding sites.

• Return to the Lists tab. Click on the Compounds sub-tab.

• Click on the ChEBI link for gdp. Is its mass below 500 daltons?


Viewing results in other resources


Exercise 6

• Search for: GPCR and click on the Lists tab, then click the Domains button. You get an error – why is this?

• Fix the cause of the error, and click the Domains button again. Which domain is prominent in GPCRs?

• Click on the Pathways button – which resource does this take you to? Which pathways are overrepresented?


Ontology search I


Ontology search II


Exercise 7

• Click on the Search tab and scroll down to the ontology section. Start to enter the word stamen slowly. What do you notice?

• How many different stamen processes does IntAct recognize?

• Which ontologies are supported by IntAct?

• Which of these ontologies know something about stamen processes?


Using PSICQUIC services

IMEx data

Other PSICQUIC services


Exercise 8

• In the search panel, type the query: arabidopsis. How many binary interactions are returned?

•What is the total number of interaction evidences from other databases?

•How many interaction evidences come from IMEx databases?

•Click on the link to the IMEx hits. Which other database(s) has/have hits for this query?

•Look at the interactions from the MINT database. What information is available that is not available in IntAct?


Graph tab I


Graph tab II


Exercise 9

• In the search panel, type: O81905, click on the Graph tab, then click the Cytoscape link. If Cytoscape does not start, ask your neighbour: not all computers have the permissions to do this.

• On the left-hand side of the Cytoscape window, select the VizMapper tab.

• Under the drop-down list ‘Current Visual Style’, choose ‘Sample 1’.

• Expand the edge color node and set detection method to discrete mapping.

• To color interactions by detection method, right-click and choose Generate discrete values → Rainbow 1.

• Now experiment with other features of this visualization tool!


Answers!

Exercise 1

• 55 interactions returned

• Look at browse by taxonomy, lists species in results

• Filter out spoke-expanded queries, leaves 12 results

• Finds proteins, chemical compounds and nucleic acids

• Naver et al (2001)


Exercise 2

• Host organism is yeast

• Proc Natl Acad Sci USA, 2008

• 8 interactions


Exercise 3

• 2 interactions from Arabidopsis

• Enzymatic study

• PubMed 15352244 New targets of Arabidopsis thioredoxins revealed by proteomic analysis

• The apoplast compartment


Exercise 4

• 326 entries

• When you add Arabidopsis, leaves 19 results

• Add two hybrid detection method, leaves 15 entries


Exercise 5

• 5547 proteins

• 23 small compounds

• Mass 443.2


Exercise 6

• Error – need to select some or all of the list before selecting button

• The 7TM domain is prominent in GPCRs

• Reactome

• GPCR signalling


Exercise 7

• The word stamen auto-completes

• 4 stamen processes are recognised

• Gene Ontology, PSI-MI, ChEBI, UniProt Taxonomy and InterPro

• GO


Exercise 8

• 225 interactions found

• 106,771 interaction evidences (PSICQUIC)

• 663 interaction evidences from 2 other IMEx databases

• DIP and MINT databases have hits for this query

• Confidence value in MINT
