EBI is an Outstation of the European Molecular Biology Laboratory.
Jyoti Khadake & Vicky Schneider
Joint Wellcome Trust-EBI Summer School
24th June 2011
Protein interactions and Pathways
This morning session outline
• Where do protein sequences come from?
• Introduction to protein databases
• Introduction to protein interactions
• Standardisation of the protein interaction data
• IntAct and demo
• PSICQUIC/Cytoscape & demo
• Data visualisation and network building
• Including protein information from other sources to enhance networks
Where do protein sequences come from?
Protein Sequences
Protein databases
• Based on nucleotide sequence similarity
• Based on peptide sequences
Organism database
• The source organism of a protein is as important as its sequence: taxonomy databases
Can you name THE database of protein sequences?
UniProtKB
factsheet
Let’s explore a protein: CDC42
• Cell division control protein 42 homolog, also known as CDC42, is a protein involved in regulation of the cell cycle.
• It is a small GTPase of the Rho subfamily, which regulates signaling pathways that control diverse cellular functions including cell morphology, migration, endocytosis and cell cycle progression.
What could go wrong if CDC42 is not doing its job?
UniProtKB (CDC42 protein)
• Search for gene - CDC42
• Check the different proteins retrieved
• Organisms
• Same organism Swiss-Prot/TrEMBL
• Different referenced databases - PRIDE
• Sequences and References
• Information about protein
Where is it present, how does it act, what are its properties…
• IntAct, Reactome, GOA, InterPro, PDB
How are TrEMBL entries generated?
UniProt Knowledge Base
• Swiss-Prot: manual annotation (~450,000 proteins)
• TrEMBL: automatic annotation (~3,300,000 proteins)
http://www.ebi.uniprot.org/
UniProt Knowledge Base
• Interactions in IntAct use splice variants
UniProt Knowledge Base
• Summary:
• Master Protein: P60953
• Splice variants / Isoforms: P60953-1, P60953-2
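The master/isoform naming in the summary above (P60953, P60953-1, P60953-2) can be handled programmatically. The following is a minimal sketch, assuming a simplified six-character accession pattern followed by an optional "-number" isoform suffix; the pattern and the helper names `parse_accession` and `group_by_master` are illustrative assumptions, not the official UniProt accession grammar.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional

# Simplified, assumed pattern: a 6-character accession such as P60953,
# optionally followed by "-<number>" identifying an isoform.
ACCESSION_RE = re.compile(r"^([A-Z][0-9][A-Z0-9]{3}[0-9])(?:-([0-9]+))?$")


class Accession(NamedTuple):
    master: str             # e.g. "P60953"
    isoform: Optional[int]  # e.g. 2 for "P60953-2", None for the master entry


def parse_accession(text: str) -> Accession:
    """Split an accession string into master accession and isoform number."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised accession: {text!r}")
    master, isoform = match.groups()
    return Accession(master, int(isoform) if isoform else None)


def group_by_master(accessions: Iterable[str]) -> Dict[str, List[Optional[int]]]:
    """Group isoform numbers under their master accession, keeping input order."""
    groups: Dict[str, List[Optional[int]]] = {}
    for text in accessions:
        acc = parse_accession(text)
        groups.setdefault(acc.master, []).append(acc.isoform)
    return groups
```

Grouping by master accession is one way to collapse isoform-level interaction records onto a single protein when the isoform distinction is not needed.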
UniProt Knowledge Base
Protein Families, domains and motifs
What is a protein family? A protein domain? And a protein motif?
Why bother creating a database that groups proteins that share the same domain?
InterPro
Protein Families, domains (and motifs)
factsheet
UniProt Knowledge Base
• Summary:
• Master Protein: P60953
• Interaction and pathway databases
UniProt Taxonomy
• Web interface to the NCBI taxonomy
Newt
PRIDE: where does the data come from?
PRIDE
factsheet
Protein interactions
Interactions
• Basis of protein action
• Types
• Self
• Binary: homomeric or heteromeric
• N-ary complexes
• Co-localisations
• Biological types of interactions
• Information in literature and websites
Types of Interaction data in IntAct
1. Direct interactions
2. Association
3. Functional Interaction
In pairs, start the next activity: match the types of experimental techniques (you can find information in the cards provided) with the types of interaction Jyoti just explained:
Direct Interactions
Association
Functional Interaction
Standardisation of the protein interaction data
Ontologies
factsheet
www.ebi.ac.uk/ols for controlled vocabularies
Format for storage and exchange: PSI-MI XML 2.5
Interaction Databases
Deep curation
• IntAct: active curation, broad species coverage, all molecule types
• MINT: active curation, broad species coverage, PPIs
• DIP: active curation, broad species coverage, PPIs
• MPACT: ? curation, limited species coverage, PPIs
• MatrixDB: active curation, extracellular matrix molecules only
• BIND: ceased curating 2006/7, broad species coverage, all molecule types; information becoming dated
Shallow curation
• BioGRID: active curation, limited number of model organisms
• HPRD: active curation, human-centric, modelled interactions
• MPIDB: active curation, microbial interactions
The IMEx consortium
IntAct
How to model an interaction
[Figure labels: Publication; Experiment1, Experiment2; Interaction1 to Interaction4; Participant1 to Participant3 (roles, features, preparations); Protein1, Protein2]
Main objects - Experiment
Controlled by Ontologies
Literature references
Confidence measures
Main objects - Participant
e.g. enzyme target
Interactor
e.g. bait, prey
Delivery method, expression level…
Interactor used experimentally
Building of Complex
IntAct
• Search MITAB
• From MITAB to detailed view
• Expanding network
• Network view - TBC
• Other data that can be visualised
IntAct – Home Page
http://www.ebi.ac.uk/intact
Software demonstration
• Many ways to search data!
• Simple, yet powerful search engine
• Advanced search – how to build complex queries
• Searching by ontology terms
• Searching by chemical substructure
Simple Search
First search from the home page…
[Screenshot callouts: UniProt, Taxonomy, PubMed, OLS, Details of interaction, Complex?]
Downloading & Customizing
First search from the home page…
Searching – more
How to build complex queries…
Searching – Fields
How to build complex queries…
• Unsure how to build your own complex query?
Searching – Fields
How to build complex queries…
• Some fields provide easy ways to select terms
Software demonstration
• Single interaction details
• Selecting an interaction
• Looking at the details
• Fetching all other interactions reported in the same paper
• Searching for similar interactions in the database
Interaction Details
Selecting an interaction…
Interaction Details
Looking at the details…
Interaction Details
Searching for similar interactions…
Network Visualisation
PSICQUIC, Cytoscape
Network visualisation
• In IntAct
• From IntAct: binary and expanded
• From IntAct: n-ary and expanded
Important: type of interaction and method used
• In PSICQUIC
• Data from other interaction databases
What is PSICQUIC?
• Proteomics Standards Initiative Common QUery InterfaCe.
• Community effort to standardise the way to access and retrieve data from Molecular Interaction databases.
• PSICQUIC is a specification of a web service.
• Resources already implementing PSICQUIC are listed in a registry.
• Based on the PSI standard formats (XML and MITAB)
• Documentation: http://psicquic.googlecode.com
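To make the idea of a common query interface over MITAB concrete, here is a minimal sketch that builds a REST query URL and parses one MITAB 2.5 line into named columns. The base URL, the `format`, `firstResult` and `maxResults` parameters, and the column labels are assumptions for illustration; the actual service addresses should be taken from the PSICQUIC registry and the parameters checked against the PSICQUIC documentation. No network call is made here.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of one PSICQUIC REST service; in practice, look the
# address up in the PSICQUIC registry rather than hard-coding it.
ASSUMED_BASE = (
    "http://www.ebi.ac.uk/Tools/webservices/psicquic/"
    "intact/webservices/current/search/"
)

# Illustrative names for the 15 tab-separated columns of MITAB 2.5.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication", "taxid_a", "taxid_b",
    "interaction_type", "source_db", "interaction_id", "confidence",
]


def build_query_url(query, base=ASSUMED_BASE, max_results=50):
    """Build a REST URL for a query (parameter names are assumed)."""
    params = urlencode(
        {"format": "tab25", "firstResult": 0, "maxResults": max_results}
    )
    return f"{base}query/{quote(query)}?{params}"


def parse_mitab_line(line):
    """Turn one MITAB 2.5 line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    return dict(zip(MITAB25_COLUMNS, fields))
```

Because every provider returns the same tab-separated layout, the same parser can be reused for results from any service listed in the registry.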
PSICQUIC implementation
[Figure labels: Sample; Observation error; Publications; Annotation error; Interaction databases; PSICQUIC sources; PSICQUIC Registry; PSICQUIC client; User]
http://www.ebi.ac.uk/Tools/webservices/psicquic/view/
PSICQUIC View
http://bit.ly/psicquic-view
• Enables clustering of queries across providers
• Visualization of graphical network
• Linking back to the original source for more details
• …
PSICQUIC Services Tagging
• Content: protein-protein, small molecule-protein, nucleic acid-protein
• Interaction representation: evidence, clustered
• Curation standards: MIMIx curation, IMEx curation, rapid curation
• Source: internally curated, text mining, predicted, imported
• Complex expansion: spoke, matrix, bipartite
PSICQUIC View
How to deal with Complexes
• Some experimental protocols generate complex data, e.g. tandem affinity purification (TAP)
• One may want to convert these complexes into sets of binary interactions; two algorithms are available: spoke and matrix
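The two expansion algorithms can be sketched in a few lines. This is a minimal illustration, assuming the usual definitions: spoke expansion pairs the bait with each prey, and matrix expansion pairs every participant with every other participant. The function names and the example protein labels are hypothetical.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Spoke model: one binary interaction between the bait and each prey."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants):
    """Matrix model: one binary interaction for every pair of participants."""
    return list(combinations(participants, 2))


# Hypothetical TAP result: bait "A" pulled down preys "B", "C" and "D".
spoke_pairs = spoke_expansion("A", ["B", "C", "D"])
matrix_pairs = matrix_expansion(["A", "B", "C", "D"])
```

For a complex of n participants, spoke gives n-1 pairs and matrix gives n(n-1)/2 pairs, so matrix expansion is more likely to include pairs that never touch directly.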
In pairs start the next activity:
Binary or N-nary? Spoke or Matrix?
Please identify the type of interaction for the interaction method cards given
Also choose the expansion (spoke or matrix) you think is best for each method
Software demonstration
• Visualising a network in Cytoscape
• Selecting a network
• Import into Cytoscape
• Change layout
• Add attributes and change view based on these
• Change and add properties to nodes and edges
Cytoscape network visualisation
Network
Visualization
Highlighting network layout…
Visualization
Highlighting network properties: edges
Visualization
Highlighting network properties: nodes
Attributes and analysis using Cytoscape
What else?
How to look deeper into a dataset…
GOA
How to look deeper into a dataset…
• Click on the interaction count to restrict your dataset
• This operation can be done several times to add multiple filters
Improving and increasing protein annotations
IntAct team
Rolf Apweiler • Henning Hermjakob • Sandra Orchard • Margaret Duesbury • Samuel Kerrien • Bruno Aranda • Marine Dumousseau
IntAct is funded by the European Commission under FELICS, contract number 021902 (RII3)
PSI, IMEx, Enfin Proteomics community
PANDA
Proteomics
Acknowledgements
What data are we dealing with?
Systems Biology?
[Figure labels: Genomics, Transcriptomics, Proteomics, Metabolomics; Functional Genomics/Proteomics; DNA, RNA, Protein, Small Molecules; Databases]
Pathways:
This afternoon session outline
• Reactome Overview
• What type of data it contains
• Where the data comes from
• What you can access through Reactome, and how
• Have a go: tutorial
A Database of human biological pathways
Steve Jupe
Rationale – Journal information
Nature 407(6805):770-6. The Biochemistry of Apoptosis.
“Caspase-8 is the key initiator caspase in the death-receptor pathway. Upon ligand binding, death receptors such as CD95 (Apo-1/Fas) aggregate and form membrane-bound signalling complexes (Box 3). These complexes then recruit, through adapter proteins, several molecules of procaspase-8, resulting in a high local concentration of zymogen. The induced proximity model posits that under these crowded conditions, the low intrinsic protease activity of procaspase-8 (ref. 20) is sufficient to allow the various proenzyme molecules to mutually cleave and activate each other (Box 2). A similar mechanism of action has been proposed to mediate the activation of several other caspases, including caspase-2 and the nematode caspase CED-3 (ref. 21).”
How can I access the pathway described here and reuse it?
Nature. 2000 Oct 12;407(6805):770-6. The biochemistry of apoptosis.
Rationale - Figures
A picture paints a thousand words… but…
• Just pixels
• Omits key details
• Assumes
• Fact or Hypothesis?
Reactome is…
Free, online, open-source curated database of pathways and reactions in human biology
Authored by expert biologists, maintained by Reactome editorial staff (curators)
Mapped to cellular compartment
Extensively cross-referenced
Tools for data analysis – Pathway Analysis, Expression Overlay, Species Comparison, Biomart…
Used to infer orthologous events in 20 non-human species
Using model organism data to build pathways – Inferred pathway events
[Figure labels: human, mouse, cow; Direct evidence, Direct evidence, Indirect evidence; PMID:5555, PMID:4444, PMID:8976, PMID:1234]
Theory - Reactions
Pathway steps = the “units” of Reactome = events in biology
• TRANSPORT
• CLASSIC BIOCHEMICAL
• BINDING
• DISSOCIATION
• DEGRADATION
• PHOSPHORYLATION
• DEPHOSPHORYLATION
Reaction Example 1: Enzymatic
Reaction Example 2: Transport
REACT_945.4
Transport of Ca++ from platelet dense tubular system to cytoplasm
Other Reaction Types
Binding
Dimerization
Phosphorylation
Reactions Connect into Pathways
[Figure: three reactions, each with an INPUT, an OUTPUT and a CATALYST, chained so that the output of one reaction is the input of the next]
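The idea that reactions connect into pathways through shared molecules can be sketched as a small data structure. This is an illustrative model only, assuming that two reactions are linked whenever an output of one is an input of the other; the `Reaction` class, the `connect` function and the molecule names are hypothetical and do not reflect Reactome's actual schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Reaction:
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    catalyst: Optional[str] = None


def connect(reactions: List[Reaction]) -> List[Tuple[str, str]]:
    """Return (upstream, downstream) pairs where an output feeds an input."""
    links = []
    for upstream in reactions:
        for downstream in reactions:
            if upstream is downstream:
                continue
            # A shared molecule between outputs and inputs links the two steps.
            if upstream.outputs & downstream.inputs:
                links.append((upstream.name, downstream.name))
    return links


# Hypothetical three-step chain: A -> B -> C -> D.
steps = [
    Reaction("r1", frozenset({"A"}), frozenset({"B"}), catalyst="E1"),
    Reaction("r2", frozenset({"B"}), frozenset({"C"}), catalyst="E2"),
    Reaction("r3", frozenset({"C"}), frozenset({"D"}), catalyst="E3"),
]
pathway_links = connect(steps)
```

The resulting list of links is effectively the edge list of a directed graph, which is the form most network tools expect.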
Data Expansion - Link-outs From Reactome
• GO: Molecular Function, Compartment, Biological process
• KEGG, ChEBI – small molecules
• UniProt – proteins
• Sequence dbs – Ensembl, OMIM, Entrez Gene, RefSeq, HapMap, UCSC, KEGG Gene
• PubMed references – literature evidence for events
Species Selection
Data Expansion – Projecting to Other Species
[Figure: a human reaction (A + ATP, with B, giving phosphorylated A + ADP) is projected to mouse and to Drosophila. Labels: No orthologue - Protein not inferred; Reaction not inferred]
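The projection logic in the figure can be illustrated with a deliberately simplified rule: a reaction is inferred in another species only if every protein taking part has an orthologue there. The all-or-nothing rule, the `project_reaction` function and the orthologue names are assumptions for illustration; the slides do not state the exact criteria Reactome applies.

```python
def project_reaction(human_proteins, orthologues):
    """Map each human protein to its orthologue in the target species.

    Returns the list of orthologues, or None when any protein has no
    orthologue (assumed rule: the reaction is then not inferred).
    """
    projected = []
    for protein in human_proteins:
        if protein not in orthologues:
            return None  # no orthologue: protein not inferred, so no reaction
        projected.append(orthologues[protein])
    return projected


# Hypothetical orthologue tables for two target species.
mouse = {"A": "mouse_A", "B": "mouse_B"}
drosophila = {"B": "fly_B"}

mouse_reaction = project_reaction(["A", "B"], mouse)
fly_reaction = project_reaction(["A", "B"], drosophila)
```

Small molecules such as ATP are left out of the check here because they are not species-specific.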
Exportable Protein-Protein Interactions
Inferred from complexes and reactions
Interactions between proteins in the same complex, reaction, or adjoining reaction
Lists available from Downloads
See Readme document for more details
Coverage – Content, TOC
And many more...
Planned Coverage – Editorial Calendar
Reactome Tools
• Interactive Pathway Browser
• Pathway Mapping and Over-representation
• Expression overlay onto pathways
• Molecular Interaction overlay
• Biomart
Summary
• Pathway databases are an integral part of the scientific enterprise.
• Reactome has deployed a user-friendly web site that promotes integrated research on pathways and networks.
• Data visualization
• Data analysis
• Data expansion
• Data integration
• Data standards/exports
• Develop and distribute open software and standard operating procedures for the management of pathway information.
Credits
OICR/CSHL NYU EBI
Lincoln Stein Peter D'Eustachio Ewan Birney
Michael Caudy Shahana Mahajan Henning Hermjakob
Marc Gillespie Lisa Matthews David Croft
Robin Haw Veronica Shamovsky Phani Garapati
Irina Kalatskaya Bijay Jassal
Bruce May Steven Jupe
Leontius Pradhana
Nelson Ndegwa
Guanming Wu Gavin O’Kelly
Christina Yung Esther Schmidt
Supported by grants from the US National Institutes of Health (P41 HG003751) and EU grant LSHG-CT-2005-518254 "ENFIN"
In pairs start the Reactome Tutorial