Post on 19-Dec-2015
Protein-protein interactions
Protein Analysis Workshop 2010
Bioinformatics group, Institute of Biotechnology, University of Helsinki
Hung Ta
xuanhung.ta@helsinki.fi
Outline
Why are protein-protein interactions (PPIs) so important?
Experimental methods (high throughput) for discovering PPIs:
• Yeast-two-hybrid.
• AP-MS.
PPI databases: DIP, BioGRID, IntAct, HPRD…
Computational prediction of PPIs
• Genomics methods
• Biological context methods
• Integrative methods
• STRING (EMBL)
Why are PPIs so important?
Genes are the basic units of heredity. Complete genomes are available.
genome → proteome → interactome
Proteins, the working molecules of a cell, carry out many biological activities.
Proteins function by interacting with other proteins, DNA, RNA, small molecules.
Search for drug molecules:
[Figure: the body's proteins P1, P2, …, PN; a pathogen protein X; a drug molecule Y]
The body produces a set of proteins: P1, P2, P3, …, PN. A pathogen (virus or bacterium) enters the body and produces its own protein, say X.
X interacts with one of these proteins, say P1, preventing it from carrying out its routine activities. Disease emerges.
Introduce into the body a new molecule Y such that X is more attracted to Y than to P1, freeing P1 to return to its routine work.
Search for drug molecules:
Bringing an effective drug to market can:
• take 10-15 years
• cost up to US$800 million
• require testing up to 30,000 candidate molecules
Databases of molecular interactions or linkages could help to
narrow the search for drug molecules.
The types of PPIs
Binary (physical) interactions: refer to the binding
between two proteins whose residues are in contact at
some point in time.
Functional linkages: pairwise relationships
between proteins that work together (participate in a
common structural complex or pathway) to carry out
biological tasks.
Protein physical interactomes and functional linkage maps are available for:
S. cerevisiae (Uetz et al. 2000; Ito et al. 2001; Ho et al. 2002; Gavin et al. 2002, 2006; Krogan et al. 2006; Tarassov et al. 2008; Yu et al. 2008)
E. coli (Butland et al. 2005; Arifuzzaman et al. 2006)
C. elegans (Li et al. 2004)
D. melanogaster (Giot et al. 2003)
Humans (Rual et al. 2005; Stelzl et al. 2005; Ewing et al. 2007)
…
High-throughput experimental methods for discovering PPIs
Yeast-two-hybrid (Y2H)
Ito T. et al., 2001; Uetz P. et al., 2000; Yu H. et al., 2008
Rual et al. 2005; Stelzl et al. 2005
Affinity purification followed by mass spectrometry (AP-MS).
Gavin AC et al., 2002, 2006
Ho Y. et al., 2002
Krogan NJ et al., 2006
Y2H experiments
Idea: use a protein of interest as bait to
discover the proteins that physically interact with
it; these are called prey.
A single transcription factor is split into two
pieces, the Binding Domain (BD) and the
Activation Domain (AD). The bait protein is
fused to the BD and the prey protein to the AD.
If the bait and prey proteins interact, BD and AD are
brought back together and transcription of the reporter gene is initiated.
Interactions between the bait and a prey library
are screened in high throughput.
AP-MS experiments
A TAP tag, consisting of protein A and a
calmodulin-binding peptide (CBP) separated by a TEV
protease cleavage site, is fused to the target protein.
In the first AP step, protein A binds an IgG matrix;
many contaminants are washed away, and the bound
material is released by TEV protease cleavage.
In the second AP step, CBP binds tightly to
calmodulin-coated beads. After washing, which
removes the remaining contaminants and the TEV
protease, the bound material is released under
mild conditions with EGTA.
Proteins are identified by mass spectrometry.
The MS output is a set of lists, each containing a bait
protein and its co-purified partners
(preys), each accompanied by a
reliability score.
A scoring system combining the spoke and matrix
models is used to generate a network of binary PPIs,
in which each interaction has a confidence score.
Low-scoring links are eliminated to obtain a
high-confidence network.
The network is partitioned into densely
connected regions, which are identified as
complexes.
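The spoke and matrix models mentioned above differ only in which binary pairs they read out of one purification. A minimal sketch, assuming a purification is given as a bait name plus a list of prey names; the protein names, scores and threshold below are hypothetical, and the actual combined scoring systems used in the large-scale studies are not reproduced here.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: the bait is assumed to interact with each co-purified prey."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_pairs(bait, preys):
    """Matrix model: every pair of proteins in the purification is assumed to interact."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}


def filter_network(scores, threshold):
    """Keep only links whose confidence score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


# Hypothetical purification: bait "B" co-purifies with three preys.
preys = ["P1", "P2", "P3"]
print(len(spoke_pairs("B", preys)))   # 3 bait-prey pairs
print(len(matrix_pairs("B", preys)))  # 6 pairs among 4 proteins
```

The spoke model yields n pairs for n preys, while the matrix model yields all pairs among the n + 1 members, which is why matrix-derived networks are larger and need stricter score filtering.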
Computational methods of prediction
Comparative genomic methods
• Gene neighbourhood
• Gene fusion
• Domain-based methods
• Phylogenetic profiles
Biological context methods
• Co-expression
• GO
• Text mining
Integrative methods
Gene neighbourhood based method
Proteins a and b are predicted to interact if their genes are located close
to each other in several different genomes.
Dandekar, T. et al. (1998). Conservation of gene order: A fingerprint of proteins that physically interact. Trends in Biochemical Sciences, 23(9), 324–328
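The gene neighbourhood rule can be sketched as a counting problem. This assumes each genome has already been reduced to an ordered list of ortholog family names; the toy genomes and the `window` and `min_genomes` parameters are hypothetical simplifications, not part of the cited method.

```python
def neighbour_pairs(genome, window=1):
    """Unordered pairs of genes lying within `window` positions of each other."""
    pairs = set()
    for i, gene in enumerate(genome):
        for j in range(i + 1, min(i + 1 + window, len(genome))):
            pairs.add(frozenset((gene, genome[j])))
    return pairs


def conserved_neighbours(genomes, window=1, min_genomes=2):
    """Predict an interaction for gene pairs that are neighbours in >= min_genomes genomes."""
    counts = {}
    for genome in genomes:
        for pair in neighbour_pairs(genome, window):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair, n in counts.items() if n >= min_genomes}


# Hypothetical toy genomes; letters stand for ortholog families.
genomes = [
    ["a", "b", "x", "y"],
    ["z", "a", "b", "w"],
    ["b", "q", "a"],
]
predicted = conserved_neighbours(genomes)
```

With these toy genomes only the pair {a, b} is predicted: it is adjacent in the first two genomes, while in the third genome q separates a and b.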
Gene fusion (Rosetta stone)
Proteins a and b are predicted to interact if their counterparts are fused into a single
protein in another organism.
Enright, A. J. et al. (1999). Protein interaction maps for complete genomes based on gene fusion events. Nature, 402(6757), 86–90.
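A minimal sketch of the Rosetta stone rule, assuming sequence alignments have already been computed elsewhere and are given as (query protein, start, end) regions on each protein of another organism. The protein names and coordinates are hypothetical, and the extra filters used in practice are omitted.

```python
from itertools import combinations


def fusion_links(hits):
    """hits maps each protein of another organism to the (query_protein, start, end)
    regions where query proteins align to it. Two different query proteins that
    align to non-overlapping regions of the same protein are predicted to interact."""
    links = set()
    for regions in hits.values():
        for (a, s1, e1), (b, s2, e2) in combinations(regions, 2):
            if a != b and (e1 < s2 or e2 < s1):  # regions do not overlap
                links.add(frozenset((a, b)))
    return links


# Hypothetical alignment regions on two proteins of another organism.
hits = {
    "fusedAB": [("a", 1, 120), ("b", 130, 260)],  # a and b cover separate halves
    "other": [("c", 1, 100), ("d", 50, 150)],     # c and d overlap: no fusion call
}
links = fusion_links(hits)
```

Here only a and b are linked, because they match disjoint parts of the same composite protein; c and d align to overlapping regions, which suggests shared homology rather than a fusion.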
Domain based methods
Known experimental PPI data → inferred domain-domain interactions (DDIs), using AS, MLE or PE → prediction of whether protein A and protein B interact or not
AS: association; MLE: maximum likelihood estimation; PE: parsimony explanation
Validation of inferred DDIs remains difficult due to the lack of sufficient
and unbiased benchmark datasets.
The methods show limited performance at predicting PPIs (H.X. Ta, L. Holm, Biochem. Biophys. Res. Commun. (2009)).
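Of the three inference methods, the association (AS) score is the simplest to illustrate. A sketch, assuming the common formulation in which a domain pair's score is the fraction of protein pairs containing it that are known to interact; the protein and domain names are hypothetical toy data, and MLE and PE are not shown.

```python
from itertools import combinations, product


def association_scores(domains, interactions):
    """domains: protein -> set of domain names; interactions: set of frozenset pairs.
    Score(d1, d2) = (#interacting protein pairs containing the domain pair)
                  / (#protein pairs containing the domain pair)."""
    hits, totals = {}, {}
    for p, q in combinations(sorted(domains), 2):
        interacting = frozenset((p, q)) in interactions
        # Count each distinct domain pair at most once per protein pair.
        for key in {frozenset(dp) for dp in product(domains[p], domains[q])}:
            totals[key] = totals.get(key, 0) + 1
            if interacting:
                hits[key] = hits.get(key, 0) + 1
    return {key: hits.get(key, 0) / n for key, n in totals.items()}


def predict(domains_a, domains_b, scores):
    """Score a protein pair by its best-scoring domain pair (0.0 if none was seen)."""
    return max((scores.get(frozenset(dp), 0.0)
                for dp in product(domains_a, domains_b)), default=0.0)


# Hypothetical training data.
domains = {"P1": {"SH3"}, "P2": {"PRM"}, "P3": {"SH3"}, "P4": {"PRM", "Kinase"}}
interactions = {frozenset(("P1", "P2")), frozenset(("P3", "P4"))}
scores = association_scores(domains, interactions)
```

In this toy set the SH3-PRM pair occurs in four protein pairs, two of which interact, so its score is 0.5; a new protein pair carrying SH3 and PRM would inherit that score.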
Phylogeny based methods
Proteins a and c are predicted to interact if they have similar phylogenetic
profiles.
Pellegrini, M. et al. (1999). Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. PNAS, 96(8), 4285–4288
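A phylogenetic profile can be sketched as a presence/absence vector over a fixed list of organisms, compared here with the Hamming distance. The organism names, presence sets and the `max_distance` cut-off are hypothetical; the toy data is chosen so that, as on the slide, a and c end up with identical profiles.

```python
def profile(present_in, organisms):
    """Presence (1) / absence (0) of a protein's homologs in each organism."""
    return tuple(1 if org in present_in else 0 for org in organisms)


def hamming(u, v):
    """Number of organisms in which two profiles differ."""
    return sum(x != y for x, y in zip(u, v))


def similar_profiles(profiles, max_distance=0):
    """Pairs of proteins whose profiles differ in at most max_distance organisms."""
    names = sorted(profiles)
    return {frozenset((a, b))
            for i, a in enumerate(names) for b in names[i + 1:]
            if hamming(profiles[a], profiles[b]) <= max_distance}


organisms = ["org1", "org2", "org3", "org4", "org5"]  # hypothetical
presence = {
    "a": {"org1", "org3", "org4"},
    "b": {"org1", "org2", "org5"},
    "c": {"org1", "org3", "org4"},
    "d": {"org2", "org3", "org5"},
}
profiles = {name: profile(orgs, organisms) for name, orgs in presence.items()}
predicted = similar_profiles(profiles)
```

Exact matches (`max_distance=0`) are the strictest rule; allowing a small distance tolerates missed or spurious homolog calls at the cost of more false positives.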
Biological context methods
Gene expression: two proteins whose genes exhibit very similar
expression patterns across multiple conditions or experiments
may be considered candidates for functional association
and possibly direct physical interaction.
GO (Gene Ontology) annotations: two interacting proteins are likely
to share GO term annotations.
Text mining: extract information about interacting proteins from the
literature (e.g. PubMed): "Is protein K mentioned together with protein I in
publications?"
These techniques are used to validate PPIs discovered by other
approaches, or are combined with other methods in integrative approaches.
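Two of these context signals reduce to simple similarity measures. A sketch, assuming co-expression is measured with the Pearson correlation of two expression vectors and GO similarity with the Jaccard overlap of annotation sets; both choices and the toy values are assumptions, since many other measures are used in practice.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def jaccard(s, t):
    """Fraction of GO terms shared by two annotation sets."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


# Hypothetical expression levels across four experiments.
expr_k = [1.0, 2.0, 3.0, 4.0]
expr_i = [2.0, 4.0, 6.0, 8.0]
coexpression = pearson(expr_k, expr_i)  # perfectly correlated toy profiles

# Hypothetical GO annotation sets (term names only).
go_overlap = jaccard({"translation", "ribosome"}, {"ribosome", "DNA repair"})
```

A high correlation or a large GO overlap is only supporting evidence, which is why such scores are typically used for validation or fed into integrative classifiers rather than used alone.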
Integrative methods
Naive Bayes
Random Forest
Decision Tree
Kernels
Logistic Regression
Support Vector Machines
Jansen R. et al., Science 2003
Bader J.S. et al., Nat Biotech 2004
Lin N. et al., BMC Bioinformatics 2004
Zhang L. et al., BMC Bioinformatics 2004
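As an example of the first classifier in the list, naive Bayes integration can be sketched with likelihood ratios: assuming the evidence types are conditionally independent, their ratios multiply, and the product converts prior odds into posterior odds. The feature names, probability tables and prior below are hypothetical; in a real study they would be estimated from gold-standard positive and negative sets.

```python
def combined_likelihood_ratio(evidence, tables):
    """evidence: feature -> observed bin.
    tables: feature -> bin -> (P(bin | interacting), P(bin | not interacting)).
    Under conditional independence the per-feature ratios multiply."""
    ratio = 1.0
    for feature, observed in evidence.items():
        p_pos, p_neg = tables[feature][observed]
        ratio *= p_pos / p_neg
    return ratio


def posterior_probability(ratio, prior):
    """Posterior odds = prior odds * likelihood ratio; converted back to a probability."""
    odds = prior / (1 - prior) * ratio
    return odds / (1 + odds)


# Hypothetical probability tables for two evidence types.
tables = {
    "coexpression": {"high": (0.6, 0.2), "low": (0.4, 0.8)},
    "shared_GO": {"yes": (0.5, 0.1), "no": (0.5, 0.9)},
}
ratio = combined_likelihood_ratio({"coexpression": "high", "shared_GO": "yes"}, tables)
post = posterior_probability(ratio, prior=0.01)
```

Here the ratios 3 and 5 combine to 15, yet with a 1% prior the posterior is only about 0.13; this illustrates why weak, individually noisy signals need to be combined.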
Databases of PPIs
DIP (http://dip.doe-mbi.ucla.edu): 71,275 interactions, 23,200 proteins, 372 organisms
BioGRID (http://www.thebiogrid.org): 247,366 non-redundant interactions, 31,254 unique proteins, 17 organisms
IntAct (http://www.ebi.ac.uk/intact): 232,793 interactions, 69,335 proteins
MINT (http://mint.bio.uniroma2.it): 89,956 interactions, 31,631 proteins
SGD (http://www.yeastgenome.org): Saccharomyces Genome Database
HPRD (http://www.hprd.org/): 39,194 interactions, 30,047 proteins
MIPS: interactions, complexes
STRING: known and predicted protein-protein interactions
DIP
• Protein function
• Protein-protein relationships
• Evolution of protein-protein interactions
• The network of interacting proteins
• Unknown protein-protein interactions
• The best interaction conditions
DIP: searching information
Find information about your protein
DIP Node (DIP:1143N)
Graph of PPIs around DIP:1143N
Nodes are proteins; edges are PPIs.
The center node is DIP:1143N.
Edge width encodes the number
of independent experiments
identifying the interaction.
Green (red) is used to draw core
(unverified) interactions.
Click on a node (edge) to
learn more about the protein
(interaction).
List of interacting partners of DIP:1143N
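The graph described above maps naturally onto an adjacency structure whose edges carry the two attributes the viewer draws: the number of independent experiments (edge width) and whether the interaction is core (green) or unverified (red). A minimal sketch with hypothetical protein names and edge data; this is not DIP's own data format or API.

```python
from collections import defaultdict


def build_graph(edges):
    """edges: (protein_a, protein_b, n_experiments, is_core) tuples.
    Returns an undirected adjacency map: protein -> partner -> edge attributes."""
    graph = defaultdict(dict)
    for a, b, n_experiments, is_core in edges:
        attrs = {"experiments": n_experiments, "core": is_core}
        graph[a][b] = attrs
        graph[b][a] = attrs
    return graph


def partners(graph, node, core_only=False):
    """Interacting partners of `node`, most frequently observed first."""
    items = [(partner, attrs) for partner, attrs in graph[node].items()
             if attrs["core"] or not core_only]
    items.sort(key=lambda item: (-item[1]["experiments"], item[0]))
    return [partner for partner, _ in items]


# Hypothetical edges around a center node "X".
edges = [("X", "A", 3, True), ("X", "B", 1, False), ("C", "X", 2, True), ("A", "B", 1, True)]
graph = build_graph(edges)
all_partners = partners(graph, "X")
core_partners = partners(graph, "X", core_only=True)
```

Sorting by the experiment count mirrors reading the widest edges first, and the `core_only` flag corresponds to keeping only the green edges.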
STRING: Search Tool for the Retrieval of Interacting Genes/Proteins
• A database of known and predicted protein interactions
• Direct (physical) and indirect (functional) associations
• The database currently covers 2,590,259 proteins from 630 organisms
• Derived from these sources: [Figure: data source logos]
Searching information
Query information by protein name or protein sequence.
Graph of PPIs
Nodes are proteins.
Each colored line represents evidence of
an interaction between two proteins.
The color encodes the method
used to detect the interaction.
Click on a node to get
information about the corresponding
protein.
Click on an edge to get
information about the interaction
between the two proteins.
List of predicted partners
Partners with description and confidence score. Choose different types of views to see more details.
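One way per-evidence scores can be merged into a single confidence score is to assume that the evidence channels are independent. Under that assumption, the probability that all channels are wrong is the product of (1 - score). The sketch below is hedged: STRING describes a combination of this kind together with a correction for a prior probability, but the exact procedure and prior value should be checked in STRING's own documentation. Here the prior is left as a parameter and defaults to 0.

```python
def combine_scores(channel_scores, prior=0.0):
    """Combine per-channel confidence scores (each in [0, 1]) assuming the
    channels are independent. `prior` is an assumed background probability that
    is removed from each score before combining and added back afterwards."""
    remaining = 1.0  # probability that every channel is wrong
    for score in channel_scores:
        without_prior = max(0.0, (score - prior) / (1 - prior))
        remaining *= 1 - without_prior
    combined = 1 - remaining
    return combined + prior * (1 - combined)


# Two hypothetical channels (e.g. co-expression and text mining), each 0.5.
score = combine_scores([0.5, 0.5])
```

With no prior, two independent 0.5 scores combine to 0.75. Adding a third channel can only raise the combined score, which matches the intuition that agreeing evidence types reinforce each other.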
Neighborhood View
The red block is the queried protein; the other blocks are its genomic neighbours in each organism. Click on a block to obtain information about the corresponding protein.
Closely related organisms show similar neighbourhood patterns, which helps to identify genes/proteins that lie close together in a genomic region.
Occurrence View
Represents the phylogenetic profiles of proteins. The color of each box indicates the sequence similarity between the protein and
its homologous protein in each organism. The size of a box shows how many members of the family have the
reported sequence similarity. Click on a box to see the sequence alignment.
Gene Fusion View
This view shows the individual gene fusion events per species. Two differently colored boxes next to each other indicate a fusion
event. Hovering over a region in a gene gives the gene name; clicking on
a gene gives more detailed information.
Why protein-protein interactions (PPIs)?
PPIs are involved in many biological processes:
• Signal transduction
• Protein complexes or molecular machinery
• Protein carriers
• Protein modifications (phosphorylation)
• …
PPIs help to decipher the molecular mechanisms
underlying biological functions and to improve
approaches to drug discovery.
Assessment of large-scale datasets of PPIs
Yu H, et al. (2008). Science 322: 104-110
Benchmarking high-throughput interaction datasets:
• Y2H: Uetz et al. 2000; Ito et al. 2001
• AP-MS: Gavin et al. 2006; Krogan et al. 2006
Binary gold standard (GS): a positive reference set (PRS)
and a random reference set (RRS).
MIPS co-complex gold standard.
The large-scale datasets are measured against the binary GS and
the MIPS GS.
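Measuring a dataset against a reference set can be sketched as a detection rate: the fraction of reference pairs that the dataset recovers. A good dataset should recover many PRS pairs and almost no RRS pairs. The pairs below are hypothetical, and the sketch omits details of the published benchmark, such as restricting to proteins actually tested.

```python
def pairs(*items):
    """Build a set of unordered protein pairs."""
    return {frozenset(item) for item in items}


def detection_rate(dataset, reference):
    """Fraction of reference pairs found in the dataset."""
    found = sum(1 for pair in reference if pair in dataset)
    return found / len(reference)


# Hypothetical dataset and reference sets.
dataset = pairs(("A", "B"), ("C", "D"), ("E", "F"))
prs = pairs(("A", "B"), ("C", "D"), ("G", "H"), ("I", "J"))
rrs = pairs(("A", "K"), ("L", "M"), ("N", "O"), ("P", "Q"))
prs_rate = detection_rate(dataset, prs)
rrs_rate = detection_rate(dataset, rrs)
```

Here the toy dataset recovers half of the PRS and none of the RRS. The same function applied to co-complex pairs from a MIPS-style reference would measure co-complex recall instead.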
Assessment of large-scale datasets of PPIs
Yu H, et al. (2008). Science 322: 104-110
AP-MS performs well at detecting co-complex associations, as measured against the
MIPS co-complex GS.
Y2H performs well at detecting binary interactions, as measured against the binary GS.
[Figure: benchmark results for the Y2H and AP-MS datasets]