Post on 19-Dec-2015
Page 1: Protein-protein interactions Protein Analysis Workshop 2010 Bioinformatics group Institute of Biotechnology University of helsinki Hung Ta xuanhung.ta@helsinki.fi

Protein-protein interactions

Protein Analysis Workshop 2010

Bioinformatics group, Institute of Biotechnology, University of Helsinki

Hung Ta

xuanhung.ta@helsinki.fi

Page 2:

Outline

Why are protein-protein interactions (PPIs) so important?

Experimental methods (high throughput) for discovering PPIs:

• Yeast-two-hybrid.

• AP-MS.

PPI databases: DIP, BioGRID, IntAct, HPRD…

Computational prediction of PPIs

• Genomics methods

• Biological context methods

• Integrative methods

• STRING (EMBL)

Page 3:

Why are PPIs so important?

The gene is the basic unit of heredity. Genomes are available.

genome → proteome → interactome

Proteins, the working molecules of a cell, carry out many biological activities.

Proteins function by interacting with other proteins, DNA, RNA, small molecules.

Page 4:

Search for drug molecules:

[Figure: body proteins P1, P2, P3, P4, P5, …, PN; pathogen protein X; drug molecule Y]

The body produces a set of proteins: P1, P2, P3, …, PN. A pathogen (a virus or a bacterium) enters the body and produces its own protein, say X.

X interacts with one of the body's proteins, say P1, and prevents it from carrying out its routine activities. Disease emerges.

Introduce into the body a new molecule, Y, such that X is more attracted to Y than to P1, freeing P1 to return to its routine work.

Page 5:

Search for drug molecules:

Bringing an effective drug to market could:

• Take 10-15 years

• Cost up to US$800 million

• Require testing up to 30,000 candidate molecules

Databases of molecular interactions or linkages could help to cut down the search for drug molecules.

Page 6:

The types of PPIs

Binary (physical) interactions: the binding between two proteins whose residues are in contact at some point in time.

Functional linkages: pairwise relationships between proteins that work together (participate in a common structural complex or pathway) to carry out biological tasks.

Page 7:

Protein physical interactomes and functional linkage maps are available for:

• S. cerevisiae (Uetz et al. 2000; Ito et al. 2001; Ho et al. 2002; Gavin et al. 2002, 2006; Krogan et al. 2006; Tarassov et al. 2008; Yu et al. 2008)

• E. coli (Butland et al. 2005; Arifuzzaman et al. 2006)

• C. elegans (Li et al. 2004)

• D. melanogaster (Giot et al. 2003)

• Humans (Rual et al. 2005; Stelzl et al. 2005; Ewing et al. 2007)

Page 8:

High throughput experimental methods for discovering PPIs

Yeast-two-hybrid (Y2H)

Ito T. et al., 2001; Uetz P. et al., 2000; Yu H. et al., 2008

Rual et al. 2005; Stelzl et al. 2005

Affinity purification followed by mass spectrometry (AP-MS)

Gavin AC et al., 2002, 2006

Ho Y. et al., 2002

Krogan NJ et al., 2006

Page 9:

Y2H experiments

Idea: use a protein of interest as bait to discover proteins that physically interact with it; these are called prey.

A single transcription factor is split into two pieces, the DNA-binding domain (BD) and the activation domain (AD). The bait protein is fused to the BD and the prey protein to the AD.

If the bait and prey proteins interact, the BD and AD are brought together and transcription of the reporter gene is initiated.

In high-throughput screening, the bait is tested against a library of prey proteins.

Page 10:

AP-MS experiments

Fuse a TAP tag, consisting of protein A and a calmodulin-binding peptide (CBP) separated by a TEV protease cleavage site, to the target protein.

After the first AP step using an IgG matrix, many contaminants are eliminated.

In the second AP step, CBP binds tightly to calmodulin-coated beads. After washing, which removes the remaining contaminants and the TEV protease, the bound material is released under mild conditions with EGTA.

Proteins are identified by mass spectrometry.

Page 11:

AP-MS experiments

The MS output is, for each bait protein, a list of its co-purified partners (preys), each accompanied by a reliability score.

A scoring system combining the spoke and matrix models is used to generate a network of binary PPIs. Each interaction has a confidence score.

Low-scoring links are eliminated to obtain a high-confidence network.

The network is partitioned into densely connected regions, which are called complexes.
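The spoke and matrix models named on this slide can be illustrated with a short sketch. The purification (bait A, preys B, C, D), the confidence scores and the threshold are hypothetical; a real scoring system would derive the scores from the purification data.

```python
from itertools import combinations

def spoke_pairs(bait, preys):
    """Spoke model: link the bait to each co-purified prey."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}

def matrix_pairs(bait, preys):
    """Matrix model: link every pair of proteins seen in one purification."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}

def high_confidence(scored_links, threshold):
    """Keep only links whose confidence score reaches the threshold."""
    return {link: s for link, s in scored_links.items() if s >= threshold}

# Hypothetical purification: bait A co-purifies with preys B, C and D.
spoke = spoke_pairs("A", ["B", "C", "D"])
matrix = matrix_pairs("A", ["B", "C", "D"])

# Hypothetical scores: bait-prey links score higher than prey-prey links.
scored = {link: (0.9 if link in spoke else 0.3) for link in matrix}
network = high_confidence(scored, threshold=0.5)
print(len(spoke), len(matrix), len(network))
```

The spoke model yields one link per prey, while the matrix model yields all pairs within the purification, so the matrix set always contains the spoke set.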

Page 12:

Computational methods of prediction

Comparative genomic methods

• Gene neighbourhood

• Gene fusion

• Domain-based method

• Phylogenetic profiles

Biological context methods

• Co-expression

• GO

• Text mining

Integrative methods

Page 13:

Gene neighbourhood based method

Proteins a and b are predicted to interact if their genes lie close to each other in several different genomes.

Dandekar, T. et al. (1998). Conservation of gene order: A fingerprint of proteins that physically interact. Trends in Biochemical Sciences, 23(9), 324–328
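The gene-neighbourhood idea can be sketched as follows. The genomes, the gene order and the `max_gap` cutoff are hypothetical, and each genome is simplified to an ordered list of gene labels shared across genomes.

```python
def neighbour_count(genomes, a, b, max_gap=1):
    """Count genomes in which genes a and b lie within max_gap positions."""
    count = 0
    for order in genomes.values():
        if a in order and b in order:
            if abs(order.index(a) - order.index(b)) <= max_gap:
                count += 1
    return count

# Hypothetical gene orders in three genomes.
genomes = {
    "genome1": ["a", "b", "c", "d"],
    "genome2": ["d", "b", "a", "c"],
    "genome3": ["a", "c", "d", "b"],
}

ab = neighbour_count(genomes, "a", "b")
ad = neighbour_count(genomes, "a", "d")
print(ab, ad)
```

A pair whose genes are adjacent in many genomes (here a and b) is a stronger candidate than a pair that is never adjacent (here a and d).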

Page 14:

Gene fusion (Rosetta stone)

Proteins a and b are predicted to interact if they are combined (fused) into a single protein in another organism.

Enright, A. J. et al. (1999). Protein interaction maps for complete genomes based on gene fusion events. Nature, 402(6757), 86–90.
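A minimal sketch of the gene fusion (Rosetta stone) idea, assuming each protein in the other organism has already been annotated with the families its segments belong to. The protein names and family labels are hypothetical.

```python
from itertools import combinations

def rosetta_stone_pairs(composite_proteins):
    """Return family pairs that co-occur in one (fused) protein."""
    pairs = set()
    for segments in composite_proteins.values():
        for x, y in combinations(sorted(set(segments)), 2):
            pairs.add((x, y))
    return pairs

# Hypothetical annotation of proteins in another organism:
# "fusedAB" contains segments homologous to families famA and famB.
other_organism = {
    "fusedAB": ["famA", "famB"],
    "single": ["famC"],
}

predicted = rosetta_stone_pairs(other_organism)
print(predicted)
```

Separate proteins of families famA and famB in the organism of interest would then be predicted to interact.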

Page 15:

Domain based methods

[Diagram: well-known experimental PPI data → AS, MLE, PE → inferred domain-domain interactions (DDIs) → interact/non-interact prediction for Protein A and Protein B]

AS: association; MLE: Maximum Likelihood Estimation; PE: Parsimony Explanation

Validation of inferred DDIs remains difficult because sufficient and unbiased benchmark datasets are lacking.

The methods show limited performance at predicting PPIs.

H.X. Ta, L. Holm, Biochem. Biophys. Res. Commun. (2009)
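One simple association-style score for a domain pair is the fraction of protein pairs containing that domain pair that are known to interact. The sketch below uses this formulation; whether it matches the AS method on the slide exactly is an assumption, and the proteins, domains and interactions are hypothetical. Self-pairs (homodimers) are ignored for brevity.

```python
from itertools import combinations, product

def association_scores(domains, interactions):
    """Score each domain pair as (interacting pairs) / (all pairs) containing it."""
    total, hits = {}, {}
    for p, q in combinations(sorted(domains), 2):
        # Unordered domain pairs formed between the two proteins.
        dom_pairs = {tuple(sorted(dp)) for dp in product(domains[p], domains[q])}
        for dp in dom_pairs:
            total[dp] = total.get(dp, 0) + 1
            if frozenset((p, q)) in interactions:
                hits[dp] = hits.get(dp, 0) + 1
    return {dp: hits.get(dp, 0) / n for dp, n in total.items()}

# Hypothetical domain annotation and experimental PPIs.
domains = {"P1": {"d1"}, "P2": {"d2"}, "P3": {"d1"}, "P4": {"d2"}}
interactions = {frozenset(("P1", "P2")), frozenset(("P3", "P4"))}

scores = association_scores(domains, interactions)
print(scores[("d1", "d2")])
```

Here four protein pairs contain the domain pair (d1, d2) and two of them interact, so the pair scores 0.5.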

Page 16:

Phylogeny based methods

Proteins a and c are predicted to interact if they have similar phylogenetic profiles.

Pellegrini, M. et al. (1999). Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. PNAS, 96(8), 4285–4288
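A phylogenetic profile can be represented as a presence/absence vector over genomes. The sketch below compares profiles by the fraction of genomes in which they agree; the profiles are hypothetical and the similarity measure is one simple choice among several.

```python
def profile_similarity(p, q):
    """Fraction of genomes in which two presence/absence profiles agree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(x == y for x, y in zip(p, q)) / len(p)

# Hypothetical profiles over five genomes (1 = homolog present, 0 = absent).
profiles = {
    "a": (1, 0, 1, 1, 0),
    "b": (0, 1, 1, 0, 0),
    "c": (1, 0, 1, 1, 0),
}

sim_ac = profile_similarity(profiles["a"], profiles["c"])
sim_ab = profile_similarity(profiles["a"], profiles["b"])
print(sim_ac, sim_ab)
```

Proteins a and c have identical profiles and would be predicted to be functionally linked; a and b agree in only two of five genomes.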

Page 17:

Biological context methods

Gene expression: two proteins whose genes exhibit very similar patterns of expression across multiple states or experiments may be considered candidates for functional association and possibly direct physical interaction.

GO (Gene Ontology) annotations: two interacting proteins are likely to share GO term annotations.

Text mining: extract information about interacting proteins from the literature (e.g. PubMed): "Is protein K mentioned together with protein I in publications?"

These techniques are used to validate PPIs discovered by other approaches, or are combined with other evidence in integrative approaches.
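Co-expression is commonly measured with a correlation coefficient. The sketch below computes the Pearson correlation between hypothetical expression profiles; the gene names and values are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a profile with zero variance has no correlation")
    return cov / (sx * sy)

# Hypothetical expression levels across four experiments.
gene_k = [1.0, 2.0, 3.0, 4.0]
gene_i = [2.0, 4.0, 6.0, 8.0]   # rises with gene_k
gene_j = [4.0, 3.0, 2.0, 1.0]   # falls as gene_k rises

r_ki = pearson(gene_k, gene_i)
r_kj = pearson(gene_k, gene_j)
```

A correlation close to 1 (gene_k and gene_i) marks the pair as a candidate for functional association; it does not by itself show physical interaction.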

Page 18:

Integrative methods

Naive Bayes

Random Forest

Decision Tree

Kernels

Logistic Regression

Support Vector Machines

Jansen R. et al., Science 2003

Bader J.S. et al., Nat Biotech 2004

Lin N. et al., BMC Bioinformatics 2004

Zhang L. et al., BMC Bioinformatics 2004
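As an illustration of the first method in the list, a naive Bayes integration multiplies the likelihood ratios of individual evidence sources, assuming the sources are independent. The prior odds and likelihood ratios below are hypothetical, not values from the cited papers.

```python
from math import prod

def posterior_odds(prior_odds, likelihood_ratios):
    """Naive Bayes: posterior odds = prior odds x product of likelihood ratios."""
    return prior_odds * prod(likelihood_ratios)

def odds_to_probability(odds):
    """Convert odds into a probability."""
    return odds / (1 + odds)

# Hypothetical evidence for one protein pair:
# co-expression, shared GO terms, similar phylogenetic profiles.
prior = 0.01
ratios = [10.0, 5.0, 4.0]

odds = posterior_odds(prior, ratios)
probability = odds_to_probability(odds)
```

With these numbers the combined evidence raises the odds of interaction from 0.01 to about 2, a probability of about two thirds. The independence assumption is what makes the simple product valid.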

Page 19:

Databases of PPIs

• DIP (http://dip.doe-mbi.ucla.edu): 71,275 interactions, 23,200 proteins, 372 organisms

• BioGRID (http://www.thebiogrid.org): 247,366 non-redundant interactions, 31,254 unique proteins, 17 organisms

• IntAct (http://www.ebi.ac.uk/intact): 232,793 interactions, 69,335 proteins

• MINT (http://mint.bio.uniroma2.it): 89,956 interactions, 31,631 proteins

• SGD (http://www.yeastgenome.org): Saccharomyces Genome Database

• HPRD (http://www.hprd.org/): 39,194 interactions, 30,047 proteins

• MIPS: interactions, complexes

• STRING: known and predicted protein-protein interactions

Page 20:

DIP

• Protein function

• Protein-protein relationship

• Evolution of protein-protein interaction

• The network of interacting proteins

• Unknown protein-protein interaction

• The best interaction conditions

Page 21:

DIP: searching for information

Page 22:

Find information about your protein

Page 23:

DIP Node (DIP:1143N)

Page 24:

Graph of PPIs around DIP:1143N

• Nodes are proteins.

• Edges are PPIs.

• The center node is DIP:1143N.

• Edge width encodes the number of independent experiments identifying the interaction.

• Core interactions are drawn in green, unverified interactions in red.

• Click on a node to learn more about the protein, or on an edge to learn more about the interaction.

Page 25:

List of interacting partners of DIP:1143N

Page 26:

STRING: Search Tool for the Retrieval of Interacting Genes/Proteins

• A database of known and predicted protein interactions

• Direct (physical) and indirect (functional) associations

• The database currently covers 2,590,259 proteins from 630 organisms

• Derived from these sources: [Figure: data sources]

• Supported by: [Figure: supporting institutions]

Page 27:

Searching information

Query information by protein name or protein sequence.

Page 28:

Graph of PPIs

Nodes are proteins

Each colored line is evidence of an interaction between two proteins. The color encodes the method used to detect the interaction.

Click on a node to get information about the corresponding protein.

Click on an edge to get information about the interaction between the two proteins.

Page 29:

List of predicted partners

Partners are listed with a description and a confidence score. Choose a different type of view to see more detail.

Page 30:

Neighborhood View

The red block is the queried protein; the other blocks are its genomic neighbors in each organism. Click on a block to obtain information about the corresponding protein.

Closely related organisms show similar protein neighborhood patterns. The view helps to find genes/proteins that lie close together in a genomic region.

Page 31:

Occurrence View

• Represents phylogenetic profiles of proteins.

• The color of a box indicates the sequence similarity between the protein and its homologous protein in the organism.

• The size of a box shows how many members of the family represent the reported sequence similarity.

• Click on a box to see the sequence alignment.

Page 32:

Gene Fusion View

• This view shows the individual gene fusion events per species.

• Two differently colored boxes next to each other indicate a fusion event.

• Hovering over a region in a gene gives the gene name; clicking on a gene gives more detailed information.


Page 34:

Why protein-protein interactions (PPIs)?

PPIs are involved in many biological processes:

• Signal transduction

• Protein complexes or molecular machinery

• Protein carriers

• Protein modifications (phosphorylation)

PPIs help to decipher the molecular mechanisms underlying biological functions and improve approaches to drug discovery.

Page 35:

Assessment of large-scale datasets of PPIs

Yu H, et al. (2008). Science 322: 104-110

Benchmarking high-throughput interactions:

• Y2H: Uetz et al. 2000; Ito et al. 2001

• AP-MS: Gavin et al. 2006; Krogan et al. 2006

Binary gold standard (Binary-GS): positive reference set (PRS) and random reference set (RRS).

MIPS co-complex gold standard (MIPS-GS).

The large-scale datasets are measured against Binary-GS and MIPS-GS.
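Measuring a dataset against a reference set can be sketched as the fraction of reference pairs that the dataset recovers. The protein pairs below are hypothetical, and this simple overlap is only an illustration, not the evaluation procedure of the cited study.

```python
def pair(a, b):
    """Unordered protein pair."""
    return frozenset((a, b))

def fraction_detected(dataset, reference_set):
    """Fraction of reference pairs that also appear in the dataset."""
    if not reference_set:
        raise ValueError("reference set is empty")
    return len(dataset & reference_set) / len(reference_set)

# Hypothetical positive (PRS) and random (RRS) reference sets.
prs = {pair("A", "B"), pair("C", "D"), pair("E", "F"), pair("G", "H")}
rrs = {pair("A", "H"), pair("B", "G"), pair("C", "F"), pair("D", "E")}

# Hypothetical large-scale dataset.
dataset = {pair("A", "B"), pair("C", "D"), pair("E", "F"), pair("D", "E")}

recovered_prs = fraction_detected(dataset, prs)
recovered_rrs = fraction_detected(dataset, rrs)
```

A good dataset recovers a large fraction of the PRS and a small fraction of the RRS; here the values are 0.75 and 0.25.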

Page 36:

Assessment of large-scale datasets of PPIs

Yu H, et al. (2008). Science 322: 104-110

AP/MS performs well at detecting co-complex associations according to MIPS.

Y2H performs well at detecting binary interactions according to Binary-GS.

[Figure panels: Y2H; AP/MS]