
Protein-protein interactions

Protein Analysis Workshop 2010

Bioinformatics group, Institute of Biotechnology, University of Helsinki

Hung Ta

xuanhung.ta@helsinki.fi

Outline

Why are protein-protein interactions (PPIs) so important?

High-throughput experimental methods for discovering PPIs:

• Yeast two-hybrid (Y2H)

• Affinity purification followed by mass spectrometry (AP-MS)

PPI databases: DIP, BioGRID, IntAct, HPRD…

Computational prediction of PPIs

• Genomics methods

• Biological context methods

• Integrative methods

• STRING (EMBL)

Why are PPIs so important?

The gene is the basic unit of heredity, and many complete genomes are available.

Genome → proteome → interactome

Proteins, the working molecules of a cell, carry out many biological activities.

Proteins function by interacting with other proteins, DNA, RNA and small molecules.

Search for drug molecules:

The body produces a set of proteins P1, P2, P3, …, PN. A pathogen (a virus or bacterium) enters the body and produces its own protein, say X.

X interacts with one of these proteins, say P1, preventing it from carrying out its normal activities. Disease emerges.

Introduce into the body a new molecule, Y, such that X is more attracted to Y than to P1, freeing P1 to get back to its normal work.
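One way to make "more attracted" concrete is the dissociation constant Kd (a lower Kd means tighter binding). The sketch below is an illustration only, not part of the workshop material: a textbook competitive-binding equilibrium that assumes one-to-one binding and that P1 and Y are in large excess over X, so their free concentrations equal their totals. All concentrations and Kd values are hypothetical.

```python
def x_occupancy(p1, kd_p1, y=0.0, kd_y=float("inf")):
    """Fractions of pathogen protein X that are free, bound to P1, or bound to Y.

    Assumes simple competitive binding at equilibrium, with P1 and Y in large
    excess over X. Concentrations and Kd values must share one unit (e.g. nM).
    """
    a = p1 / kd_p1       # relative weight of the X-P1 complex
    b = y / kd_y         # relative weight of the X-Y complex (0 without drug)
    total = 1.0 + a + b  # the 1 accounts for free X
    return {"free": 1.0 / total, "x_p1": a / total, "x_y": b / total}


# Hypothetical numbers: Y binds X ten times more tightly than P1 does.
no_drug = x_occupancy(p1=100, kd_p1=10)
with_drug = x_occupancy(p1=100, kd_p1=10, y=100, kd_y=1)
```

With these made-up values, adding Y lowers the fraction of X bound to P1 from about 0.91 to about 0.09, which is the "freeing P1" effect described above.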


Bringing an effective drug to market can:

• take 10-15 years
• cost up to US$800 million
• require testing up to 30,000 candidate molecules

Databases of molecular interactions or linkages could help to cut down the search for drug molecules.

Types of PPIs

Binary (physical) interactions refer to the binding between two proteins whose residues are in contact at some point in time.

Functional linkages are pairwise relationships between proteins that work together (participating in a common structural complex or pathway) to carry out biological tasks.

Protein physical interactomes and functional linkage maps are available for:

• S. cerevisiae (Uetz et al. 2000; Ito et al. 2001; Ho et al. 2002; Gavin et al. 2002, 2006; Krogan et al. 2006; Tarassov et al. 2008; Yu et al. 2008)
• E. coli (Butland et al. 2005; Arifuzzaman et al. 2006)
• C. elegans (Li et al. 2004)
• D. melanogaster (Giot et al. 2003)
• Humans (Rual et al. 2005; Stelzl et al. 2005; Ewing et al. 2007)

High-throughput experimental methods for discovering PPIs

Yeast two-hybrid (Y2H)

Ito T. et al., 2001; Uetz P. et al., 2000; Yu H. et al., 2008

Rual et al. 2005; Stelzl et al. 2005

Affinity purification followed by mass spectrometry (AP-MS)

Gavin AC et al., 2002, 2006

Ho Y. et al., 2002

Krogan NJ et al., 2006

Y2H experiments

Idea: use a protein of interest as bait in order to discover proteins that physically interact with it; these are called prey.

A single transcription factor is split into two pieces, the DNA-binding domain (BD) and the activation domain (AD). The bait protein is fused to the BD and the prey protein to the AD.

If the bait and prey proteins interact, the BD and AD are brought back together and transcription of the reporter gene is initiated.

High-throughput screening tests the interactions between the bait and a library of prey proteins.

AP-MS experiments

Fuse a TAP tag, consisting of protein A and a calmodulin binding peptide (CBP) separated by a TEV protease cleavage site, to the target protein.

In the first affinity purification (AP) step, the tagged protein and its partners bind to an IgG matrix; washing eliminates many contaminants, and the bound material is released by TEV protease cleavage.

In the second AP step, CBP binds tightly to calmodulin-coated beads in the presence of calcium. After washing, which removes the remaining contaminants and the TEV protease, the bound material is released under mild conditions with EGTA, a calcium chelator.

Proteins are identified by mass spectrometry.

The MS output is, for each bait protein, a list of its co-purified partners (preys), each accompanied by a reliability score.

A scoring system combining the spoke model (the bait is linked to each prey) and the matrix model (all co-purified proteins are linked to each other) is used to generate a network of binary PPIs, in which each interaction has a confidence score.

Low-scoring links are eliminated to obtain a high-confidence network.

The network is then partitioned into densely connected regions, which are taken to be complexes.
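The spoke and matrix models and the score cut-off can be sketched as follows. Only the pair-generation rules follow the two models; the weighting scheme is a hypothetical toy and much simpler than published AP-MS scoring systems, and the purifications are made up.

```python
from collections import Counter
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: link the bait to each co-purified prey only."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_pairs(bait, preys):
    """Matrix model: link every pair of proteins found in the purification."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


def score_pairs(purifications, w_spoke=1.0, w_matrix=0.5):
    """Toy confidence score: weighted count of purifications supporting a pair.

    A pair observed as bait-prey (spoke) earns w_spoke; a pair seen only as two
    co-purified preys (matrix but not spoke) earns w_matrix. The weights are
    hypothetical and not taken from any published scoring system.
    """
    scores = Counter()
    for bait, preys in purifications:
        spokes = spoke_pairs(bait, preys)
        for pair in matrix_pairs(bait, preys):
            scores[pair] += w_spoke if pair in spokes else w_matrix
    return dict(scores)


def high_confidence(scores, cutoff):
    """Drop low-scoring links, keeping a high-confidence binary network."""
    return {pair for pair, score in scores.items() if score >= cutoff}


# Hypothetical purifications: (bait, list of co-purified preys).
purifications = [("A", ["B", "C"]), ("B", ["A", "C"]), ("D", ["E"])]
pair_scores = score_pairs(purifications)
network = high_confidence(pair_scores, cutoff=1.5)
```

Here the A-B link is supported twice as bait-prey, while B-C and A-C are each seen once as bait-prey and once as prey-prey; D-E is seen only once and falls below the cut-off.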


Computational methods of prediction

• Comparative genomic methods: gene neighbourhood, gene fusion, domain-based methods, phylogeny-based methods
• Biological context methods: co-expression, text mining
• Integrative methods

Gene neighbourhood based method

Proteins a and b whose genes are close to each other in several different genomes are predicted to interact.

Dandekar, T. et al. (1998). Conservation of gene order: A fingerprint of proteins that physically interact. Trends in Biochemical Sciences, 23(9), 324–328
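A minimal sketch of the gene neighbourhood rule. It assumes genes have already been matched to orthologs across genomes and that positions are simple gene indices along a chromosome; the gene order data, `max_gap` and `min_genomes` are hypothetical, and strand and circular wrap-around are ignored.

```python
def close_in_genomes(gene_order, gene_a, gene_b, max_gap=2):
    """Count genomes in which both genes occur at most max_gap positions apart.

    gene_order maps a genome name to {gene: position along the chromosome}.
    Strand and circular wrap-around are ignored in this simplified sketch.
    """
    count = 0
    for positions in gene_order.values():
        if gene_a in positions and gene_b in positions:
            if abs(positions[gene_a] - positions[gene_b]) <= max_gap:
                count += 1
    return count


def predict_by_neighbourhood(gene_order, gene_a, gene_b, max_gap=2, min_genomes=2):
    """Predict an interaction if the genes are neighbours in enough genomes."""
    return close_in_genomes(gene_order, gene_a, gene_b, max_gap) >= min_genomes


# Hypothetical gene positions in three genomes.
gene_order = {
    "genome1": {"a": 10, "b": 11, "c": 40},
    "genome2": {"a": 5, "b": 7, "c": 90},
    "genome3": {"a": 1, "c": 2},
}
```

Genes a and b are neighbours in two genomes and are predicted to interact; a and c are adjacent in only one genome and are not.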

Gene fusion (Rosetta stone)

Proteins a and b are predicted to interact if their homologs are fused into a single protein in another organism.

Enright, A. J. et al. (1999). Protein interaction maps for complete genomes based on gene fusion events. Nature, 402(6757), 86–90.
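The Rosetta stone idea can be sketched by looking for a protein in another organism that two query proteins align to in different, non-overlapping regions. This assumes alignments were already computed by some homology search and are given as hypothetical hit regions; a real pipeline would also need to handle cases such as the two query proteins being homologs of each other.

```python
def rosetta_stone_pairs(hits):
    """Find query pairs that align to separate regions of one target protein.

    hits maps a query protein to a list of (target protein, start, end)
    alignment regions, in target coordinates, from another organism.
    """
    by_target = {}
    for query, regions in hits.items():
        for target, start, end in regions:
            by_target.setdefault(target, []).append((query, start, end))

    pairs = set()
    for matches in by_target.values():
        for i, (q1, s1, e1) in enumerate(matches):
            for q2, s2, e2 in matches[i + 1:]:
                # A fusion needs the two queries to cover different regions.
                if q1 != q2 and (e1 < s2 or e2 < s1):
                    pairs.add(frozenset((q1, q2)))
    return pairs


# Hypothetical hits on a candidate fusion protein "F1" in another organism.
hits = {
    "a": [("F1", 1, 120)],
    "b": [("F1", 130, 260)],
    "c": [("F1", 100, 200)],
}
```

Only a and b cover separate parts of F1, so they form the predicted pair; c overlaps both and is not paired.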

Domain based methods

Workflow: well-known experimental PPI data → inferred domain-domain interactions (DDIs) → prediction of whether protein A and protein B interact or not.

DDI inference methods: AS (association), MLE (maximum likelihood estimation) and PE (parsimony explanation).

Validation of inferred DDIs remains difficult due to the lack of sufficient and unbiased benchmark datasets.

The methods show limited performance at predicting PPIs (H.X. Ta, L. Holm, Biochem. Biophys. Res. Commun., 2009).
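The association (AS) method can be sketched in its common ratio form: the score of a domain pair is the number of interacting training pairs containing it divided by the number of all training pairs containing it. The domain IDs, training data and the decision rule with its threshold are hypothetical; MLE and PE are not shown.

```python
from collections import Counter
from itertools import product


def domain_pairs(domains, prot_a, prot_b):
    """All unordered domain pairs formed by one domain of A and one of B."""
    return {frozenset((d1, d2)) for d1, d2 in product(domains[prot_a], domains[prot_b])}


def association_scores(domains, training_pairs):
    """AS(d1, d2) = interacting training pairs containing (d1, d2)
    divided by all training pairs containing (d1, d2)."""
    seen, interacting = Counter(), Counter()
    for prot_a, prot_b, interacts in training_pairs:
        for pair in domain_pairs(domains, prot_a, prot_b):
            seen[pair] += 1
            if interacts:
                interacting[pair] += 1
    return {pair: interacting[pair] / seen[pair] for pair in seen}


def predict_by_domains(domains, scores, prot_a, prot_b, threshold=0.5):
    """Hypothetical rule: predict an interaction if the best-scoring
    domain pair between the two proteins reaches the threshold."""
    candidates = domain_pairs(domains, prot_a, prot_b)
    best = max((scores.get(pair, 0.0) for pair in candidates), default=0.0)
    return best >= threshold


# Hypothetical domain assignments and labelled training pairs.
domains = {"P": {"d1"}, "Q": {"d2"}, "R": {"d1", "d3"}, "S": {"d2"}, "T": {"d3"}}
training = [("P", "Q", True), ("R", "S", True), ("P", "T", False), ("T", "S", False)]
as_scores = association_scores(domains, training)
```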

Phylogeny based methods

Proteins a and c are predicted to interact if they have similar phylogenetic profiles, i.e. they have homologs in the same set of genomes.

Pellegrini, M. et al. (1999). Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. PNAS, 96(8), 4285–4288
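A phylogenetic profile can be written as a presence/absence vector over a list of genomes, and profile similarity as the number of genomes where two profiles disagree (Hamming distance). The genome names, homolog sets and `max_distance` below are hypothetical.

```python
def phylogenetic_profile(genomes_with_homolog, genomes):
    """Binary profile: 1 if the protein has a homolog in that genome, else 0."""
    return [1 if g in genomes_with_homolog else 0 for g in genomes]


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(p, q))


def similar_profile_pairs(homologs, genomes, max_distance=0):
    """Protein pairs whose profiles differ in at most max_distance genomes."""
    names = sorted(homologs)
    profiles = {n: phylogenetic_profile(homologs[n], genomes) for n in names}
    return {
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if hamming(profiles[x], profiles[y]) <= max_distance
    }


genomes = ["E1", "E2", "E3", "E4"]  # hypothetical genome names
homologs = {"a": {"E1", "E3"}, "b": {"E1", "E2", "E3", "E4"}, "c": {"E1", "E3"}}
```

With identical profiles required (`max_distance=0`), only a and c are paired, mirroring the example above.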

Biological context methods

Gene expression: two proteins whose genes exhibit very similar patterns of expression across multiple states or experiments may be considered candidates for functional association and possibly direct physical interaction.
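"Very similar patterns of expression" is often quantified with a correlation coefficient. The sketch below uses Pearson correlation with a hypothetical cut-off of 0.8 and made-up expression values; other similarity measures are equally possible.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles.

    Undefined (division by zero) if either profile is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def co_expressed(x, y, min_r=0.8):
    """Flag a candidate functional association if profiles are strongly correlated."""
    return pearson(x, y) >= min_r


# Hypothetical expression levels of three genes across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
gene_c = [4.0, 3.0, 2.0, 1.0]
```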

GO (Gene Ontology) annotations: two interacting proteins are likely to share GO term annotations.
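Shared annotation can be measured in the simplest way as the overlap of two GO term sets. This Jaccard-index sketch is an assumption, not a method from the slides; it ignores the GO hierarchy, which real semantic similarity measures use, and the term IDs are placeholders.

```python
def go_jaccard(terms_a, terms_b):
    """Share of GO terms in common: |A ∩ B| / |A ∪ B| (0 if both sets are empty).

    This flat comparison ignores the GO hierarchy.
    """
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


# Hypothetical annotations (placeholder term IDs, not real GO terms).
protein_a = {"GO:A", "GO:B", "GO:C"}
protein_b = {"GO:B", "GO:C", "GO:D"}
```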

Text mining: extract information about interacting proteins from the literature (e.g. PubMed): "Is protein K mentioned together with protein I in publications?"
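The co-occurrence question can be sketched as counting abstracts that mention both protein names. This is a deliberately naive illustration: the abstracts and names are hypothetical, synonyms are ignored, and practical systems add name normalization and statistical scoring.

```python
import re


def mentions(text, name):
    """True if name occurs in text as a whole word (case-insensitive)."""
    return re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE) is not None


def co_mention_count(abstracts, name_k, name_i):
    """Number of abstracts that mention both protein names."""
    return sum(1 for text in abstracts if mentions(text, name_k) and mentions(text, name_i))


# Hypothetical abstracts and protein names, for illustration only.
abstracts = [
    "ProtK binds ProtI in vitro.",
    "ProtK is highly expressed in yeast.",
    "A complex containing ProtI and ProtK was purified.",
]
```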

These techniques are used to validate PPIs discovered by other approaches, or are combined with other evidence in integrative approaches.

Integrative methods

Naive Bayes

Random Forest

Decision Tree

Kernels

Logistic Regression

Support Vector Machines

Jansen R. et al., Science 2003

Bader J.S. et al., Nat Biotech 2004

Lin N. et al., BMC Bioinformatics 2004

Zhang L. et al., BMC Bioinformatics 2004
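Of the integrative classifiers listed, naive Bayes is the easiest to write down: each evidence source contributes a likelihood ratio, and under the strong assumption that sources are conditionally independent, the ratios multiply. This is a generic sketch with hypothetical probabilities and prior odds, not the trained model of any cited study.

```python
from math import prod


def likelihood_ratio(p_given_interacting, p_given_not_interacting):
    """How much more likely an observed feature value is among true interactions."""
    return p_given_interacting / p_given_not_interacting


def combined_likelihood_ratio(ratios):
    """Naive Bayes: with evidence sources assumed conditionally independent,
    the combined likelihood ratio is the product of the individual ratios."""
    return prod(ratios)


def posterior_odds(prior_odds, ratios):
    """Posterior odds of interaction = prior odds x combined likelihood ratio."""
    return prior_odds * combined_likelihood_ratio(ratios)


# Hypothetical per-source ratios (e.g. co-expression, shared GO terms, ...).
ratios = [likelihood_ratio(0.4, 0.1), likelihood_ratio(0.5, 0.2), likelihood_ratio(0.1, 0.2)]
```

A pair would then be predicted to interact when the combined ratio (or the posterior odds) exceeds a chosen cut-off.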

Databases of PPIs

• DIP (http://dip.doe-mbi.ucla.edu): 71,275 interactions, 23,200 proteins, 372 organisms
• BioGRID (http://www.thebiogrid.org): 247,366 non-redundant interactions, 31,254 unique proteins, 17 organisms
• IntAct (http://www.ebi.ac.uk/intact): 232,793 interactions, 69,335 proteins
• MINT (http://mint.bio.uniroma2.it): 89,956 interactions, 31,631 proteins
• SGD (http://www.yeastgenome.org): Saccharomyces Genome Database
• HPRD (http://www.hprd.org/): 39,194 interactions, 30,047 proteins
• MIPS: interactions, complexes
• STRING: known and predicted protein-protein interactions

Figure: protein function; protein-protein relationships; evolution of protein-protein interactions; the network of interacting proteins; unknown protein-protein interactions; the best interaction conditions.

DIP: searching information

Find information about your protein

DIP Node (DIP:1143N)

Graph of PPIs around DIP:1143N

Nodes are proteins and edges are PPIs.

The center node is DIP:1143N.

Edge width encodes the number of independent experiments identifying the interaction.

Green (red) is used to draw core (unverified) interactions.

Click on a node (edge) to learn more about the protein (interaction).
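The graph view can be modelled as an adjacency map whose edges carry the drawing attributes described above. This sketch is only a data-model illustration: node names and experiment counts are hypothetical, and it does not use any DIP API or file format.

```python
def build_graph(edges):
    """edges: (protein_a, protein_b, n_experiments, is_core) tuples.

    Returns an adjacency map: protein -> {partner: drawing attributes}.
    Edge width = number of experiments; green = core, red = unverified.
    """
    graph = {}
    for a, b, n_experiments, is_core in edges:
        attrs = {"width": n_experiments, "color": "green" if is_core else "red"}
        graph.setdefault(a, {})[b] = attrs
        graph.setdefault(b, {})[a] = attrs
    return graph


def partners(graph, node):
    """Interacting partners of node, best-supported interactions first."""
    neighbours = graph.get(node, {})
    return sorted(neighbours, key=lambda p: (-neighbours[p]["width"], p))


# Hypothetical interactions around a center node.
edges = [
    ("center", "P1", 3, True),
    ("center", "P2", 1, False),
    ("P1", "P2", 2, True),
    ("center", "P3", 2, True),
]
graph = build_graph(edges)
```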

List of interacting partners of DIP:1143N

STRING: Search Tool for the Retrieval of Interacting Genes/Proteins

A database of known and predicted protein interactions, including direct (physical) and indirect (functional) associations. At the time of the workshop, the database covered 2,590,259 proteins from 630 organisms.

Searching information

Query information via protein names or protein sequences.

Graph of PPIs

Nodes are proteins.

Each colored line is evidence of an interaction between two proteins; the color encodes the method used to detect the interaction.

Click on a node to get information about the corresponding protein.

Click on an edge to get information about the interaction between the two proteins.

List of predicted partners

Partners are listed with a description and a confidence score. Choose different types of views to see more detail.
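Each partner gets a single confidence score even though evidence comes from several channels (the colored lines in the graph view). A simple way to merge such scores, assuming the channels are independent, is 1 - prod(1 - s_i): the chance that at least one channel is right. The sketch uses that rule with hypothetical channel scores; STRING's own calculation includes further corrections and is not reproduced here.

```python
def combined_score(channel_scores):
    """Combine per-channel scores in [0, 1], assuming independent evidence:
    1 - product(1 - s), i.e. the chance that at least one channel is correct."""
    prob_all_wrong = 1.0
    for score in channel_scores.values():
        prob_all_wrong *= 1.0 - score
    return 1.0 - prob_all_wrong


# Hypothetical per-channel scores for one protein pair.
pair_channels = {"neighborhood": 0.5, "coexpression": 0.5}
```

Two moderately supportive channels (0.5 each) combine to 0.75, higher than either alone.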

Neighborhood View

The red block is the queried protein and the others are its neighbors in the different organisms. Click on the blocks to obtain information about the corresponding proteins.

Closely related organisms show similar protein neighborhood patterns. This view helps to find genes/proteins that lie close together in a genomic region.

Occurrence View

Represents the phylogenetic profiles of proteins. The color of each box indicates the sequence similarity between the protein and its homologous protein in that organism. The size of each box shows how many family members the reported sequence similarity represents. Click on a box to see the sequence alignment.

Gene Fusion View

This view shows the individual gene fusion events per species. Two differently colored boxes next to each other indicate a fusion event. Hovering over a region in a gene shows the gene name; clicking on a gene gives more detailed information.


Why study protein-protein interactions (PPIs)?

PPIs are involved in many biological processes:

• Signal transduction
• Protein complexes or molecular machinery
• Protein carriers
• Protein modifications (e.g. phosphorylation)

PPIs help to decipher the molecular mechanisms underlying biological functions and to improve approaches to drug discovery.

Assessment of large-scale datasets of PPIs

Yu H, et al. (2008). Science 322: 104-110

Benchmarked high-throughput datasets: Y2H (Uetz et al. 2000; Ito et al. 2001) and AP-MS (Gavin et al. 2006; Krogan et al. 2006).

Binary gold standard (Binary-GS): a positive reference set (PRS) and a random reference set (RRS).

MIPS co-complex gold standard (MIPS-GS).

The large-scale datasets are measured against both Binary-GS and MIPS-GS.
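Measuring a dataset against the binary gold standard amounts to asking what fraction of PRS pairs and what fraction of RRS pairs it contains: many PRS hits with few RRS hits indicates good sensitivity with low background. A minimal sketch with hypothetical protein pairs:

```python
def as_pair_set(pairs):
    """Treat (a, b) and (b, a) as the same unordered pair."""
    return {frozenset(p) for p in pairs}


def detection_rates(detected, prs, rrs):
    """Fraction of PRS pairs and of RRS pairs found in a dataset."""
    found = as_pair_set(detected)
    prs_set, rrs_set = as_pair_set(prs), as_pair_set(rrs)
    return len(found & prs_set) / len(prs_set), len(found & rrs_set) / len(rrs_set)


# Hypothetical reference sets and screening result.
prs = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
rrs = [("A", "H"), ("C", "F"), ("B", "G"), ("D", "E")]
detected = [("B", "A"), ("D", "C"), ("C", "F")]
```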


AP-MS performs well at detecting co-complex associations according to MIPS-GS.

Y2H performs well at detecting binary interactions according to Binary-GS.
