coiled coils: a tractable problem for bioinformatics

Post on 12-Sep-2021


Coiled Coils: A Tractable Problem for Bioinformatics

Vincent Waldman

Indiana University, January 30, 2008

Helical Net Representation of Coiled Coil Interface

abcdefgabcdefgabcdefgabcdefg

Left-Handed Super Helical Twist

2FXO

Myosin

A Schematic View of Coiled Coils

Knobs into Holes

PDB: 1YSA

Periodic, Simple, Right?

GCN4

1YSA

bcdefgabcdefgabcdefgabcdefgabc

KQLEDKVEELLSKNYHLENEVARLKKLVGE

Oligomeric State

PDB: 2HY6

PDB: 1GCL, PDB: 2ZTA, PDB: 1GCM

N

GCN4-p1, a and d Ile, a Ile, e and g Ala

N

Relative Orientation

PDB: 2ZTA

N

GCN4-p1

N

PDB: 1SER

Residues 530-597

N C

SRS

1AQ5

Cartilage Matrix Protein

Variety of Coiled Coils

2TMA

Tropomyosin

1SER

SRS

1T98

MukF

2NPS

SNAREs

2FXO

Myosin

GCN4

1YSA

1HTM

Influenza Hemagglutinin

2SPC

Spectrin

Hepatitis Delta Antigen

1A92

Tsr

1QU7

An Important Example of Unknown Coiled Coil Structure

SMC (B. subtilis) and MukB (E. coli)

Coiled Coil Algorithms

• SOCKET

– Walshaw and Woolfson (2001) JMB, 1427

• Coils

– Lupas et al (1991) Science, 1162

• Paircoil

– Berger et al (1995) PNAS, 8259

• A genome-wide search for coiled coils that uses a version of Paircoil

• Biochemistry that has allowed the design of programs that predict potentially interacting coiled coil pairs

SOCKET: Automatic Identification of Coiled Coils in Structures

GCN4

1YSA

bcdefgabcdefgabcdefgabcdefgabc

KQLEDKVEELLSKNYHLENEVARLKKLVGE

Tsr

1QU7

SOCKET

• SOCKET is a program that unambiguously identifies coiled coil motifs in protein structures and assigns register positions

• It makes assignments based on structural features rather than sequence

Representing Side Chains

Walshaw and Woolfson (2001) JMB, 1427

Knobs Into Holes

Walshaw and Woolfson (2001) JMB, 1427

Types of Knobs into Holes

Walshaw and Woolfson (2001) JMB, 1427

Assigning Register With Complementary Knobs Into Holes

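Figure: a knob packing into a hole formed by residues i, i+3, i+4 and i+7, drawn for both helices, with the N-term (top) and C-term (bottom) holes labeled

If kn = hn,2, the knob is at a d position

If kn = hn,3, the knob would be at an a position

abcdefgabcdefgabcdefgabcdefgabc

Walshaw and Woolfson (2001) JMB, 1427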
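The register rule on this slide can be sketched in a few lines of Python. This is only an illustration under one assumption: that "hn,2" and "hn,3" mean the second and third residues (in sequence order) of the hole that knob n packs into. The function name `core_position` and its arguments are hypothetical and are not part of SOCKET.

```python
def core_position(hole, complementary_knob):
    """Return 'd', 'a', or None for a knob, given the hole it packs into.

    hole: the four residue numbers forming the hole on the partner helix
          (i, i+3, i+4, i+7 in the slide's figure).
    complementary_knob: the hole residue that is itself a knob packing
          back into a hole that contains the original knob.
    Assumption: hn,2 / hn,3 denote the 2nd / 3rd hole residue in sequence order.
    """
    ordered = sorted(hole)
    if complementary_knob not in ordered:
        return None
    rank = ordered.index(complementary_knob) + 1  # 1-based, as in hn,2 and hn,3
    return {2: "d", 3: "a"}.get(rank)


print(core_position([6, 9, 10, 13], 9))    # d
print(core_position([9, 12, 13, 16], 13))  # a
```

Any other rank returns None, since the slide only gives rules for the second and third hole residues.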

Complementary Knobs into Holes in Trimers and Higher Order Structures

Walshaw and Woolfson (2001) JMB, 1427

Non-Coiled Coil Knobs Into Holes

Walshaw and Woolfson (2001) JMB, 1427

Minimum Layers in Higher Order Structures

Walshaw and Woolfson (2001) JMB, 1427

Coiled Coil Knobs Into Holes

Walshaw and Woolfson (2001) JMB, 1427

SOCKET Recap

• Side chains are represented as the mean of their atomic coordinates

• A residue is considered a knob if it contacts four or more residues within a specified cutoff distance

• Holes are defined as the four closest residues to a knob

• Register is assigned through complementary Knobs into Holes
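The first three recap points can be sketched in Python as follows. This is a simplified illustration, not SOCKET's implementation: it considers only one partner helix, and the function names and the 7.0 Å default cutoff are assumptions made for the example.

```python
import math


def centroid(atoms):
    """Mean of a list of (x, y, z) side-chain atom coordinates."""
    n = len(atoms)
    return tuple(sum(atom[axis] for atom in atoms) / n for axis in range(3))


def find_knobs(own_centres, partner_centres, cutoff=7.0):
    """Map each knob index on one helix to the indices of its hole residues.

    A residue counts as a knob when at least four partner side-chain
    centres lie within `cutoff` (an assumed value, in angstroms); its hole
    is the four closest of those centres.
    """
    knobs = {}
    for i, centre in enumerate(own_centres):
        by_distance = sorted(
            (math.dist(centre, other), j) for j, other in enumerate(partner_centres)
        )
        within = [j for distance, j in by_distance if distance <= cutoff]
        if len(within) >= 4:
            knobs[i] = sorted(within[:4])
    return knobs
```

In practice the side-chain centres would come from the ATOM records of a PDB-format file, one centroid per residue.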

To Use SOCKET

• Walshaw, J. & Woolfson, D.N. (2001), SOCKET: A Program for Identifying and Analysing Coiled-coil Motifs Within Protein Structures, J. Mol. Biol., 307 (5), 1427-1450

• http://www.lifesci.sussex.ac.uk/research/woolfson/html/coiledcoils/

• Requires a PDB-format file with 3D coordinates

GCN4

1YSA

COILS

• COILS is a program that predicts the probability of coiled coil formation from protein sequence

• COILS compares protein sequence to a database of parallel two-stranded coiled coils

• The comparison generates a similarity score

• The score is compared to score distributions of globular and coiled coil proteins to generate a probability of coiled coil formation

2TMA

Database Assembly

GenBank db

• ~2,000,000 residues

Random Generated Sequence db

• ~52,200 residues

Globular db

– All non-redundant non-cc proteins in pdb

• 150 proteins

• ~32,600 residues

CC db

– Tropomyosin

– Myosin

– Keratins

• All parallel dimers

• Extracted from GenBank

• ~17,500 residues

Tropomyosin

2FXO

Myosin

Tabulate Relative Frequencies of Occurrence

Normalized Probabilities: νk(A) = [fk(A)/Tk] / WA

Lupas et al (1991) Science, 1162
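A minimal Python sketch of this tabulation follows. It reads fk(A) as the count of residue A at heptad position k, Tk as the total count at position k, and WA as a background frequency of A. That reading, the function name, and the uniform background used in the example are all assumptions. The example input is the GCN4 sequence and register shown earlier in the talk.

```python
from collections import Counter, defaultdict


def normalized_frequencies(examples, background):
    """Compute nu_k(A) = [f_k(A) / T_k] / W_A from register-annotated sequences.

    examples: iterable of (sequence, register) string pairs of equal length.
    background: dict of assumed background frequencies W_A per residue.
    """
    counts = defaultdict(Counter)  # counts[k][A] = f_k(A)
    for seq, register in examples:
        for residue, position in zip(seq, register):
            counts[position][residue] += 1
    table = {}
    for position, counter in counts.items():
        total = sum(counter.values())  # T_k
        table[position] = {
            residue: (n / total) / background[residue]
            for residue, n in counter.items()
        }
    return table


gcn4 = ("KQLEDKVEELLSKNYHLENEVARLKKLVGE", "bcdefgabcdefgabcdefgabcdefgabc")
uniform = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}  # illustrative background
nu = normalized_frequencies([gcn4], uniform)
```

With this single example every d position is Leu, so `nu["d"]["L"]` is 1/0.05, about 20. A real table would be built from the whole CC db.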

Sliding Window To Calculate Residue Score

n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30 (POI: k5)

Lupas et al (1991) Science, 1162


Frequency Values

Lupas et al (1991) Science, 1162


Sliding Window To Calculate Residue Score

Window 1 (POI: k5)

n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30

Register 1: a b c d e f g ... g a b

(νa * νb * νc * νd * νe * νf * νg * ... * νg)^(1/28) = Sk1

Register 2: b c d e f g a ... a b c

(νb * νc * νd * νe * νf * νg * νa * ... * νa)^(1/28) = Sk2

Register 7: g a b c d e f ... f g a

(νg * νa * νb * νc * νd * νe * νf * ... * νf)^(1/28) = Sk7

Lupas et al (1991) Science, 1162

Sliding Window To Calculate Residue Score

Window 2 (POI: k5)

n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30

For each residue k there are

• 28 possible windows

• With 7 registers each

Sk is the highest of these 196 possible scores

Lupas et al (1991) Science, 1162

Steps to Calculating Residue Score

i. Assign Window

ii. Assign Heptad Register to Window

iii. Assign Each Residue in Window the Corresponding Relative Frequency

iv. Take Geometric Mean

v. Assign New Heptad Register and repeat iii and iv

vi. Assign New Window and Repeat ii-v

vii. Take Largest Score

Lupas et al (1991) Science, 1162
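Steps i to vii can be sketched in Python as below. This is an illustration of the procedure as described on the slides, not the COILS source code. `TOY_TABLE` is a hypothetical two-residue frequency table made up for the example; real values would come from the tabulated relative frequencies.

```python
import math

HEPTAD = "abcdefg"

# Hypothetical toy table nu[position][residue]; all entries must be > 0
# because the geometric mean is computed through logarithms.
TOY_TABLE = {pos: {"L": 1.0, "A": 1.0} for pos in HEPTAD}
TOY_TABLE["a"]["L"] = 2.0
TOY_TABLE["d"]["L"] = 2.0


def window_score(seq, start, register, table, window=28):
    """Geometric mean of relative frequencies for one window and one register.

    `register` is the heptad index (0 = a ... 6 = g) given to seq[start].
    """
    log_sum = 0.0
    for offset in range(window):
        position = HEPTAD[(register + offset) % 7]
        log_sum += math.log(table[position][seq[start + offset]])
    return math.exp(log_sum / window)


def residue_scores(seq, table, window=28):
    """For each residue, the largest score over every window containing it
    and all seven registers. Sequences shorter than `window` score 0.0."""
    best = [0.0] * len(seq)
    for start in range(len(seq) - window + 1):
        top = max(window_score(seq, start, r, table, window) for r in range(7))
        for k in range(start, start + window):
            best[k] = max(best[k], top)
    return best
```

For an interior residue the loop visits 28 windows with 7 registers each, which gives the 196 candidate scores mentioned on the previous slide.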

Gaussian Distribution of CC Scores

• GenBank and random sequences were also scored for comparison and scaling purposes

Lupas et al (1991) Science, 1162

Probability a Residue is CC

P(S) = G_CC(S) / [30 G_g(S) + G_CC(S)]

Lupas et al (1991) Science, 1162
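The probability formula can be evaluated with a short Python sketch. It assumes that G_CC and G_g are Gaussian densities fitted to the coiled coil and globular score distributions, as the previous slide title suggests. The means and standard deviations are left as parameters because the slides do not give them, and the function names are hypothetical.

```python
import math


def gaussian(x, mean, sd):
    """Gaussian probability density at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def coiled_coil_probability(score, cc_mean, cc_sd, glob_mean, glob_sd, weight=30.0):
    """P(S) = G_CC(S) / [weight * G_g(S) + G_CC(S)], with weight = 30 on the slide."""
    g_cc = gaussian(score, cc_mean, cc_sd)
    g_g = gaussian(score, glob_mean, glob_sd)
    return g_cc / (weight * g_g + g_cc)
```

The factor of 30 means a score that is equally likely under both distributions still yields only P = 1/31, so a residue needs a score well inside the coiled coil distribution before it is called a coiled coil.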

Coils Output For MukB

http://www.ch.embnet.org/software/COILS_form.html

Coils Recap

• Score generated for each residue of a sequence based on relative frequencies

• Scores interpreted as a probability of forming a coiled coil

• Limitations

– Trained on parallel dimers

– Biased towards hydrophilic, charge-rich sequences

Coils Options

• Scoring Options

– MTK matrix, as described

– MTIDK matrix, weights residue frequency according to CC families

• Weighting a and d positions

• Window sizes 14, 21, 28

To Use Coils

• Lupas, A., Van Dyke, M., and Stock, J. (1991) Predicting Coiled Coils from Protein Sequences, Science 252:1162-1164.

• http://www.ch.embnet.org/software/COILS_form.html

• Requirements

– Almost any standard format of protein sequence can be used as an input

Coils Spin Off

PCOILS: most recent version of Coils

– Trained on a larger data set

– More options for matrices

– Uses BLAST to calculate probabilities on alignments

– Is much slower than Coils

To use: PCOILS is part of the REPPER server

– http://toolkit.tuebingen.mpg.de/pcoils

PAIRCOIL

• PAIRCOIL, like COILS, is a program that predicts the probability of coiled coil formation from protein sequence

• PAIRCOIL differs from COILS in that it uses pairwise interactions to determine similarity scores instead of relative occurrences in register positions

Database Assembly

CC-database

– Myosin

– Tropomyosin

– Intermediate filaments

• From GenPept

• ~58,200 residues

PDB-minus database

– PDB database with known coiled coils removed

– The multiple alignment program pileup reduced protein sequences to 286 classes

– One representative structure from each class was used

– 63,100 residues

PIR-minus database

• PIR database with the myosins, tropomyosins, and IF proteins removed

• ~7,300,000 residues

Computing Probabilities

Sequence:        n1   n2   n3   n4   k5   n6   n7   n8   n9   n10  n11

Register:        a    b    c    d    e    f    g    a    b    c    d

Representation:  k-4  k-3  k-2  k-1  k    k+1  k+2  k+3  k+4  k+5  k+6 …

POI: k5

In this register

k = e

k+1 = f

k+2 = g

Computing Normalized Probabilities

• Single occurrence frequencies are computed as they were in COILS

– Relative Frequency of Occurrence: νk(A) = [fk(A)/Tk] / WA, or νk+i(B) = [fk+i(B)/Tk+i] / WB

• Correlation Occurrence Frequencies

νk,k+1(A,B) = [fk,k+1(A,B)/Tk,k+1] / WAB-i

Berger et al (1995) PNAS, 8259

Tabulating Pairwise Correlations

$$P_{k,k+i}(A) = \ln \frac{\nu_{k,k+i}(A,B)}{\nu_k(A)\,\nu_{k+i}(B)}$$

Berger et al (1995) PNAS, 8259
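The pairwise correlation on this slide appears to be a log ratio: the normalized frequency of seeing A at position k together with B at position k+i, divided by the product of the two single-position frequencies. The Python sketch below evaluates that ratio; reading the slide's formula this way is an assumption, and the function name is hypothetical.

```python
import math


def pair_correlation(nu_pair, nu_a, nu_b):
    """ln[ nu_{k,k+i}(A,B) / (nu_k(A) * nu_{k+i}(B)) ].

    Positive when the pair occurs more often than the two single-position
    frequencies would predict independently, negative when less often,
    and zero when the positions look independent.
    """
    return math.log(nu_pair / (nu_a * nu_b))
```

For example, a pair frequency of 8.0 with single frequencies of 2.0 each gives ln 2, while a pair frequency of 4.0 gives exactly 0.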

Tabulating Pairwise Correlations

Berger et al (1995) PNAS, 8259

Color Coding Pairwise Correlations

Color key (partner positions): k and k+7, k+1, k+2, k+3, k+4, k+5, k+6

Berger et al (1995) PNAS, 8259

Using Correlation Probabilities to Predict Coiled Coil

$$P_k(A) = \frac{1}{3} \ln \frac{\nu_{k,k+1}(A,B)\,\nu_{k,k+2}(A,C)\,\nu_{k,k+4}(A,D)}{\nu_{k+1}(B)\,\nu_{k+2}(C)\,\nu_{k+4}(D)}$$

Berger et al (1995) PNAS, 8259

Calculating Residue Score

1. Set a sliding window of 30 residues to include residue k

2. Sum all tripartite correlation probabilities for each residue in the window for each register

3. Shift window and repeat 2

4. Repeat steps 2 and 3 until all possible windows and registers have been scored

5. Residue score is the highest value over all possible windows and registers
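The five steps can be sketched in Python as below. This is a simplified illustration, not the Paircoil source. It assumes the tripartite term is one third of the log of the pair frequencies for partners at k+1, k+2 and k+4, each divided by the partner's single-position frequency. The table layouts, the default of 1.0 for missing entries, and the skipping of partners past the sequence end are all choices made for the example.

```python
import math

HEPTAD = "abcdefg"


def tripartite_term(seq, k, register, single, pair):
    """(1/3) * ln of the pair/single ratios for partners at k+1, k+2, k+4.

    `register` is the heptad index (0 = a ... 6 = g) of residue k.
    single: {(position, residue): nu};  pair: {(position, i, A, B): nu}.
    Missing entries default to 1.0 and partners past the end are skipped
    (both are assumptions of this sketch).
    """
    log_sum = 0.0
    for i in (1, 2, 4):
        if k + i >= len(seq):
            continue
        pos_k = HEPTAD[register % 7]
        pos_ki = HEPTAD[(register + i) % 7]
        nu_pair = pair.get((pos_k, i, seq[k], seq[k + i]), 1.0)
        nu_single = single.get((pos_ki, seq[k + i]), 1.0)
        log_sum += math.log(nu_pair / nu_single)
    return log_sum / 3.0


def residue_scores(seq, single, pair, window=30):
    """Highest summed window score over all windows and registers per residue."""
    best = [-math.inf] * len(seq)
    for start in range(len(seq) - window + 1):
        for first in range(7):  # register assigned to the first window residue
            total = sum(
                tripartite_term(seq, k, first + (k - start), single, pair)
                for k in range(start, start + window)
            )
            for k in range(start, start + window):
                best[k] = max(best[k], total)
    return best
```

Residues that no full window covers keep a score of negative infinity, which makes short inputs easy to spot.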

Comparison of COILS to PAIRCOIL

PDB-minus database for non-coiled coils

Berger et al (1995) PNAS, 8259

Comparison of COILS to PAIRCOIL

PIR-minus database for non-coiled coils

Berger et al (1995) PNAS, 8259

Using Correlation Score to predict Probability

Berger et al (1995) PNAS, 8259

MultiCoil: a PairCoil Spin Off

Green: PDB-minus ~39,000 residues

Red: 2-strand DB ~58,200 residues

Blue: 3-strand DB ~6,300 residues

• Distinguishes between globular proteins, two-stranded CCs and three-stranded CCs

Wolf et al (1997) Protein Science, 1179

To use Paircoil

• Paircoil

– http://groups.csail.mit.edu/cb/paircoil/cgi-bin/paircoil.cgi

• Multicoil

– http://groups.csail.mit.edu/cb/multicoil/cgi-bin/multicoil.cgi

• Paircoil2

– http://groups.csail.mit.edu/cb/paircoil2/paircoil2.html

Output Comparisons

Coils Paircoil

Multicoil Output

A Computationally Directed Screen Identifying Interacting CCs in Yeast

– Protein-interaction motifs are often identified with computational methods but potential ligands are not

– A potential ligand for a CC is a CC

Newman et al (2000) PNAS, 13203

Using Multicoil to Identify Potential Pairing Partners

• ~6,000 ORFs in yeast

• ~300 two-stranded

• ~250 three-stranded

• ~1 in 11 yeast proteins potentially has a CC

• About half of these have no known function

Newman et al (2000) PNAS, 13203

Yeast Two Hybrid Assay

GAR-Y

GDBD-X

CC motifs often work well for X and Y because they can frequently fold autonomously

Fields, Song (1989) Nature, 245


162 x 162 = 26244 possible combinations (about half redundant)

Identified 213 interactions
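The "about half redundant" remark can be checked with a small Python calculation. It assumes that redundancy here means an X-Y pairing and its Y-X swap count as the same pair, with self-pairings kept; the function name is hypothetical.

```python
def pairing_counts(n):
    """Return (ordered, unordered) pair counts for n constructs.

    ordered: every construct as X against every construct as Y.
    unordered: X-Y and Y-X merged into one pair, self-pairings kept.
    """
    ordered = n * n
    unordered = n * (n + 1) // 2
    return ordered, unordered


print(pairing_counts(162))  # (26244, 13203)
```

Under that assumption the 26244 combinations collapse to 13203 distinct pairs, which is roughly half.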


Blow Up of Yeast Two Hybrid Results

GAR-Y

GDBD-X

Newman et al (2000) PNAS, 13203

Limitations to this Method

• Protein-protein interactions may require non-CC contacts in addition to CC contacts

– This study only identified 6 of the 25 known interactions assayed from the yeast genome

• Some CC types not easily detected with Y2H

– Parallel homodimeric constructs not easily detected

• False positives possible but arguably not likely

Human bZips: a Testing Ground for Software that Predicts Protein-Protein Interactions

~53 Unique Human bZips

~1131 Unique Heterodimers

Newman, Keating (2003) Science, 2097

Current Generation of CC programs

• Aim to predict novel interacting CC partners

• Two notable studies attempt to describe bZip proteins with considerable success

1. Weights potential interstrand interactions based on many biophysical studies

– Fong JA, Keating AE, Singh M (2004) Genome Biology, 5:R11

2. Physical modeling

– Grigoryan G, Keating AE (2006) JMB, 355, 1125

Coiled Coil Recap

SOCKET: Identifies and assigns coiled coil registers to structures

Coils, Paircoil, Multicoil and progeny: predict the likelihood that a particular sequence belongs to a CC

Programs are beginning to be able to predict potential coiled coil binding partners
