Bioinformatics 2 -- lecture 12 (bystrc/courses/biol4550/lecture12/lec12.pdf)

Sidechain rotamers

Dead End Elimination Theorem

Protein Design Stories

Short fast history of protein design

Site-directed mutagenesis -- protein engineering (Wells, 1980's)

Coiled coils, helix bundles (DeGrado, 1980's-90's)

DEE -- protein stabilization (Mayo, 1990's)

Binding pocket design (Hellinga, 2000)

Interface design, new fold design (Kuhlman, 2002-4)

Experimental (non-computational) approaches:

• in vitro evolution
• phage display

**Other names in protein design: Hill, Gray, Vriend, Regan, Baker, Richardson, Dunbrack, Choma, many more.


Proteins can be made super-stable

[Figure: denaturation curves, "Folded" versus [GdnHCl] up to 8M, for the natural seq and the designed seq.]

Malakauskas SM and Mayo SL (1998) Design, Structure, and Stability of a Hyperthermophilic Protein Variant. Nature Struct. Biol., 5, 470.

some amazing accomplishments in protein design

Distinct conformational states can be stabilized.

αMβ2 integrin I domain in 2 conformations. Two crystal structures are known; they differ in the highlighted region. Shimaoka et al. designed sequences for each form, open and closed. The two designs were shown to have different physiological properties.

Shimaoka, M., Shifman, J. M., Takagi, J., Mayo, S. L., Springer, T. A. (2000) Computational design of an integrin I domain stabilized in the high affinity conformation. Nature Struct. Biol. 7(8), 674-678.


New folds can be designed

Kuhlman et al., Science, v.302(5649), 1364-1368 (2003)

New proteins can be designed that have never been seen before. The designs are accurate (compare red and blue in the figure) and they are highly stable.

Re-designed proteins consistently stable and protein-like

Dantas et al., J. Mol. Biol. (2003) 332, 449–460


New binding sites can be designed

Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W. Nature 423, 185–190 (2003).

Used to bind arabinose, now it binds serotonin.

The goal of sequence design

Given a desired structure, find an amino acid sequence that folds to that structure.

[Figure: a structure and the sequence MIKYGTKIYRINSDNSGKJHGCKAHNEEEGHA, connected by arrows labeled "design" and "folding".]

To do this, we must assign an energy to each possible sequence.


Sidechain modeling

Given a backbone conformation and the sequence, can we predict the sidechain conformations?

HINT: If we can't do this, we certainly can't do sequence design.

Before we do that, we have to do this.

Energy calculations are sensitive to small changes. So the wrong sidechain conformation will give the wrong energy.

Goal of sidechain modeling

Desmet et al., Nature v.356, pp. 339-342 (1992)

Given the sequence and only the backbone atom coordinates, accurately model the positions of the sidechains.

fine lines = true structure
thick lines = sidechain predictions using the method of Desmet et al.


Sidechain space is discrete, almost

A random sampling of phenylalanine sidechains, when superimposed, falls into three classes: rotamers.

This simplifies the problem of sidechain modeling. All we have to do is select the right rotamers and we're close to the right answer.

Rotamer


What determines rotamers

[Figure: three Newman projections about the CA-CB bond. CA carries N, O=C and H; CB carries CG and two H. The three staggered positions of CG are labeled:]

"m" -60° gauche
"t" 180° anti/trans
"p" +60° gauche

3-bond or 1-4 interactions define the preferred angles, but these may differ greatly in energy depending on the atom groups involved.
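The "m", "t", "p" labels can be assigned to a measured chi angle by finding the nearest ideal staggered value. The following is a minimal sketch, assuming ideal values of exactly -60°, 180° and +60° as on the slide; the function names are hypothetical and real libraries use cluster centers that deviate from these ideals.

```python
def wrap_angle(deg):
    """Map any angle in degrees into the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0

# Ideal staggered chi values from the slide: m = -60, t = 180, p = +60.
IDEAL_CHI = {"m": -60.0, "t": 180.0, "p": 60.0}

def classify_chi(chi_deg):
    """Return the label ("m", "t" or "p") of the ideal angle closest to chi_deg."""
    def circular_distance(a, b):
        # Distance measured around the circle, so 177 and -177 are 6 degrees apart.
        return abs(wrap_angle(a - b))
    return min(IDEAL_CHI, key=lambda label: circular_distance(chi_deg, IDEAL_CHI[label]))

for chi in (-65.0, 177.0, -177.0, 62.0):
    print(chi, classify_chi(chi))
```

The circular distance matters for the "t" class, which straddles the +180/-180 boundary.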

Exercise: rotamers of W

Open InsightII

Create a tripeptide TWV, as follows:

Open the Builder menu

Fragment-->Get
Select fragment library "amino acid"

Fragment-->Get-->threonine
cancel Bond window

Fragment-->Get-->tryptophan
cancel Bond window

Fragment-->Get-->valine
cancel Bond window

Using F10 (connect object), align the three objects as in the following slide.


Exercise: twisting W

Link the three amino acids to make a tripeptide with trans peptide bonds. To make the peptide (C--N) bond, select the hydrogens that will be lost.

In Builder: Modify-->Bond-->{partial double}
Select Hydrogens as circled in the figure, and execute.

[Figure: the three aligned residues, with the hydrogens to select circled ("select these H's").]

Exercise: twisting W

Now, select the chi1 and chi2 angles to rotate, using
Transform-->torsion-->{add,explicit,4-atom}

Select N, CA, CB, CG, CD1
Use F7 and middle mouse to select angles.

[Figure: the tryptophan sidechain with the atoms to select numbered 1 to 5 ("select these atoms"). It should look like this now.]


Exercise: twisting W

To show the surface, use
Molecule-->Render-->{cpk, high}

NOTE: high resolution is OK because the model is small.

Set the W sidechain to each rotamer position and describe the location of the 6-membered ring: touching the V, exposed, etc.

[Figure: W sidechain is shown here lying over Thr backbone.]

Rotamers of W*:

name    chi1   chi2
p-90    +60    -90
p90     +60    +90
t-105   180    -105
t90     180    90
m0      -65    5
m95     -65    95

Rotamer Libraries

Rotamer libraries have been compiled by clustering the sidechains of each amino acid over the whole database. Each cluster is a representative conformation (or rotamer), and is represented in the library by the best sidechain angles (chi angles), the "centroid" angles, for that cluster.
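Because chi angles wrap around at ±180°, a cluster "centroid" cannot be a plain arithmetic mean. One common way to average angles is the circular mean, sketched below. This is an illustration only: the slides do not say how either library computes its centroids, and the cluster values are hypothetical.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of a set of angles in degrees, returned in [-180, 180]."""
    # Average the unit vectors pointing along each angle, then convert back.
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum))

# Hypothetical chi1 values for one "t" cluster that straddles the +/-180 boundary.
cluster = [175.0, -178.0, 179.0, -172.0]

print(circular_mean_deg(cluster))    # lies near the +/-180 boundary
print(sum(cluster) / len(cluster))   # naive mean is 1.0, far from every member
```

The naive mean lands near 0°, on the opposite side of the circle from all four observations, which is why the wrap-around has to be handled explicitly.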

Two commonly used rotamer libraries:

*Jane & David Richardson: http://kinemage.biochem.duke.edu/databases/rotamer.php

Roland Dunbrack: http://dunbrack.fccc.edu/bbdep/index.php

*The rotamers of W listed above are from the Richardson library.


Dead end elimination theorem

•There is a global minimum energy conformation (GMEC), where each residue has a unique rotamer.

In other words: GMEC is the set of rotamers that has the lowest energy.

•Energy is a pairwise thing. Total energy can be broken down into pairwise interactions. Each atom is either fixed (backbone) or movable (sidechain).

fixed-fixed: E is a constant, = Etemplate
fixed-movable: E depends on rotamer, but independent of other rotamers
movable-movable: E depends on rotamer, and depends on surrounding rotamers

Theoretical complexity of sidechain modeling

The Global Minimum Energy Configuration (GMEC) is one, unique set of rotamers.

How many possible sets of rotamers are there?

$n_1 \times n_2 \times n_3 \times n_4 \times n_5 \times \dots \times n_L$

where $n_1$ is the number of rotamers for residue 1, and so on.

Estimated complexity for a protein of 100 residues, with an average of 5 rotamers per position: $5^{100} \approx 8 \times 10^{69}$

DEE reduces the complexity of the problem from $5^L$ to approximately $(5L)^2$
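The two counts can be checked with a few lines of arithmetic. This sketch only reproduces the numbers on the slide; the function name and the short per-position list are hypothetical.

```python
import math

def count_rotamer_sets(rotamers_per_position):
    """Number of distinct rotamer sets: the product n_1 * n_2 * ... * n_L."""
    return math.prod(rotamers_per_position)

# Positions may have different numbers of rotamers (hypothetical counts).
print(count_rotamer_sets([3, 9, 1, 27]))       # 3 * 9 * 1 * 27 = 729

# The slide's estimate: 100 residues, 5 rotamers each.
L, n = 100, 5
exhaustive = count_rotamer_sets([n] * L)
print(f"exhaustive search: {exhaustive:.1e}")  # about 8e69 sets
print(f"(5L)^2 estimate:   {(n * L) ** 2}")    # 250000
```

Python integers are arbitrary precision, so the exhaustive count is exact before it is formatted.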


Dead end elimination theorem

•Each residue is numbered (i or j) and each residue has a set of rotamers (r, s or t). So, the notation $i_r$ means "choose rotamer r for position i".

•The total energy is the sum of the three components:

$E_{global} = E_{template} + \sum_i E(i_r) + \sum_i \sum_j E(i_r, j_s)$

fixed-fixed: $E_{template}$
fixed-movable: $\sum_i E(i_r)$
movable-movable: $\sum_i \sum_j E(i_r, j_s)$

where r and s are any choice of rotamers.

NOTE: $E_{global} \geq E_{GMEC}$ for any choice of rotamers.
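The three-component sum can be written out directly, and on a tiny problem the GMEC can be found by trying every combination. The sketch below is illustrative: the energies are hypothetical toy values, the function names are made up, and it assumes each pair of positions is counted once (i < j) in the double sum.

```python
from itertools import product

def total_energy(choice, e_template, e_self, e_pair):
    """E_global for one rotamer choice: template + self terms + pair terms."""
    positions = sorted(choice)
    energy = e_template                                   # fixed-fixed
    for i in positions:
        energy += e_self[(i, choice[i])]                  # fixed-movable
    for index, i in enumerate(positions):
        for j in positions[index + 1:]:                   # each pair once, i < j
            energy += e_pair[((i, choice[i]), (j, choice[j]))]  # movable-movable
    return energy

def brute_force_gmec(rotamers, e_template, e_self, e_pair):
    """Enumerate every rotamer set and return the lowest-energy one."""
    positions = sorted(rotamers)
    best_choice, best_energy = None, float("inf")
    for combo in product(*(rotamers[i] for i in positions)):
        choice = dict(zip(positions, combo))
        energy = total_energy(choice, e_template, e_self, e_pair)
        if energy < best_energy:
            best_choice, best_energy = choice, energy
    return best_choice, best_energy

# Hypothetical toy problem: two positions, two rotamers each.
rotamers = {1: ["a", "b"], 2: ["a", "b"]}
e_template = 10
e_self = {(1, "a"): 0, (1, "b"): 1, (2, "a"): 2, (2, "b"): 0}
e_pair = {((1, "a"), (2, "a")): -1, ((1, "a"), (2, "b")): 3,
          ((1, "b"), (2, "a")): 0, ((1, "b"), (2, "b")): -2}

print(brute_force_gmec(rotamers, e_template, e_self, e_pair))
```

Every other choice has an energy at least as high as the one returned, which is the statement that E_global ≥ E_GMEC. Enumeration like this is only feasible for very small problems.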

Dead end elimination theorem

•If $i_g$ is in the GMEC and $i_t$ is not, then we can separate the terms that contain $i_g$ or $i_t$ and re-write the inequality.

$E_{GMEC} = E_{template} + E(i_g) + \sum_j E(i_g, j_g) + \sum_j E(j_g) + \sum_j \sum_k E(j_g, k_g)$

...is less than...

$E_{notGMEC} = E_{template} + E(i_t) + \sum_j E(i_t, j_g) + \sum_j E(j_g) + \sum_j \sum_k E(j_g, k_g)$

Canceling the terms common to both lines (shown in black on the slide), we get:

$E(i_t) + \sum_j E(i_t, j_g) > E(i_g) + \sum_j E(i_g, j_g)$

So, if we find two rotamers $i_r$ and $i_t$, and:

$E(i_r) + \sum_j \min_s E(i_r, j_s) > E(i_t) + \sum_j \max_s E(i_t, j_s)$

Then $i_r$ cannot possibly be in the GMEC.
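The step from the min/max criterion to "cannot be in the GMEC" can be made explicit. The chain below is a sketch of the standard argument in the slide's notation, where $j_s$ stands for whatever rotamers the other positions happen to take and $s'$ ranges over all rotamers at position j:

```latex
\begin{align*}
E(i_r) + \sum_j E(i_r, j_s)
  &\ge E(i_r) + \sum_j \min_{s'} E(i_r, j_{s'}) \\
  &>   E(i_t) + \sum_j \max_{s'} E(i_t, j_{s'}) \\
  &\ge E(i_t) + \sum_j E(i_t, j_s)
\end{align*}
```

The first and last lines hold for any choice of the other rotamers, and the middle line is the criterion itself. So swapping $i_r$ for $i_t$ lowers the total energy no matter what the rest of the protein does, and a rotamer set containing $i_r$ can never be the lowest-energy set.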


Dead end elimination theorem

$E(i_r) + \sum_j \min_s E(i_r, j_s) > E(i_t) + \sum_j \max_s E(i_t, j_s)$

This can be translated into plain English as follows:

If the "worst case scenario" for t is better than the "best case scenario" for r, then you always choose t.

Exercise: Dead End Elimination

Using the DEE worksheet:

(1) Find a rotamer that satisfies the DEE theorem.

(2) Eliminate it.

(3) Repeat until each residue has only one rotamer.

What is the final GMEC energy?


DEE exercise

[Figure: three sidechain positions, 1, 2 and 3, each with rotamers a, b and c.]

Three sidechains, each with three rotamers. Therefore, there are 3x3x3=27 ways to arrange the sidechains.

• Each rotamer has an energy E(r), which is the non-bonded energy between sidechain and template.
• Each pair of rotamers has an interaction energy E(r1,r2), which is the non-bonded energy between sidechains.

DEE exercise

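[Worksheet: a grid of pairwise energies E(r1,r2), with rows r1 and columns r2 labeled by position (1, 2, 3) and rotamer (a, b, c), and the single-rotamer energies E(r1) and E(r2) along the margins.]

Pairwise energies E(r1,r2), as six 3x3 blocks listed in order (the last three blocks are the transposes of the first three):

-1  1  1      -2  2  5       0  0  1
 3  5  1       0  5 -1      12  5  0
 5  5 -1       0  0  0       4  3  0

-1  3  5      -2  0  0       0 12  4
 1  5  5       2  5  0       0  5  3
 1  1 -1       5 -1  0       1  0  0

Single-rotamer energies, identical for E(r1) and E(r2), listed in order (a b c | a b c | a b c):

0 0 5 | 0 0 0 | 0 0 10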


DEE exercise: instructions

(1) The best (worst) energies are found using the worksheet: Add E(r1) to the sum of the lowest (highest) E(r1,r2), one per other position, among rotamers that have not been previously eliminated.

(2) There are 9 possible DEE comparisons to make: 1a versus 1b, 1a versus 1c, 1b versus 1c, 2a versus 2b, etc. For each comparison, find the minimum and maximum energy choices of the other rotamers. If the maximum energy of r1 is less than the minimum energy of r2, eliminate r2.

(3) Scratch out the eliminated rotamer and repeat until one rotamer per position remains.

If the "best case scenario" for r1 is worse than the "worst case scenario" for r2 you can eliminate r1.
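The worksheet procedure can be sketched as a loop: compute best-case and worst-case energies for each surviving rotamer, eliminate any rotamer whose best case is worse than a competitor's worst case, and repeat until nothing changes. The code below is a minimal illustration with hypothetical toy energies (not the worksheet values) and made-up function names.

```python
def pair_energy(e_pair, i, r, j, s):
    """Look up E(i_r, j_s); each pair is stored once, lower position first."""
    key = ((i, r), (j, s)) if i < j else ((j, s), (i, r))
    return e_pair[key]

def bounds(i, r, alive, e_self, e_pair):
    """Best-case and worst-case energy of rotamer r at position i (step 1)."""
    best = worst = e_self[(i, r)]
    for j, rotamers_j in alive.items():
        if j == i:
            continue
        energies = [pair_energy(e_pair, i, r, j, s) for s in rotamers_j]
        best += min(energies)    # lowest surviving E(r1,r2) at position j
        worst += max(energies)   # highest surviving E(r1,r2) at position j
    return best, worst

def dead_end_elimination(rotamers, e_self, e_pair):
    """Repeat steps 2 and 3 until a full pass eliminates nothing."""
    alive = {i: list(rs) for i, rs in rotamers.items()}
    changed = True
    while changed:
        changed = False
        for i in alive:
            for r in list(alive[i]):
                best_r, _ = bounds(i, r, alive, e_self, e_pair)
                for t in alive[i]:
                    if t == r:
                        continue
                    _, worst_t = bounds(i, t, alive, e_self, e_pair)
                    if best_r > worst_t:      # r is a dead end
                        alive[i].remove(r)
                        changed = True
                        break
    return alive

# Hypothetical toy problem: two positions, two rotamers each.
rotamers = {1: ["a", "b"], 2: ["a", "b"]}
e_self = {(1, "a"): 0, (1, "b"): 3, (2, "a"): 0, (2, "b"): 10}
e_pair = {((1, "a"), (2, "a")): 0, ((1, "a"), (2, "b")): 5,
          ((1, "b"), (2, "a")): 0, ((1, "b"), (2, "b")): -5}

print(dead_end_elimination(rotamers, e_self, e_pair))
```

In this toy case neither rotamer at position 1 can be eliminated on the first pass; only after 2b is removed do the bounds tighten enough to eliminate 1b. That is why the procedure has to be repeated. On harder problems this simple criterion can stall with more than one rotamer left at some positions.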

Sequence design using DEE

•Selected residues (or all) are chosen for mutating.

•Selected (or all) amino acids are allowed at those positions.

•For the selected amino acids, all rotamers are considered.

Now "rotamer" comes to mean the amino acid identity and its conformation.

Since there are as many as 193 rotamers in the rotamer library for all amino acids, each selected position can have as many as 193 "rotamers."

If "fine grained" rotamers are used, this number may be much larger.
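The extended meaning of "rotamer" can be pictured as a list of (amino acid, conformation) pairs built for each designable position. The sketch below is illustrative only: the per-amino-acid counts are hypothetical placeholders, not the actual library counts, and the function name is made up.

```python
# Hypothetical rotamer counts per amino acid, for illustration only.
# Gly and Ala have no sidechain rotamers, so each contributes a single choice.
ROTAMER_COUNTS = {"GLY": 1, "ALA": 1, "SER": 3, "VAL": 3, "LEU": 5}

def design_rotamers(allowed_amino_acids, rotamer_counts=ROTAMER_COUNTS):
    """List every (amino acid, rotamer index) choice allowed at one position."""
    return [(aa, k)
            for aa in allowed_amino_acids
            for k in range(rotamer_counts[aa])]

# A position where only Ala, Ser and Leu are allowed.
choices = design_rotamers(["ALA", "SER", "LEU"])
print(len(choices))   # 1 + 3 + 5 = 9
print(choices[:4])
```

With this representation the DEE machinery does not change: it eliminates (amino acid, conformation) pairs exactly as it eliminated plain rotamers, and the surviving pair at each position gives both the designed residue and its conformation.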


Theoretical complexity of sequence design

To design THE OPTIMAL sequence, we need the best amino acid, and its best rotamer at every position. We can treat each position as one of 193 possible rotamers. That's 191 rotamers in the Richardson library, plus Gly and Ala (which have no rotamers).

How many possible sets of rotamers are there for a protein of length 100?

$193^{100} \approx 3.6 \times 10^{228}$

DEE reduces the complexity of sequence design to about $(193L)^2 \approx 3.7 \times 10^{8}$ for L = 100

Sequence space maps to structure space

...as many-to-one.

[Figure: several sequence families mapping onto a single fold.]

This means that there is a lot of potential for "slop" in a sequence design. Moderately big sequence changes are possible, and the sequence can still fold to the same general structure.

Good news for protein designers


Re-designing a binding site

The group of Homme Hellinga (Duke Univ) has used DEE to redesign the shape of a small molecule binding site. The site originally bound the sugar arabinose. It was redesigned to bind trinitrotoluene, serotonin, and L-lactate.

How did they do it??

Recent success of sequence design

An appropriate binding site was found

Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W. Nature 423, 185–190 (2003).

The native ligand (arabinose) is approximately the same size as the targeted ligand (serotonin).


A space was carved out for the ligand

Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W. Nature 423, 185–190 (2003).

All sidechains in the binding site were truncated to alanines, and a space was defined (yellow) for the new ligand. Lots of possible ligand orientations were made. Ligand orientations were treated like rotamers in DEE!

A good energy function makes a good design

Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W. Nature 423, 185–190 (2003).

The most critical component of the energy function was hydrogen bonding (dotted lines). Every donor/acceptor should be satisfied.


The End

Please use the remaining time to work on your term projects:

You should be: FINALIZING THE ALIGNMENT, ENERGY MINIMIZING, and adding ligands if necessary.