Bioinformatics 2 -- lecture 12
Sidechain rotamers
Dead End Elimination Theorem
Protein Design Stories
Short fast history of protein design
Site-directed mutagenesis -- protein engineering (Wells, 1980s)
Coiled coils, helix bundles (DeGrado, 1980s-90s)
DEE -- protein stabilization (Mayo, 1990s)
Binding pocket design (Hellinga, 2000)
Interface design, new fold design (Kuhlman, 2002-4)
Experimental (non-computational) approaches:
• in vitro evolution
• phage display
Other names in protein design: Hill, Gray, Vriend, Regan, Baker, Richardson, Dunbrack, Choma, and many more.
Proteins can be made super-stable
[Figure: denaturation curves, fraction folded vs. [GdnHCl] (up to 8 M), for the natural sequence and the designed sequence.]
Malakauskas SM and Mayo SL (1998) Nature Struct. Biol., 5, 470. Design, Structure, and Stability of a Hyperthermophilic Protein Variant.
Some amazing accomplishments in protein design
Distinct conformational states can be stabilized.
αMβ2 integrin I domain in two conformations. Two crystal structures are known, and they differ in the highlighted region. Shimaoka et al. designed sequences for each form, open and closed. The two designs were shown to have different physiological properties.
Shimaoka, M., Shifman, J. M., Takagi, J., Mayo, S. L., Springer, T. A. (2000) Computational design of an integrin I domain stabilized in the high affinity conformation. Nature Struct. Biol. 7(8), 674-678
New folds can be designed
Kuhlman et al. Science, v.302(5649), 1364-1368 (2003)
New proteins can be designed that have never been seen before. The designs are accurate (compare red and blue in the figure), and they are highly stable.
Redesigned proteins are consistently stable and protein-like
Dantas et al., J. Mol. Biol. (2003) 332, 449–460
New binding sites can be designed
Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W.Nature 423, 185–190 (2003).
It used to bind arabinose; now it binds serotonin.
The goal of sequence design
Given a desired structure, find an amino acid sequence that folds to that structure.
MIKYGTKIYRINSDNSGKJHGCKAHNEEEGHA
[Figure: design goes from structure to sequence; folding goes from sequence to structure.]
To do this, we must assign an energy to each possible sequence.
Sidechain modeling
Given a backbone conformation and the sequence, can we predict the sidechain conformations?
HINT: If we can't do this, we certainly can't do sequence design.
Before we do that (design), we have to do this (sidechain modeling).
Energy calculations are sensitive to small changes, so the wrong sidechain conformation will give the wrong energy.
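To make "sensitive to small changes" concrete, here is a minimal sketch using a 12-6 Lennard-Jones term, a common functional form for van der Waals contacts. The parameters (epsilon = 0.1 kcal/mol, sigma = 3.4 Å) are illustrative assumptions, not values from any particular force field.

```python
def lennard_jones(r, epsilon=0.1, sigma=3.4):
    """12-6 Lennard-Jones energy (kcal/mol) between two atoms r Angstrom apart.

    epsilon (well depth) and sigma are illustrative, assumed values.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


r_min = 2 ** (1 / 6) * 3.4  # distance of the energy minimum for sigma = 3.4
for r in (r_min, 3.2, 3.0, 2.8):
    print(f"r = {r:.2f} A   E = {lennard_jones(r):+.2f} kcal/mol")
```

With these assumed parameters, pushing two atoms from 3.0 Å to 2.8 Å apart roughly triples the repulsive energy, so a slightly misplaced sidechain can swing the total energy noticeably.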
Goal of sidechain modeling
Desmet et al., Nature v.356, pp. 339-342 (1992)
Given the sequence and only the backbone atom coordinates, accurately model the positions of the sidechains.
[Figure: fine lines = true structure; thick lines = sidechain predictions using the method of Desmet et al.]
Sidechain space is discrete, almost
A random sampling of phenylalanine sidechains, when superimposed, falls into three classes: rotamers.
This simplifies the problem of sidechain modeling. All we have to do is select the right rotamers, and we're close to the right answer.
[Figure: superimposed phenylalanine sidechains; one cluster is labeled "Rotamer".]
What determines rotamers
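[Figure: three Newman projections looking down the CA-CB bond (with N, the backbone C=O, CG, and hydrogens labeled), showing the three staggered positions of CG: "m" = -60° (gauche), "t" = 180° (anti/trans), "p" = +60° (gauche).]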
3-bond or 1-4 interactions define the preferred angles, but these may differ greatly in energy depending on the atom groups involved.
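A toy torsional potential shows where the three staggered minima come from. This is a minimal sketch, assuming a threefold cosine term (barrier v3) plus a hypothetical onefold term (v1) that stands in for the group-dependent energy differences mentioned above; both parameter values are made up.

```python
import math


def torsion_energy(chi_deg, v3=3.0, v1=0.5):
    """Toy torsional energy (kcal/mol) for rotation about CA-CB.

    The threefold term (v3) has minima at the staggered positions
    -60, 180 and +60 degrees. The onefold term (v1) is a hypothetical
    extra term that lowers the trans well relative to the gauche wells.
    """
    chi = math.radians(chi_deg)
    return 0.5 * v3 * (1 + math.cos(3 * chi)) + 0.5 * v1 * (1 + math.cos(chi))


def local_minima(step=1):
    """Scan chi over (-180, 180] in integer degrees and return the local minima."""
    angles = list(range(-180 + step, 181, step))
    energies = [torsion_energy(a) for a in angles]
    n = len(angles)
    minima = []
    for k in range(n):
        left, right = energies[k - 1], energies[(k + 1) % n]  # wrap around the circle
        if energies[k] < left and energies[k] < right:
            minima.append(angles[k])
    return minima


LABELS = {-60: "m", 180: "t", 60: "p"}  # Richardson-style names for the wells
for a in local_minima():
    nearest = min(LABELS, key=lambda c: abs((a - c + 180) % 360 - 180))
    print(f'chi1 = {a:+4d}  "{LABELS[nearest]}"  E = {torsion_energy(a):.2f}')
```

With v1 = 0 the three wells would be equal; the extra term is one simple way to make the trans well lower than the two gauche wells.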
Exercise: rotamers of W
Open InsightII.
Create a tripeptide TWV, as follows:
Open the Builder menu.
Fragment-->Get: select fragment library "amino acid".
Fragment-->Get-->threonine; cancel the Bond window.
Fragment-->Get-->tryptophan; cancel the Bond window.
Fragment-->Get-->valine; cancel the Bond window.
Using F10 (connect object), align the three objects as in the figure on the following slide.
Exercise: twisting W
Link the three amino acids to make a tripeptide with trans peptide bonds. To make the peptide (C--N) bond, select the hydrogens that will be lost.
In Builder: Modify-->Bond-->{partial double}. Select the hydrogens as circled in the figure, and execute.
[Figure: the hydrogens to select are circled.]
Exercise: twisting W
Now, select the chi1 and chi2 angles to rotate, using Transform-->torsion-->{add, explicit, 4-atom}.
Select N, CA, CB, CG, CD1. Use F7 and the middle mouse button to select angles.
[Figure: the five atoms to select, numbered 1 to 5, and a view of what the model should look like now.]
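The chi angles being selected here are ordinary torsion angles: chi1 is defined by N, CA, CB, CG and chi2 by CA, CB, CG, CD1. A minimal sketch of computing a signed torsion angle from four atom positions, using the standard atan2 formula; the coordinates are toy values with right angles, not a real tryptophan.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, cross(b1, b2))
    x = dot(cross(b0, b1), cross(b1, b2))
    return math.degrees(math.atan2(y, x))


# Toy coordinates with right-angle geometry (not a real tryptophan).
atoms = {"N": (1, 0, 0), "CA": (0, 0, 0), "CB": (0, 0, 1),
         "CG": (0, 1, 1), "CD1": (0, 1, 2)}
chi1 = dihedral(atoms["N"], atoms["CA"], atoms["CB"], atoms["CG"])    # N-CA-CB-CG
chi2 = dihedral(atoms["CA"], atoms["CB"], atoms["CG"], atoms["CD1"])  # CA-CB-CG-CD1
print(f"chi1 = {chi1:.1f}, chi2 = {chi2:.1f}")
```

Rotating a chi angle in the modeling program changes exactly this quantity while leaving bond lengths and bond angles fixed.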
Exercise: twisting W
To show the surface, use Molecule-->Render-->{cpk, high}.
NOTE: high resolution is OK because the model is small.
Set the W sidechain to each rotamer position and describe the location of the 6-membered ring: touching the V, exposed, etc.
[Figure: the W sidechain is shown here lying over the Thr backbone.]
Rotamers of W* (name: chi1, chi2):
p-90: +60, -90
p90: +60, +90
t-105: 180, -105
t90: 180, 90
m0: -65, 5
m95: -65, 95
Rotamer Libraries
Rotamer libraries have been compiled by clustering the sidechains of each amino acid over the whole database. Each cluster is a representative conformation (or rotamer) and is represented in the library by the best sidechain angles (chi angles), the "centroid" angles, for that cluster.
Two commonly used rotamer libraries:
*Jane & David Richardson: http://kinemage.biochem.duke.edu/databases/rotamer.php
Roland Dunbrack: http://dunbrack.fccc.edu/bbdep/index.php
*rotamers of W on the previous page are from the Richardson library.
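Assigning an observed sidechain to a library rotamer can be sketched as a nearest-centroid lookup. This minimal example uses the Trp values listed earlier, assuming the two numbers after each rotamer name are chi1 and chi2 in degrees, and uses a plain wrapped-angle distance; real libraries may use their own assignment rules.

```python
import math

# Trp rotamers from the earlier table: name -> (chi1, chi2) in degrees.
TRP_ROTAMERS = {
    "p-90": (60, -90), "p90": (60, 90),
    "t-105": (180, -105), "t90": (180, 90),
    "m0": (-65, 5), "m95": (-65, 95),
}


def angle_diff(a, b):
    """Signed difference a - b wrapped onto the circle, in [-180, 180)."""
    return (a - b + 180) % 360 - 180


def nearest_rotamer(chi1, chi2, library=TRP_ROTAMERS):
    """Return (name, distance) of the library rotamer closest to (chi1, chi2)."""
    return min(
        ((name, math.hypot(angle_diff(chi1, c1), angle_diff(chi2, c2)))
         for name, (c1, c2) in library.items()),
        key=lambda item: item[1],
    )


print(nearest_rotamer(-170, 80))  # chi1 near 180 wraps around to the "t" rotamers
```

Wrapping the differences matters: -170° and 180° are only 10° apart, even though the raw numbers differ by 350.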
Dead end elimination theorem
• There is a global minimum energy conformation (GMEC), where each residue has a unique rotamer.
In other words: the GMEC is the set of rotamers that has the lowest energy.
• Energy is pairwise: the total energy can be broken down into pairwise interactions. Each atom is either fixed (backbone) or movable (sidechain).
• fixed-fixed: E is a constant, Etemplate.
• fixed-movable: E depends on the rotamer, but is independent of the other rotamers.
• movable-movable: E depends on the rotamer and on the surrounding rotamers.
Theoretical complexity of sidechain modeling
The Global Minimum Energy Configuration (GMEC) is one unique set of rotamers.
How many possible sets of rotamers are there?
$n_1 \times n_2 \times n_3 \times n_4 \times n_5 \times \cdots \times n_L$
where $n_1$ is the number of rotamers for residue 1, and so on.
Estimated complexity for a protein of 100 residues, with an average of 5 rotamers per position: $5^{100} \approx 8 \times 10^{69}$.
DEE reduces the complexity of the problem from $5^L$ to approximately $(5L)^2$.
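The count $n_1 \times n_2 \times \cdots \times n_L$ is easy to check numerically. A minimal sketch, assuming 5 rotamers at each of 100 positions as in the estimate above; summing $\log_{10} n_i$ gives the order of magnitude directly, even when the rotamer counts differ between positions.

```python
import math


def log10_combinations(rotamers_per_residue):
    """log10 of n1 * n2 * ... * nL, the number of possible rotamer sets."""
    return sum(math.log10(n) for n in rotamers_per_residue)


L = 100
counts = [5] * L  # assumption from the text: 5 rotamers at every position
print(f"5^{L} ~ 10^{log10_combinations(counts):.1f}")
print(f"exact count: {math.prod(counts):.1e}")
print(f"(5L)^2 = {(5 * L) ** 2}")  # rough size of the DEE problem
```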
Dead end elimination theorem
• Each residue is numbered (i or j), and each residue has a set of rotamers (r, s, or t). So, the notation $i_r$ means "choose rotamer r for position i".
• The total energy is the sum of three components:
$E_{global} = E_{template} + \sum_i E(i_r) + \sum_i \sum_{j > i} E(i_r, j_s)$
where r and s are any choice of rotamers. The three terms are the fixed-fixed, fixed-movable, and movable-movable energies, respectively.
NOTE: $E_{global} \ge E_{GMEC}$ for any choice of rotamers.
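Dead end elimination theorem
• If $i_g$ is in the GMEC and $i_t$ is not, then we can separate the terms that contain $i_g$ or $i_t$ and rewrite the inequality:
$E_{GMEC} = E_{template} + E(i_g) + \sum_{j \ne i} E(i_g, j_g) + \sum_{j \ne i} E(j_g) + \sum_{j \ne i} \sum_{k > j,\, k \ne i} E(j_g, k_g)$
...is less than...
$E_{notGMEC} = E_{template} + E(i_t) + \sum_{j \ne i} E(i_t, j_g) + \sum_{j \ne i} E(j_g) + \sum_{j \ne i} \sum_{k > j,\, k \ne i} E(j_g, k_g)$
Canceling all terms common to both sides, we get:
$E(i_g) + \sum_{j \ne i} E(i_g, j_g) < E(i_t) + \sum_{j \ne i} E(i_t, j_g)$
So, if we find two rotamers $i_r$ and $i_t$ such that:
$E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s)$
then $i_r$ cannot possibly be in the GMEC: whatever rotamers the other positions take, the left side is a lower bound for $i_r$ and the right side is an upper bound for $i_t$, so swapping $i_r$ for $i_t$ always lowers the energy.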
Dead end elimination theorem
$E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s)$
This can be translated into plain English as follows:
If the "worst case scenario" for t is better than the "best case scenario" for r, then you always choose t.
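The criterion translates directly into code. A minimal sketch, assuming energies are stored in dictionaries keyed by (position, rotamer); the two-position energies below are hypothetical, chosen only so that one rotamer is clearly a dead end.

```python
def pair_energy(e_pair, i, r, j, s):
    """E(i_r, j_s); each pair is stored once, so try both key orders."""
    key = (i, r, j, s)
    return e_pair[key] if key in e_pair else e_pair[(j, s, i, r)]


def is_dead_end(i, r, t, rotamers, e_self, e_pair):
    """True if rotamer r at position i can be eliminated in favor of rotamer t.

    Best case for r: E(i_r) plus, at every other position j, the most
    favorable E(i_r, j_s). Worst case for t: E(i_t) plus the least
    favorable E(i_t, j_s). If best(r) > worst(t), r cannot be in the GMEC.
    """
    best_r, worst_t = e_self[(i, r)], e_self[(i, t)]
    for j, choices in rotamers.items():
        if j == i:
            continue
        best_r += min(pair_energy(e_pair, i, r, j, s) for s in choices)
        worst_t += max(pair_energy(e_pair, i, t, j, s) for s in choices)
    return best_r > worst_t


# Hypothetical two-position example with rotamers "a" and "b" at each position.
rotamers = {1: ["a", "b"], 2: ["a", "b"]}
e_self = {(1, "a"): 0, (1, "b"): 4, (2, "a"): 0, (2, "b"): 1}
e_pair = {(1, "a", 2, "a"): -1, (1, "a", 2, "b"): 1,
          (1, "b", 2, "a"): 0, (1, "b", 2, "b"): 2}

print(is_dead_end(1, "b", "a", rotamers, e_self, e_pair))  # True: 4 > 1
print(is_dead_end(1, "a", "b", rotamers, e_self, e_pair))  # False: -1 is not > 6
```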
Exercise: Dead End Elimination
Using the DEE worksheet:
(1) Find a rotamer that satisfies the DEE theorem.
(2) Eliminate it.
(3) Repeat until each residue has only one rotamer.
What is the final GMEC energy?
DEE exercise
[Figure: three positions (1, 2, 3), each with three rotamers (a, b, c).]
Three sidechains, each with three rotamers, so there are 3 × 3 × 3 = 27 ways to arrange the sidechains.
• Each rotamer has an energy E(r), which is the non-bonded energy between the sidechain and the template.
• Each pair of rotamers has an interaction energy E(r1,r2), which is the non-bonded energy between the sidechains.
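Because there are only 27 arrangements, the GMEC can also be found by brute force, which is a useful check on a DEE answer. A minimal sketch with hypothetical energies (not the worksheet's values); the constant template energy is left out.

```python
import itertools

ROTAMERS = "abc"
E_SELF = [[0, 6, 3],   # position 1: E(1a), E(1b), E(1c)
          [4, 0, 5],   # position 2
          [0, 2, 8]]   # position 3
# E_PAIR[(i, j)][r][s] = E(i_r, j_s) for positions i < j (0-based)
E_PAIR = {(0, 1): [[1, -2, 0], [0, 1, 2], [2, 0, 1]],
          (0, 2): [[0, 1, -1], [2, 0, 0], [1, 1, 0]],
          (1, 2): [[2, 0, 1], [-1, 0, 1], [0, 2, 1]]}


def total_energy(assign):
    """Self terms plus each pair term once; assign[i] is the rotamer index at position i."""
    e = sum(E_SELF[i][r] for i, r in enumerate(assign))
    for (i, j), block in E_PAIR.items():
        e += block[assign[i]][assign[j]]
    return e


def brute_force_gmec():
    """Score all 3 x 3 x 3 = 27 combinations and return the lowest-energy one."""
    best = min(itertools.product(range(3), repeat=3), key=total_energy)
    return "".join(ROTAMERS[r] for r in best), total_energy(best)


print(brute_force_gmec())  # ('aba', -3)
```

Enumeration is fine for 27 arrangements but grows as $n^L$, which is why DEE is needed for real proteins.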
DEE exercise
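DEE worksheet: self energies E(r1) and pairwise energies E(r1,r2) for rotamers a, b, c at positions 1, 2, 3. The matrix is symmetric (E(r2) for each column equals E(r1) for the matching row); "." marks two rotamers at the same position, which do not interact.
r1  E(r1) | r2: 1a  1b  1c |  2a  2b  2c |  3a  3b  3c
1a    0   |      .   .   . |  -1   1   1 |  -2   2   5
1b    0   |      .   .   . |   3   5   1 |   0   5  -1
1c    5   |      .   .   . |   5   5  -1 |   0   0   0
2a    0   |     -1   3   5 |   .   .   . |   0   0   1
2b    0   |      1   5   5 |   .   .   . |  12   5   0
2c    0   |      1   1  -1 |   .   .   . |   4   3   0
3a    0   |     -2   0   0 |   0  12   4 |   .   .   .
3b    0   |      2   5   0 |   0   5   3 |   .   .   .
3c   10   |      5  -1   0 |   1   0   0 |   .   .   .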
DEE exercise: instructions
(1) The best (worst) energies are found using the worksheet: add E(r1) to the sum, over the other positions, of the lowest (highest) E(r1,r2) among the rotamers that have not been previously eliminated.
(2) There are 9 possible DEE comparisons to make: 1a versus 1b, 1a versus 1c, 1b versus 1c, 2a versus 2b, etc. For each comparison, find the minimum and maximum energy choices of the other rotamers. If the maximum energy of r1 is less than the minimum energy of r2, eliminate r2.
(3) Scratch out the eliminated rotamer and repeat until one rotamer per position remains.
If the “best case scenario” for r1 is worse than the “worst case scenario” for r2, you can eliminate r1.
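The worksheet procedure can be automated: repeat the pairwise comparisons at every position until a full pass eliminates nothing. A minimal sketch of this iterative (singles) DEE loop, using hypothetical energies for three positions with three rotamers each; it is not the worksheet data.

```python
# Hypothetical energies for 3 positions x 3 rotamers (a, b, c); not the worksheet's.
ROTAMERS = "abc"
E_SELF = [[0, 6, 3], [4, 0, 5], [0, 2, 8]]
E_PAIR = {(0, 1): [[1, -2, 0], [0, 1, 2], [2, 0, 1]],
          (0, 2): [[0, 1, -1], [2, 0, 0], [1, 1, 0]],
          (1, 2): [[2, 0, 1], [-1, 0, 1], [0, 2, 1]]}


def pair(i, r, j, s):
    """E(i_r, j_s): blocks are stored with the lower position index first."""
    return E_PAIR[(i, j)][r][s] if i < j else E_PAIR[(j, i)][s][r]


def bounds(i, r, alive):
    """Best- and worst-case energy of rotamer r at position i over surviving rotamers."""
    best = worst = E_SELF[i][r]
    for j, survivors in enumerate(alive):
        if j != i:
            energies = [pair(i, r, j, s) for s in survivors]
            best += min(energies)
            worst += max(energies)
    return best, worst


def dead_end_elimination():
    alive = [set(range(3)) for _ in range(3)]  # surviving rotamers per position
    changed = True
    while changed:                             # stop when a full pass eliminates nothing
        changed = False
        for i in range(3):
            for r in sorted(alive[i]):
                best_r = bounds(i, r, alive)[0]
                if any(best_r > bounds(i, t, alive)[1] for t in alive[i] if t != r):
                    alive[i].discard(r)        # r is a dead end
                    changed = True
    return ["".join(ROTAMERS[r] for r in sorted(a)) for a in alive]


print(dead_end_elimination())  # ['a', 'b', 'a']
```

If DEE stops with more than one rotamer left at some position, the survivors still have to be searched some other way (for example, by enumeration); DEE only removes rotamers that provably cannot be in the GMEC.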
Sequence design using DEE
• Selected residues (or all) are chosen for mutating.
• Selected (or all) amino acids are allowed at those positions.
• For the selected amino acids, all rotamers are considered.
Now "rotamer" comes to mean the amino acid identity and its conformation.
Since there are as many as 193 rotamers in the rotamer library for all amino acids, each selected position can have as many as 193 "rotamers."
If "fine grained" rotamers are used, this number may be much larger.
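When identity and conformation are merged into one "rotamer", each design position simply gets a longer list of states. A minimal sketch; the per-amino-acid rotamer counts are hypothetical placeholders, not values from the Richardson or Dunbrack libraries, and keeping the native residue at the unselected position is an assumption.

```python
from math import prod

# Hypothetical rotamer counts per amino acid (placeholders, not library values).
# Gly and Ala have no sidechain rotamers, so each contributes a single state.
ROTAMER_COUNTS = {"GLY": 1, "ALA": 1, "SER": 3, "VAL": 3, "PHE": 4}


def design_states(allowed):
    """All (amino acid, rotamer index) pairs allowed at one position."""
    return [(aa, k) for aa in allowed for k in range(ROTAMER_COUNTS[aa])]


positions = [
    design_states(["VAL"]),                # not selected: native Val, rotamers only (assumed)
    design_states(["ALA", "SER", "VAL"]),  # selected, with a restricted amino acid set
    design_states(list(ROTAMER_COUNTS)),   # selected, all amino acids allowed
]
sizes = [len(p) for p in positions]
print(sizes, prod(sizes))  # [3, 7, 12] 252
```

DEE then runs unchanged: each (amino acid, rotamer) pair is just one more state to keep or eliminate.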
Theoretical complexity of sequence design
To design THE OPTIMAL sequence, we need the best amino acid, and its best rotamer, at every position. We can treat each position as one of 193 possible rotamers. That's 191 rotamers in the Richardson library, plus Gly and Ala (which have no rotamers).
How many possible sets of rotamers are there for a protein of length 100?
$193^{100} \approx 3.6 \times 10^{228}$
DEE reduces the complexity of sequence design to about $(193L)^2 \approx 3.7 \times 10^{8}$ (for L = 100).
Sequence space maps to structure space as many-to-one.
This means that there is a lot of potential for "slop" in a sequence design. Moderately big sequence changes are possible, and the sequence can still fold to the same general structure.
[Figure: many sequence families map to one fold.]
Good news for protein designers
Re-designing a binding site
The group of Homme Hellinga (Duke Univ.) has used DEE to redesign the shape of a small-molecule binding site. The site originally bound the sugar arabinose. It was redesigned to bind trinitrotoluene, serotonin, and L-aspartate.
How did they do it?
Recent success of sequence design
An appropriate binding site was found
Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W.Nature 423, 185–190 (2003).
The native ligand (arabinose) is approximately the same size as the targeted ligand (serotonin).
A space was carved out for the ligand
Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W.Nature 423, 185–190 (2003).
All sidechains in the binding site were truncated to alanines, and a space was defined (yellow) for the new ligand. Many possible ligand orientations were generated. Ligand orientations were treated like rotamers in DEE!
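Treating ligand orientations like rotamers means generating a discrete set of ligand poses, each one a state for the ligand "position" in DEE. A minimal sketch, assuming rotations about a single axis through the ligand centroid in 30° steps; the three-atom ligand is hypothetical, and this is not the sampling scheme of Looger et al.

```python
import math


def rotate_z(coords, angle_deg, center):
    """Rotate 3D points by angle_deg about the z-axis passing through center."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    cx, cy, _ = center
    return [(cx + c * (x - cx) - s * (y - cy),
             cy + s * (x - cx) + c * (y - cy),
             z) for x, y, z in coords]


def ligand_orientations(coords, step_deg=30):
    """Discrete ligand poses ("ligand rotamers"): rotations about the centroid."""
    n = len(coords)
    center = tuple(sum(p[k] for p in coords) / n for k in range(3))
    return [rotate_z(coords, angle, center) for angle in range(0, 360, step_deg)]


# Hypothetical three-atom ligand (coordinates in Angstrom).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
states = ligand_orientations(ligand)
print(len(states))  # 12 poses, each treated as one "rotamer" of the ligand
```

A fuller scheme would also sample translations and rotations about other axes; every pose then enters the DEE calculation exactly like one more rotamer.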
A good energy function makes a good design
Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W.Nature 423, 185–190 (2003).
The most critical component of the energy function was hydrogen bonding (dotted lines). Every donor/acceptor should be satisfied.
The End
Please use the remaining time to work on your term projects:
You should be: FINALIZING THE ALIGNMENT, ENERGY MINIMIZING, and adding ligands if necessary.