structure prediction: homology modeling lecture 8 structural bioinformatics dr. avraham samson...
Post on 14-Jan-2016
213 Views
Preview:
TRANSCRIPT
Structure prediction: Homology modeling
Lecture 8Structural Bioinformatics
Dr. Avraham Samson81-871
Homology Modeling
• Presentation• Fold recognition• Model building
– Loop building– Sidechain modeling– Refinement
• Testing methods: the CASP experiment
Homology Modeling
• Presentation• Fold recognition• Model building
– Loop building– Sidechain modeling– Refinement
• Testing methods: the CASP experiment
Why do we need homology modeling ?
To be compared with:
Structural Genomics project
• Aim to solve the structure of all proteins: this is too much work experimentally!
• Solve enough structures so that the remaining structures can be inferred from those experimental structures
• The number of experimental structures needed depend on our abilities to generate a model.
Proteinswithknownstructures
Unknown proteins
Structural Genomics
Homology Modeling: why it works
High sequence identity
High structure similarity
Homology Modeling: How it works
o Find template
o Align target sequence with template
o Generate model:- add loops- add sidechains
o Refine model
Homology Modeling
• Presentation• Fold recognition• Model building
– Loop building– Sidechain modeling– Refinement
• Testing methods: the CASP experiment
Fold Recognition
Homology modeling refers to the easy case when the template structure can be identified using BLAST alone.
What to do when BLAST fails to identify a template?
•Use more sophisticated sequence methods•Profile-based BLAST: PSIBLAST•Hidden Markov Models (HMM)
•Use secondary structure prediction to guide the selection of a template, or to validate a template
•Use threading programs: sequence-structure alignments
•Use all of these methods! Meta-servers: http://bioinfo.pl/Meta
Fold Recognition
Blast for PDB search
Full homology modeling packages
Profile based approach
HMM
Structure-derived profiles
Fold recognition andSecondary structure prediction
Homology Modeling
• Presentation• Fold recognition• Model building
– Loop building– Sidechain modeling– Refinement
• Testing methods: the CASP experiment
Very short loops: Analytic Approach
Wedemeyer,ScheragaJ. Comput. Chem.20, 819-844(1999)
Medium loops: A database approach
Scan database and search protein fragments with correct number of residuesand correct end-to-end distances
Loop length
cRM
S (Ǻ
)
Method breaksdown for loopslarger than 9
Medium loops: A database approach
1) Clustering Protein Fragments to Extract a Small Set of Representatives (a Library)
data clustereddata
library
Long loops: A fragment-based approach
Generating Loops
Fragment library
Generating Loops
Fragment library
Generating Loops
Fragment library
Generating Loops
Fragment library
Generating Loops
Fragment library
Threading
Loop length
<cR
MS
(Ǻ)
Long loops: A fragment-based approach
Test cases: 20 loops for each loop length Methods: database search, and fragment building, with fragment libraries of size L
Loop building: Other methods
Heuristic sampling (Monte Carlo, simulated annealing)
Inverse kinematics
Relaxation techniques
Systematic sampling
http://www.cs.ucdavis.edu/~koehl/BioEbook/loop_building.html
Homology Modeling
• Presentation• Fold recognition• Model building
– Loop building– Sidechain modeling– Refinement
• Testing methods: the CASP experiment
Self-Consistent Mean-Field Sampling
P(J,1)
P(J,3)
P(J,2)
i,2i,1i,3
Self-Consistent Mean-Field Sampling
i,2i,1i,3
P(i,1)
P(i,3)
P(i,2)
P(i,1)+P(i,2)+P(i,3)=1
Self-Consistent Mean-Field Sampling
Multicopy Protein
i,1
i,2i,3
j,1
j,3
j,2
Self-Consistent Mean-Field Sampling
E(i,k) = U(i,k) + U(i,k,Backbone)
Multicopy Protein Mean-Field Energy
i,1
i,2i,3
j,1
j,3
j,2
Self-Consistent Mean-Field Sampling
Nrot(i)
1l
new
RTl)E(i,
exp
RTk)E(i,
expk)(i,P
E(i,k) = U(i,k) + U(i,k,Backbone)
Multicopy Protein Mean-Field Energy
Update Cycle
i,1
i,2i,3
j,1
j,3
j,2
(Koehl and Delarue, J. Mol. Biol., 239:249-275 (1994))
Self-Consistent Mean-Field Sampling
Another way is simply aligning the side chains as well
Dead End Elimination (DEE) Theorem
• There is a global minimum energy conformation (GMEC) for which there is a unique rotamer for each residue
• The energy of the system must be pairwise.
1
1 11
),(),()(N
i
N
ijsr
N
irTemplateTot jiETemplateiEECE
Each residue i has a set of possible rotamers. The notation ir means residue i has the conformation described by rotamer r.
The energy of any conformation C of the protein is given by:
CGMECECE TotTot )()(
Note that:
Dead End Elimination (DEE) Theorem
Consider two rotamers, ir and it, at residue i and the set of all other rotamer conformations {S} at all residues excluding i.
If the pairwise energy between ir and js is higher than the pairwise energy between it and js, for all js in {S}, then ir cannot exist in the GMEC and is eliminated. Mathematically:
}{,,,, SjiETemplateiEjiETemplateiEij
srrij
srr
If
then
ir does not belong to the GMEC
This is impractical as it requires {S}. It can be simplified to:
ij
sts
tij
srs
r jiETemplateiEjiETemplateiE ,max),(,min),(
Dead End Elimination (DEE) Theorem
If
then
ir does not belong to the GMEC
Iteratively eliminate high energy rotamers: proved to converge to GMEC
Desmet, J, De Maeyer, M, Hazes, B, Lasters, I. Nature, 356:539-542 (1992)
Other methods for side-chain modeling
-Heuristics (Monte Carlo, Simulated Annealing)SCWRL (Dunbrack)
-Pruning techniques
-Mean field methods
Loop building + Sidechain Modeling: generalized SCMF
Template
Add multi-copiesof candidate “loops”
Add multi-copiesof candidate side-chains
Final modelKoehl and Delarue.Nature Struct. Bio.2, 163-170 (1995)
Homology Modeling
• Presentation• Fold recognition• Model building
– Loop building– Sidechain modeling– Refinement
• Testing methods: the CASP experiment
Refinement ?
CASP6 assessors, homology modeling category:
“We are forced to draw the disappointing conclusion that, similarlyto what observed in previous editions of the experiment, no modelresulted to be closer to the target structure than the template toany significant extent.”
The consensus is not to refine the model, as refinement usually pulls the model away from the native structure!!
Homology Modeling: Practical guideApproach 2: Submit target sequence to automatic servers
- Fully automatic:
- 3D-Jigsaw : http://www.bmm.icnet.uk/servers/3djigsaw/
- EsyPred3D: http://www.fundp.ac.be/urbm/bioinfo/esypred/
- SwissModel: http://swissmodel.expasy.org//SWISS-MODEL.html
- Fold recognition:
- 3D-PSSM: http://www.sbg.bio.ic.ac.uk/~3dpssm/
- Useful sites:
- Meta server: http://bioinfo.pl/Meta
- PredictProtein: http://cubic.bioc.columbia.edu/predictprotein/
Comparative Modeling Server & Program
COMPOSER http://www.tripos.com/sciTech/inSilicoDisc/bioInformatics/matchmaker.html
MODELER http://salilab.org/modeler
InsightII http://www.msi.com/
SYBYL http://www.tripos.com/
top related