andras[1]
TRANSCRIPT
-
7/27/2019 Andras[1]
1/21
Protein structure modeling
Department of Biochemistry and
Seaver Foundation Center for Bioinformatics
Albert Einstein College of Medicine, New York, USA
András Fiser
-
2/21
Why is it useful to know the structure of a protein, not only its sequence?
The 3D structure is more informative than the sequence because patterns in space are frequently more recognizable than patterns in sequence.
Evolution tends to conserve function, and function depends more directly on structure than on sequence, so structure is more conserved in evolution than sequence.
-
3/21
Why Protein Structure Prediction?
Year 2005:
29,000 structures
2,300,000 sequences
We know the experimental 3D structure for ~1% of the protein sequences.
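As a quick check of the ~1% figure, the two counts quoted on this slide can simply be divided. This is only a sketch of the arithmetic; the variable names are illustrative and do not come from the slides.

```python
# Counts quoted on the slide for 2005.
known_structures = 29_000
known_sequences = 2_300_000

# Fraction of sequences that have an experimentally determined 3D structure.
coverage = known_structures / known_sequences
print(f"Structural coverage: {coverage:.1%}")
```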
-
4/21
Principles of Protein Structure
Sequence: GFCHIKAYTRLIMVG
Folding: ab initio prediction
Evolution: fold recognition, comparative modeling
[Figure: structures from Anabaena 7120, Anacystis nidulans, Chondrus crispus, Desulfovibrio vulgaris]
-
5/21
Protein structure modeling
Ab initio prediction and Comparative Modeling
Ab initio prediction:
Applicable to any sequence
Not very accurate (>4 Å RMSD)
Attempted for proteins of
-
6/21
What makes comparative modeling possible?
A small difference in the sequence makes a small difference in the structure.
Protein structures are clustered into fold families.
-
7/21
Structural Genomics
The number of families is much smaller than the number of proteins.
Characterize most protein sequences (red) based on related known structures (green).
-
8/21
Structural Genomics
Definition: The aim of structural genomics is to put every protein sequence within modeling distance of a known protein structure.
Size of the problem:
There are a few thousand domain fold families.
There are ~20,000 sequence families (30% sequence identity).
Solution:
Determine protein structures for as many different families as possible.
Model the rest of the family members using comparative modeling.
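The idea of grouping sequences into families at a 30% identity threshold, then solving one structure per family, can be sketched with a greedy clustering. This is a minimal illustration and not the procedure used by any structural genomics project: it assumes the sequences are already aligned to equal length, and the names `identity` and `greedy_families` are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_families(sequences, threshold=0.30):
    """Assign each sequence to the first representative it matches at >= threshold.

    A sequence that matches no existing representative founds a new family;
    its representative is the candidate for experimental structure determination.
    """
    families = {}  # representative -> members (dicts keep insertion order)
    for seq in sequences:
        for rep in families:
            if identity(seq, rep) >= threshold:
                families[rep].append(seq)
                break
        else:
            families[seq] = [seq]
    return families


# Toy input (made-up sequences, for illustration only).
toy = ["AAAAAAAAAA", "AAAAAAACCC", "CCCCCCCCCC", "GGGGGGGGGG"]
families = greedy_families(toy)
for rep, members in families.items():
    print(rep, "->", len(members), "member(s)")
```

The representatives are the sequences whose structures would be determined; the remaining members of each family would be modeled from them.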
-
9/21
Comparative Protein Structure Modeling
Flavodoxin family: Anabaena 7120, Anacystis nidulans, Chondrus crispus, Desulfovibrio vulgaris, Clostridium mp.
Sequence: KIGIFFSTSTGNTTEVA
[Figure: comparative modeling accuracy as Cα RMSD (% EQV): 0 (100), 1 (80), 2 (50); axis ticks 20, 50, 100]
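The two accuracy measures named on this slide, Cα RMSD and the percentage of equivalent positions (% EQV), can be sketched as follows. The sketch assumes the model and the reference structure are already superposed and paired residue by residue, and it assumes a 3.5 Å cutoff for counting a position as equivalent; neither the cutoff nor the function names are given in the slides.

```python
import math


def ca_rmsd(model, reference):
    """Root-mean-square deviation over paired, already superposed Ca coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(model, reference)]
    return math.sqrt(sum(squared) / len(squared))


def percent_equivalent(model, reference, cutoff=3.5):
    """Percentage of Ca atoms lying within `cutoff` of their reference partner."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    close = sum(1 for p, q in zip(model, reference) if math.dist(p, q) <= cutoff)
    return 100.0 * close / len(model)


# Toy coordinates in Angstrom (made up for illustration).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0), (11.4, 5.0, 0.0)]

print(f"RMSD = {ca_rmsd(model, reference):.2f}, EQV = {percent_equivalent(model, reference):.0f}%")
```

A real comparison would first superpose the two structures optimally; that step is omitted here.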
-
10/21
Steps in Comparative Protein Structure Modeling
START, Template Search, Target-Template Alignment, Model Building, Model Evaluation, OK? (Yes: END; No: repeat from an earlier step)
Target sequence:
ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPERASFQWMNDK
Target-template alignment:
TEMPLATE MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE
TARGET   ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE
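The sequence identity of the target-template alignment shown on this slide can be computed directly from its two rows. In this sketch the row that matches the printed target sequence is taken as the target and the gapped row as the template; that assignment is inferred, and the function name `alignment_identity` is hypothetical.

```python
# The two alignment rows as printed on the slide.
template = "MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE"
target = "ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE"


def alignment_identity(row_a, row_b, gap="-"):
    """Return (identical, aligned): counts over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    aligned = identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned


identical, aligned = alignment_identity(template, target)
print(f"{identical}/{aligned} aligned positions identical ({identical / aligned:.0%})")
```

Identity is reported here over non-gapped columns; other conventions divide by the length of the shorter sequence or of the full alignment.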
-
11/21
Steps in Comparative Protein Structure Modeling
Template Search:
Pattern recognition, heuristic searches (e.g. BLAST, FastA)
Profile and iterative alignment methods (e.g. HMMs, PSI-BLAST)
Structure-based threading (e.g. THREADER, FUGUE, 3DPSSM)
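To give a feel for the heuristic searches listed above, the sketch below ranks candidate templates by the number of short words (k-mers) they share with the target. It is a crude stand-in for the word-seeding idea behind tools such as BLAST and FastA and does not reproduce either program. The library entries and the names `kmers` and `rank_templates` are hypothetical; the "related" entry is the template row from the alignment slide with its gaps removed.

```python
def kmers(seq, k=3):
    """Set of all overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_templates(target, library, k=3):
    """Return (shared_word_count, name) pairs, best candidate first."""
    target_words = kmers(target, k)
    scored = [(len(target_words & kmers(seq, k)), name) for name, seq in library.items()]
    return sorted(scored, reverse=True)


target = "ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPERASFQWMNDK"
library = {
    "related": "MSVIPKRLYGNCEQTSEEAIRIEDSPIVTADLVCLKIDEIPERLVGE",
    "unrelated": "GSGSGSGSGSGSGSGSGSGSGSGS",  # made-up sequence
}

ranked = rank_templates(target, library)
for shared, name in ranked:
    print(name, shared)
```

Real search tools extend such word hits into scored local alignments and attach statistical significance; profile and threading methods go further for remote homologs.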
-
12/21
Steps in Comparative Protein Structure Modeling
Target-Template Alignment:
Dynamic programming, pairwise alignment
Multiple alignments, profiles, HMMs
Structure-based approaches (threading)
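Dynamic programming for pairwise alignment can be illustrated with a minimal global (Needleman-Wunsch) aligner. This is a sketch under simplifying assumptions: a flat match/mismatch score and a linear gap penalty, whereas practical aligners use substitution matrices and affine gaps. The scoring values are arbitrary choices, not taken from the slides.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


row_a, row_b, total = needleman_wunsch("ACGT", "AGT")
print(row_a)
print(row_b)
print("score:", total)
```

The table fill takes time proportional to the product of the two sequence lengths, which is why heuristic searches are used first to narrow down candidate templates.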
-
13/21
Steps in Comparative Protein Structure Modeling
Model Building:
Rigid Body Assembly (COMPOSER)
Segment Matching (SEGMOD, 3DPSSM)
Satisfaction of Spatial Restraints (MODELLER)
Integrated (NEST)
Loop modeling, side chain modeling
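The idea behind satisfaction of spatial restraints can be sketched as scoring how far a model departs from distances observed in the template. This is a simplified harmonic stand-in and not MODELLER's implementation: MODELLER expresses its restraints as probability density functions and optimizes them together with stereochemical terms. The function names and the `sigma` tolerance are hypothetical.

```python
import math


def restraints_from_template(template_coords, sigma=0.5):
    """Derive one distance restraint per pair of template points: (i, j, distance, sigma)."""
    restraints = []
    n = len(template_coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(template_coords[i], template_coords[j])
            restraints.append((i, j, d, sigma))
    return restraints


def restraint_violation(coords, restraints):
    """Sum of squared, sigma-scaled deviations from the restrained distances."""
    total = 0.0
    for i, j, target_distance, sigma in restraints:
        deviation = math.dist(coords[i], coords[j]) - target_distance
        total += (deviation / sigma) ** 2
    return total


# Toy coordinates in Angstrom (made up for illustration).
template_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
good_model = list(template_coords)
distorted_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 6.0, 0.0)]

restraints = restraints_from_template(template_coords)
print(restraint_violation(good_model, restraints))
print(restraint_violation(distorted_model, restraints))
```

A model builder would minimize such a score over the model coordinates; only the scoring step is shown here.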
-
14/21
Steps in Comparative Protein Structure Modeling
Model Evaluation:
Stereochemistry (PROCHECK, WHATCHECK)
Environment (Profiles3D, Verify3D)
Statistical potential-based methods (PROSA)
Is the model reliable?
A model is reliable when it is based on a correct template and on an approximately correct alignment.
-
15/21
Typical Errors in Comparative Models
Incorrect template
Misalignment
Region without a template
Distortion in correctly aligned regions
Side chain packing
[Figure legend: MODEL, X-RAY, TEMPLATE]
-
16/21
Comparing accuracies of experimental and theoretical approaches
Some Models Can Be Surprisingly Accurate
-
17/21
Some Models Can Be Surprisingly Accurate (in Some Regions)
YJL001W / 1rypH: 24% sequence identity
YGL203C / 1ac5: 25% sequence identity
Labeled residues: Ser 176, His 488, Asp 383
-
18/21
in Zebrafish forkhead transcription factor Foxi1
RMSD:
re-modelled wild type segments (6 and 7 aa) and NMR: 1.78 and 1.82
modelled mutated segments with each other (6 and 7 aa): 1.19
wild type and mutated segments (6 and 7 aa): 3.65 and 3.75
-
19/21
Predicting features that are not present in the template
[Figure panels: Drosophila m., H. sapiens, E. coli, Eq. inf. virus]
1. The active form is usually a trimer; each active site is formed by all three monomers.
2. Comparison of models and X-ray structures reveals two subclasses of dUTPases with different types of subunit interfaces.
3. The altered character of the subunit interfaces correlates with the suggested different functional mechanism: a polar/charged surface is better suited to allosterism.
Altered subunit communication in subfamilies of dUTPases
-
20/21
Designing new enzyme specificity with the aid of comparative models
1. Sequences are identified from the Trichomonas genome project.
2. Mutations were designed using the constructed 3D model to switch specificity.
Convergent evolution of Trichomonas vaginalis lactate dehydrogenase from malate dehydrogenase.
-
21/21
Confirming fold by energy evaluation of comparative models
G. lamblia   1aoiC/G       1aoiD/H       1aoiA/E       1aoiB/F
H2A          -4.74         -3.42         -0.64         -2.77
H2B          -1.15         -4.34         -0.41         -1.70
H3           -1.35         -0.61         -2.38         -0.41
H4           -2.29         -2.82         -0.26         -4.79
X-ray        -5.41/-5.09   -3.98/-4.05   -2.74/-2.39   -5.23/-4.42
Core histones of the amitochondriate protist, Giardia lamblia