Posted on 13-Jan-2016
Conformational Sampling
Dragos Horvath
Laboratoire d’InfoChimie – UMR 7177
horvath@chimie.u-strasbg.fr
• Presentation Outline
– The Basics: Molecules have Geometries!
• Intramolecular energy calculation: the Empirical Force Field
– Sampling Methods: a brief overview
– Molecular Casino: Monte Carlo Simulations
– Really Difficult Problems: Darwinism, God’s Will and Massively Parallel Computing
Different « conformations » or « geometries » of a molecule (here, with two degrees of freedom):
– Stable conformation: minimum of energy
– Unstable conformation: high potential energy
• The POTENTIAL ENERGY calculation is based on the EMPIRICAL FORCE FIELD APPROACH
– Quantum chemical calculations are too time-consuming: atoms and their interactions are approximated as “classical” objects.
– Atoms need to be “parameterized” as a function of their chemical environment: a C atom in an alkane does not carry the same partial charge as the C of a carbonyl C=O!
– Covalent bonds are modeled as harmonic springs. The energy required to stretch or compress a bond by $\Delta b$ with respect to its natural length $b_0$ is expressed as $K_b\,\Delta b^2$.
– Valence angle bending is modeled by a harmonic potential $K_\theta\,\Delta\theta^2$.
– Atoms that are not directly bonded and do not form an angle interact “through space” by means of non-bonded interactions:
• Van der Waals interactions
• Electrostatic interactions, based on partial charges
• Continuum solvent models
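The two harmonic bonded terms are easy to evaluate once a geometry is given. The sketch below is illustrative only: the function names and the parameter values used in the usage line (b₀ = 1.53 Å, K_b = 300) are assumptions, not the author's force-field parameters, and angles are in radians.

```python
import math
from typing import Sequence

def bond_energy(b: float, b0: float, k_b: float) -> float:
    """Harmonic bond stretch K_b * (b - b0)^2."""
    return k_b * (b - b0) ** 2

def angle_energy(theta: float, theta0: float, k_theta: float) -> float:
    """Harmonic valence-angle bending K_theta * (theta - theta0)^2, in radians."""
    return k_theta * (theta - theta0) ** 2

def valence_angle(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle a-b-c at the central atom b, in radians, from Cartesian coordinates."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding errors
```

For example, `bond_energy(math.dist((0, 0, 0), (1.63, 0, 0)), 1.53, 300.0)` describes a bond stretched by 0.1 Å and returns about 3.0, in whatever energy unit K_b is expressed in.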
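Non-bonded interactions:
Coulomb: $E_{Coulomb} = \sum_{i,j} \frac{Q_i\,Q_j}{4\pi\varepsilon_0\,d_{ij}}$
Van der Waals: $E_{VdW} = \sum_{i,j} \left( \frac{A_i A_j}{d_{ij}^{12}} - \frac{B_i B_j}{d_{ij}^{6}} \right)$
Desolvation & Hydrophobic Term: $E_{Solv} = k_{solv} \sum_{i,j} \frac{Q_i^2 V_j + Q_j^2 V_i}{d_{ij}^{4}}$, plus a hydrophobic contact term $E_{hphob}$ weighted by $k_{hphob}\,h_i h_j$
Global energy: $E_{molecule} = E_b + E_\theta + E_{tors} + E_{nb}$
Torsional correction terms: $E_{tors} = k_t\,(1 + \cos 3\varphi)$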
E=f(Geometry)
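The Coulomb and 12-6 van der Waals pair sums, the threefold torsional correction and the global energy sum can be sketched as follows. Assumptions: a Coulomb constant of 332.06 kcal·Å/(mol·e²) for 1/(4πε₀), Å and elementary charges as units, an exclusion set for atom pairs that are bonded or share an angle, and per-atom coefficients A_i, B_i combined as products; the desolvation and hydrophobic terms are left out. This is a toy sketch, not the author's force field.

```python
import math
from itertools import combinations
from typing import FrozenSet, Sequence, Tuple

COULOMB_K = 332.06  # 1/(4*pi*eps0) in kcal*Angstrom/(mol*e^2): assumed unit system

Coord = Tuple[float, float, float]

def nonbonded_energy(coords: Sequence[Coord], charges: Sequence[float],
                     A: Sequence[float], B: Sequence[float],
                     excluded: FrozenSet[Tuple[int, int]] = frozenset()) -> Tuple[float, float]:
    """Coulomb and 12-6 van der Waals sums over atom pairs i < j.

    `excluded` lists (i, j) pairs (i < j) that are bonded or share a valence
    angle and therefore do not interact "through space".
    """
    e_coul = e_vdw = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if (i, j) in excluded:
            continue
        d = math.dist(coords[i], coords[j])
        e_coul += COULOMB_K * charges[i] * charges[j] / d
        e_vdw += A[i] * A[j] / d ** 12 - B[i] * B[j] / d ** 6
    return e_coul, e_vdw

def torsion_energy(phi: float, k_t: float) -> float:
    """Threefold torsional correction k_t * (1 + cos 3*phi), phi in radians."""
    return k_t * (1.0 + math.cos(3.0 * phi))

def total_energy(e_b: float, e_theta: float, e_tors: float, e_nb: float) -> float:
    """E_molecule = E_b + E_theta + E_tors + E_nb."""
    return e_b + e_theta + e_tors + e_nb
```

With this sign convention the torsional term vanishes at the staggered values φ = 60°, 180°, 300° and peaks (2·k_t) at the eclipsed ones.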
Torsions: the gateway to conformational sampling
- Energy profile with respect to a torsion...
- Energy surface with respect to two torsions...
- Alternative contour plot representation: the Ramachandran plot (http://en.wikipedia.org/wiki/Ramachandran_plot)
Key points on the energy surface...
[Figure: energy map over two torsions, φ1 = 0–324° (36° steps) and φ2 = 0–300° (60° steps)]
Computing a 2D torsion plot... Not that easy!
[Figure: energy color scale, from low to high E]
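One reason a 2D torsion plot is not easy to compute: every other degree of freedom should be relaxed at each (φ1, φ2) grid point, so each map point hides an inner optimization. The sketch below assumes a hypothetical three-torsion toy energy (`toy_energy`), relaxes the third torsion by a brute-force 10° scan, and uses the 36° and 60° grid steps of the figure.

```python
import math
from typing import Dict, Tuple

def toy_energy(phi1: float, phi2: float, phi3: float) -> float:
    """Hypothetical 3-torsion energy (angles in degrees, arbitrary units)."""
    r = math.radians
    return ((1.0 + math.cos(3.0 * r(phi1))) + (1.0 + math.cos(3.0 * r(phi2)))
            + 0.5 * (1.0 + math.cos(r(phi3 - phi1))))

def relaxed_map(step1: int = 36, step2: int = 60,
                step3: int = 10) -> Dict[Tuple[int, int], float]:
    """E(phi1, phi2) on a grid, each point minimized over the remaining torsion phi3."""
    grid = {}
    for p1 in range(0, 360, step1):
        for p2 in range(0, 360, step2):
            grid[(p1, p2)] = min(toy_energy(p1, p2, p3) for p3 in range(0, 360, step3))
    return grid
```

At the grid point (0°, 0°) the relaxed value is 4.0, whereas the rigid value `toy_energy(0, 0, 0)` is 5.0: freezing φ3 overestimates the energy there. With real molecules the inner step is a full minimization, not a 36-point scan.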
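• Energy minimization is only [the easy] part of the problem
– Given a starting geometry, deterministic algorithms find the adjacent local minimum
– Descent methods follow the local gradient:
Molecular geometry: $X = (x_1, y_1, z_1, x_2, y_2, z_2, \ldots, x_N, y_N, z_N)$
Gradient: $\nabla E(X) = \left( \frac{\partial E}{\partial x_1}, \frac{\partial E}{\partial y_1}, \frac{\partial E}{\partial z_1}, \ldots, \frac{\partial E}{\partial x_N}, \frac{\partial E}{\partial y_N}, \frac{\partial E}{\partial z_N} \right)$
Iteratively: $X_{new} = X_{curr} - s\,\nabla E(X_{curr})$, with step size $s$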
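The descent iteration can be sketched as a plain fixed-step steepest descent. The step size, tolerance and the one-dimensional double-well test function below are assumptions chosen for illustration. Starting on either side of the barrier of E(x) = (x² − 1)² lands in a different minimum, which is why minimization alone only finds the adjacent local minimum.

```python
from typing import Callable, List

Vector = List[float]

def steepest_descent(grad: Callable[[Vector], Vector], x0: Vector,
                     step: float = 0.05, tol: float = 1e-8,
                     max_iter: int = 100_000) -> Vector:
    """Iterate X_new = X_curr - s * grad E(X_curr) until the gradient is ~0."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:  # converged: local minimum
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def double_well_grad(x: Vector) -> Vector:
    """Gradient of the toy energy E(x) = (x^2 - 1)^2, with minima at x = -1 and +1."""
    return [4.0 * x[0] * (x[0] ** 2 - 1.0)]
```

`steepest_descent(double_well_grad, [0.5])` converges to about +1, `steepest_descent(double_well_grad, [-0.5])` to about −1.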
Bad news: most molecules have more than 2 torsions...
- No visualization of the energy hypersurface is possible!
• Why care for conformational sampling?
– Because experimental properties of a molecule are given by the Boltzmann average of the properties of the populated conformers
Boltzmann’s probability distribution: $P(\text{conformer of energy } E) \propto \exp\left(-\frac{E}{k_B T}\right)$
Boltzmann averaging: $\text{Observed Property} = \sum_{\text{populated conformers}} P(\text{conformer}) \cdot \text{Property}(\text{conformer})$, with the probabilities normalized to sum to 1
Objective: finding the most probable solutions, that is, the relevant minima
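A minimal sketch of Boltzmann averaging over a list of conformer energies, assuming energies in kcal/mol (hence k_B = 0.0019872 kcal·mol⁻¹·K⁻¹) and T = 300 K; the property values stand for whatever per-conformer quantity is being averaged. Subtracting the lowest energy before exponentiating leaves the normalized probabilities unchanged but avoids overflow.

```python
import math
from typing import List, Sequence

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K): energies assumed in kcal/mol

def boltzmann_weights(energies: Sequence[float], T: float = 300.0) -> List[float]:
    """Normalized probabilities P_i proportional to exp(-E_i / k_B T)."""
    kT = K_B * T
    e_min = min(energies)  # shifting by E_min cancels out after normalization
    w = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(w)
    return [wi / z for wi in w]

def boltzmann_average(energies: Sequence[float], properties: Sequence[float],
                      T: float = 300.0) -> float:
    """Observed property = sum over populated conformers of P(conformer) * Property."""
    return sum(p * x for p, x in zip(boltzmann_weights(energies, T), properties))
```

`boltzmann_weights([0.0, 3.0])` gives the second conformer a weight of about 0.0065: a conformer 3 kcal/mol above the minimum barely contributes at 300 K.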
The Challenge…
[Figure: energy landscape Energy = f(Geometry), defined by the empirical force field. Labels: a “well”-docked (folded) zone flanked by “misdocked” (folded) conformers; the absolute energy minimum; the PDB geometry and its energy; a native-like conformer with one local clash; the microstates contributing to the macroscopic property.]
[Figure: the same contact seen by two force fields: « Nice H bond » (publisher’s force field) vs. « Bad Contact » (my force field).]
• Sampling methods
– Systematic search: only feasible for fewer than 3–4 torsions
– Molecular Dynamics
• Solve Newton’s equations of motion, given the atomic forces calculated by the force field: simulate “Brownian motion”
– Stochastic sampling:
• Monte Carlo simulations
• Genetic Algorithms
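Why systematic search stops at a few torsions: the number of grid points grows as (360°/step)ⁿ. A minimal sketch, assuming a uniform 30° step for every torsion (an illustrative choice):

```python
from itertools import product
from typing import Iterator, Tuple

def systematic_grid(n_torsions: int, step_deg: int = 30) -> Iterator[Tuple[int, ...]]:
    """Enumerate every combination of torsion values on a regular grid (degrees)."""
    return product(range(0, 360, step_deg), repeat=n_torsions)

def grid_size(n_torsions: int, step_deg: int = 30) -> int:
    """Number of geometries a systematic search must evaluate: (360/step)^n."""
    return (360 // step_deg) ** n_torsions
```

`grid_size(3)` is 1,728 geometries; `grid_size(10)` is already 61,917,364,224, each needing at least one energy evaluation.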
• The Monte Carlo Approach: win an Energy Optimum by Playing Dice!
– Take a random geometry
– Randomly choose a torsional axis
– Apply a random rotation around that axis
– Recalculate the energy of the resulting geometry
• If lower (or, at least, not too (!) high), accept: make the new conformer the new “default” geometry
• Otherwise, reject: restore the previous geometry
– Loop until no further energy drop is observed
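The steps above can be sketched as a torsional Monte Carlo loop. Two choices here are assumptions, not taken from the slides: “not too high” is implemented as the common Metropolis test (accept an uphill move with probability exp(−ΔE/kT)), and “no further energy drop” is read as “the best energy has not improved for `patience` consecutive moves”. The energy function is supplied by the caller; the one in the usage note is a hypothetical toy.

```python
import math
import random
from typing import Callable, List, Tuple

def monte_carlo(energy: Callable[[List[float]], float], n_torsions: int,
                kT: float = 0.6, max_step: float = 60.0,
                patience: int = 2000, seed: int = 0) -> Tuple[List[float], float]:
    """Torsional Monte Carlo with a Metropolis acceptance test (angles in degrees).

    Stops once the best energy has not dropped for `patience` consecutive moves.
    """
    rng = random.Random(seed)
    current = [rng.uniform(0.0, 360.0) for _ in range(n_torsions)]  # random geometry
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    stalled = 0
    while stalled < patience:
        trial = list(current)
        k = rng.randrange(n_torsions)                                      # random torsional axis
        trial[k] = (trial[k] + rng.uniform(-max_step, max_step)) % 360.0  # random rotation
        e_trial = energy(trial)
        # accept if lower, or if "not too high": with probability exp(-dE/kT)
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / kT):
            current, e_cur = trial, e_trial  # new "default" geometry
        # otherwise the move is rejected and the previous geometry is kept
        if e_cur < e_best:
            best, e_best = list(current), e_cur
            stalled = 0
        else:
            stalled += 1
    return best, e_best
```

With the toy energy E = Σ(1 + cos 3φ), the returned torsions should sit near one of its staggered minima (60°, 180° or 300° for each torsion). The temperature kT trades exploration (crossing barriers) against refinement.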
Data representation: « individual » or « chromosome » = list of its torsional angles φ1 … φn
Population of individuals: [Figure: a population of chromosomes, one row φ1 … φn per individual]
• Genetic Algorithm
– Applying a Darwinian evolution scenario to a population of vectors (“chromosomes”) encoding the solution to a problem
– Solution quality is the “fitness” score, and the fittest survive…
Generation of new offspring:
Crossover:
parent1: φ1 … φi φi+1 … φn
parent2: φ’1 … φ’i φ’i+1 … φ’n
child1: φ1 … φi φ’i+1 … φ’n
child2: φ’1 … φ’i φi+1 … φn
Mutation:
wild type: φ1 … φi φi+1 … φn
mutant: φ1 … φ’i φi+1 … φn
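The crossover and mutation operators can be sketched on plain lists of torsion values (degrees). The single random cut point and the uniform redraw of one torsion are assumptions; an implementation may instead use two-point crossover or small angular perturbations.

```python
import random
from typing import List, Tuple

Chromosome = List[float]  # torsional angles phi_1 ... phi_n, in degrees

def one_point_crossover(parent1: Chromosome, parent2: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Exchange the tails of two parents after a random cut position i."""
    i = rng.randrange(1, len(parent1))  # cut between phi_i and phi_(i+1)
    child1 = parent1[:i] + parent2[i:]
    child2 = parent2[:i] + parent1[i:]
    return child1, child2

def mutate(wild_type: Chromosome, rng: random.Random) -> Chromosome:
    """Return a mutant in which one randomly chosen torsion is redrawn."""
    mutant = list(wild_type)
    mutant[rng.randrange(len(mutant))] = rng.uniform(0.0, 360.0)
    return mutant
```

Both operators return new lists, so the parents and the wild type stay untouched in the population.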
[Figure: one generation: starting from the initial population (random at first), offspring form an intermediate population, which is sorted by energy; only individuals within the selection threshold enter the sorted final population.]
[Plot: evolution of the average fitness and of the fitness of the best individual over generations: the algorithm converges.]
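One generation of this scheme can be sketched as follows, under assumptions: the selection threshold is simply “keep the `pop_size` lowest energies among parents + offspring” (an elitist choice), and offspring come from one-point crossover plus an occasional single-torsion mutation. Because parents compete with their offspring, the best energy can never get worse from one generation to the next.

```python
import random
from typing import Callable, List

Chromosome = List[float]  # torsions in degrees

def next_generation(population: List[Chromosome],
                    energy: Callable[[Chromosome], float],
                    pop_size: int, rng: random.Random,
                    mutation_rate: float = 0.3) -> List[Chromosome]:
    """One GA generation: breed offspring, pool them with the parents
    (intermediate population), sort by energy and keep the pop_size best."""
    offspring: List[Chromosome] = []
    while len(offspring) < pop_size:
        p1, p2 = rng.sample(population, 2)
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]                   # one-point crossover
        if rng.random() < mutation_rate:              # point mutation
            child[rng.randrange(len(child))] = rng.uniform(0.0, 360.0)
        offspring.append(child)
    intermediate = population + offspring
    intermediate.sort(key=energy)                     # lowest energy = fittest
    return intermediate[:pop_size]                    # selection threshold
```

Calling `next_generation` in a loop and recording the best and average energy at each step reproduces the kind of convergence curves shown in the plot.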
Population Diversity Control is a Key Issue
> Discarding of redundant chromosomes (requires a metric defining how similar two encoded solutions are!)
> Multiple ‘Island’ models: parallel simulations occasionally swapping solutions
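Discarding redundant chromosomes needs a metric. A minimal sketch, assuming an RMS torsion difference that respects 360° periodicity (350° and 10° are 20° apart, not 340°) and a hypothetical 15° dissimilarity limit; the lowest-energy member of each group of near-duplicates is kept.

```python
from typing import Callable, List, Sequence

def torsion_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """RMS angular difference (degrees) between two chromosomes, periodic in 360."""
    diffs = []
    for x, y in zip(a, b):
        d = abs(x - y) % 360.0
        diffs.append(min(d, 360.0 - d))  # shortest way around the circle
    return (sum(d * d for d in diffs) / len(diffs)) ** 0.5

def remove_redundant(population: List[List[float]],
                     energy: Callable[[List[float]], float],
                     min_distance: float = 15.0) -> List[List[float]]:
    """Keep a chromosome only if it differs enough from every better one kept so far."""
    kept: List[List[float]] = []
    for ch in sorted(population, key=energy):
        if all(torsion_distance(ch, other) >= min_distance for other in kept):
            kept.append(ch)
    return kept
```

The threshold plays the role of a “dissimilarity limit”: too small and the population fills with copies, too large and genuinely distinct conformers get merged.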
or God??
Genetic Algorithms: Chance, Selection & the Coin Flipper’s Bet!
• Any problem admitting a vector as a solution may be coded by a “chromosome” and left in the hands of Darwin…
• I bet (1M€) I can find a person who won a coin-flipping challenge 10 times in a row, at his/her first attempt!!
– In order to fulfill my promise, I need a total of 1024 coin flips to happen:
• 1024/10 ≈ 102 contenders, each with a chance of (1/2)^10 = 1/1024 to score 10 successive winning coin flips. The chance that none of them succeeds is (1 − 1/1024)^102 ≈ 90%: a ~90% chance to lose 1M€!
• If you read “Darwin’s Dangerous Idea” by D. C. Dennett, you are not allowed to bet!!
Selection is the Key!
1024 candidates / 512 flips → 512 candidates / 256 flips → … → 2 candidates / 1 flip: 1023 flips in total, and the last winner has won 10 times in a row.
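The coin flipper’s arithmetic can be checked in a few lines. The first function reproduces the no-selection case (102 independent contenders); the second counts the flips of a knockout tournament in which each flip eliminates one of two candidates. The function names are illustrative.

```python
from typing import Tuple

def p_lose_bet(n_flips: int = 1024, streak: int = 10) -> float:
    """No selection: n_flips // streak independent contenders, each of whom
    must win `streak` flips in a row; returns P(nobody succeeds)."""
    contenders = n_flips // streak      # 1024 // 10 = 102
    p_success = 0.5 ** streak           # (1/2)^10 = 1/1024
    return (1.0 - p_success) ** contenders

def knockout_flips(candidates: int = 1024) -> Tuple[int, int]:
    """With selection: pair candidates up, one flip per pair, winners advance.
    Returns (total flips, rounds won by the final winner)."""
    flips = rounds = 0
    while candidates > 1:
        flips += candidates // 2
        candidates //= 2
        rounds += 1
    return flips, rounds
```

`p_lose_bet()` evaluates to about 0.905, while `knockout_flips()` returns `(1023, 10)`: roughly the same number of flips, but selection turns a 90% loss into a guaranteed win.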
Hybrid strategies: (1) Selective Chromosome Initialization:
- Knowledge-based: favoring locally stable torsions…
[Plots: probability (0–0.4) vs. angle (0–360°) for polycycle torsions nr. 1 and nr. 3, comparing biased torsion probabilities thanks to learning with biased torsion probabilities wrt the local Hamiltonian]
- ‘Traditionalism’: favoring torsion values seen in previously visited samples
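The slides show learned per-torsion probability profiles but do not specify how they are learned. A minimal sketch of ‘traditionalism’, assuming a simple histogram of previously visited torsion values with 30° bins and a small probability floor (both hypothetical choices) so that never-visited regions remain reachable:

```python
import random
from collections import Counter
from typing import Dict, Iterable

def learn_histogram(observed: Iterable[float], bin_width: int = 30) -> Dict[int, float]:
    """Bin previously visited torsion values (degrees) into bin probabilities."""
    counts = Counter(int(t % 360) // bin_width for t in observed)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

def biased_torsion(histogram: Dict[int, float], rng: random.Random,
                   bin_width: int = 30, floor: float = 0.01) -> float:
    """Draw one initial torsion value, favoring frequently visited bins.

    The small `floor` keeps never-visited bins reachable.
    """
    n_bins = 360 // bin_width
    weights = [histogram.get(b, 0.0) + floor for b in range(n_bins)]
    b = rng.choices(range(n_bins), weights=weights)[0]
    return rng.uniform(b * bin_width, (b + 1) * bin_width)
```

A new chromosome is then initialized by one `biased_torsion` draw per torsion, each from that torsion’s own histogram.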
Hybrid Strategies (2): Directed or ‘Lamarckian’ Mutations
Evolution stalled in a local minimum: mutations will not help!
Add a constraint term forcing φ1 to adopt the ‘mutant’ value φ’1
Gradient optimization, following the new energy landscape… ‘Lamarckian’ move towards the next optimum
Process in parallel to the main GA stream in order to avoid halting evolution!
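The directed mutation can be sketched as restrain, relax, release. Assumptions, not taken from the slides: the constraint term is a harmonic restraint k_c(φ1 − φ’1)², the optimizer is fixed-step steepest descent with a central-difference gradient, angles are in radians, and the two-torsion energy in the usage note is a hypothetical toy.

```python
import math
from typing import Callable, List

Vector = List[float]

def numerical_grad(f: Callable[[Vector], float], x: Vector, h: float = 1e-5) -> Vector:
    """Central-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

def descend(f: Callable[[Vector], float], x: Vector,
            step: float = 0.01, n_iter: int = 5000) -> Vector:
    """Fixed-step steepest descent on f, starting from x."""
    x = list(x)
    for _ in range(n_iter):
        x = [xi - step * gi for xi, gi in zip(x, numerical_grad(f, x))]
    return x

def lamarckian_move(energy: Callable[[Vector], float], x: Vector,
                    index: int, target: float, k_c: float = 10.0) -> Vector:
    """Restrain torsion `index` towards the mutant value `target` (radians),
    relax the whole chromosome, then release the restraint and relax again."""
    def constrained(y: Vector) -> float:
        return energy(y) + k_c * (y[index] - target) ** 2  # assumed harmonic restraint
    x = descend(constrained, x)  # follow the new (constrained) energy landscape
    return descend(energy, x)    # settle into the adjacent optimum
```

With the toy energy E = (1 + cos 3φ1) + 0.5·(1 − cos(φ2 − φ1)), starting from the local minimum φ1 = φ2 = π/3 and targeting φ’1 = π, the move ends near (π, π): φ2 follows φ1 across the barrier, something a single random point mutation of φ1 alone would not achieve.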
Hybrid Heuristics (3): The Taboo Search Dilemma
[Figure: evolved solutions next to a “taboo” phase space region: ??]
Search for Optimal Sampling Setups in the Strategy Parameter Space…
Parameters p1 p2 p3 p4 p5 p6 … p14 p15:
– Population management: population size; number of parallel processes; migration rate between ‘islands’
– Evolution management: crossover rate; mutation rate; one/two-point crossover rate; selection pressure; dissimilarity limit; maximal age
– Convergence management: Apocalypse (population reset) frequency; elitism; global stop condition
µ-Fitness: $Fitness = \frac{k_B T \ln\left[\sum_{i \in \text{minima found}} \exp\left(-\frac{E_i}{k_B T}\right)\right]}{\text{CPU time}}$
[Flowchart: the meta-algorithm defines a parameter setup; repeated runs (Run 1, Run 2, …, Run n) are post-processed into a base of diverse conformers sampled at the current setup, scored by its µ-Fitness and merged into the global base of diverse conformers. News?? If yes, « Taboos », « Tradition » and directed mutations feed the next round, and the meta-GA picks the next set of configurations; if no: GAME OVER.]
The Island Model
GRID 5000-based ‘Planetary’ Model (www.grid5000.fr)
[Flowchart: whenever a node is free, the Operational Pars Selector deploys an Island Model on it, shipping the executables, molecule file, constraint files, seeds list, taboo list and operational parameters. Each island returns its stablest chromosomes and a sampling success score to the Solution Merger & Clusterer, which feeds the Conformer & Cluster Database. A ‘Panspermia’ policy center turns ‘recent’ clusters into seeds and ‘old’ clusters into taboos, while sampling success vs. operational parameters is fed back to the selector. Stop at the maximal ‘Mission Nr.’, or when no new clusters have appeared for N ‘missions’.]
• Ab initio folding of Trp cage 1L2Y: native structure (reproducibly) found and ranked as most stable. D&C Planetary model: 20 nodes for 24 hours
• Ab initio folding of the Villin headpiece 1VII: helical parts are seen to fold in a matter of days (40 nodes), although not properly oriented.
• Good news for the β-hairpin of Chignolin: out of the top 10 best ranked conformers, 8 are native-like
• Number one is not, but in this case that may not be a problem
[Figure: conformers #1 and #5 compared with the PDB structure]
• However, proper folding of 1LE1 could be achieved (though not reproducibly!) with previous force field versions: is the current setup too helix-specific?
• The 1LE1 β-sheet is not the absolute energy minimum according to the current setup!
• Docking simulations in the presence of flexible loops, such as the hinge region of Casein Kinase 2 (3BQC)
– The pose of the ligand emodin and the loop geometry are correctly predicted (3BQC is not in the FF training set).
[Figure: flexible hinge region; PDB structure vs. pose #1]
• Furthermore, a crystallographic water molecule can be docked simultaneously, treated as another ligand, and is correctly placed.
[Figure: flexible hinge region; the water location converges towards the experimental position]
• Docking into GPCRs: (1) Turkey β1-adrenergic receptor/cyanopindolol complex 2VT4, 190 degrees of freedom (ligand and side chains), 30 days/20 nodes**. Ligand RMS = 0.48 Å (best pose)
** total run time required to visit ~40000 phase space cells
• Conclusions
– Conformational sampling is the key element for understanding molecular behavior
– It may range from very simple, to extremely difficult, to impossible
– If you don’t do it well, better not do it at all: empirical methods based on molecular topology only may be more accurate than 3D models based on wrong (or too few) conformations
– Two main sources of errors: A) a wrongly calculated energy-geometry landscape (poor force field parameterization) and B) insufficient sampling!
– Docking is just a specific case of conformational sampling, involving at least two molecules: a binding “site” and one or more “ligands”
– You will often hear that knowing the “bioactive” conformer is paramount to understanding binding. This is necessary, but sometimes not sufficient. Note: the “bioactive” conformer may sometimes be quite unstable and almost never populated in the free state.