-
CS612 - Algorithms in Bioinformatics
Sampling
April 23, 2019
-
From a Rigid Ligand to a Flexible Ligand
Torsional (Dihedral) Degrees of Freedom (DOF)
Nurit Haspel CS612 - Algorithms in Bioinformatics
-
Kinematics
Kinematics is a branch of classical mechanics that describes the motion of points, bodies (objects), and systems of bodies (groups of objects) without considering the forces that caused the motion.
A kinematics problem begins by describing the geometry of a system and the initial conditions of any known values of position, velocity and/or acceleration of points in the system.
Then, using geometric methods, the position, velocity and acceleration of any unknown parts of the system can be determined.
Forward kinematics is the use of the kinematic equations of a robot to compute the position of the end-effector from specified values for the joint parameters.
In protein motion, the problem becomes computing the new locations of the atoms given a set of dihedral rotations.
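As a rough illustration of forward kinematics on a chain, the sketch below rotates every atom downstream of a bond about that bond's axis. It assumes a simple unbranched chain stored as a list of 3D points; the function names and the bond-indexing convention are hypothetical and not taken from the lecture.

```python
import math

def rotate_about_axis(point, a, b, angle):
    """Rotate `point` by `angle` radians about the axis from a to b (Rodrigues' formula)."""
    axis = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                      # unit vector along the bond
    v = [point[i] - a[i] for i in range(3)]           # point relative to the axis origin
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    return tuple(a[i] + v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
                 for i in range(3))

def apply_dihedral_rotations(atoms, rotations):
    """Forward kinematics on an unbranched chain (an assumption of this sketch).

    `rotations` is a list of (bond_index, angle) pairs. Bond i joins atoms[i] and
    atoms[i + 1]; rotating it moves every atom after atoms[i + 1].
    """
    atoms = [tuple(p) for p in atoms]
    for bond, angle in rotations:
        a, b = atoms[bond], atoms[bond + 1]
        for j in range(bond + 2, len(atoms)):
            atoms[j] = rotate_about_axis(atoms[j], a, b, angle)
    return atoms
```

Because each rotation is rigid for the downstream atoms, bond lengths and bond angles are unchanged; only the dihedral around the chosen bond varies.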
-
Robotics-inspired Approach to Protein Flexibility
Similarity between proteins and robots: exploration of complex high-dimensional space
Similarity exploited to sample conformations with spatial constraints
[Figure: an articulated manipulator next to an extended protein backbone]
-
Robotics-inspired Approach to Protein Flexibility
Exploration of protein conformational space has parallels in robotics
0/1 collisions for robots versus energy field for proteins
[Figures adapted from J.-C. Latombe, Stanford, and from P. Smith, KSU]
-
Robotics-inspired Approach to Protein Flexibility
Dimensionality of configuration space
DOFs (rigid-body transformations and DOFs of the ligand)
Too many DOFs mean that the configuration space of the ligand is high-dimensional and difficult to search
Similar issue when planning motions for an articulated robotic chain in a cluttered environment
Geometric complexity of the free space
Difficult to determine whether a ligand conformation and specific position and orientation result in a good fit
Similar issue for an articulated robot
To address both: plan motions in the configuration space, but compute in the workspace (protein surface or cavity)!
-
Probabilistic Roadmap Motion Planning (PRM)
[Figure: the configuration space, divided into forbidden space and free space]
-
Probabilistic Roadmap Motion Planning (PRM)
Configurations are sampled by picking coordinates at random
-
Probabilistic Roadmap Motion Planning (PRM)
Sampled configurations are tested for collision (in workspace!)
-
Probabilistic Roadmap Motion Planning (PRM)
The collision-free configurations are retained as “milestones”
-
Probabilistic Roadmap Motion Planning (PRM)
Each milestone is linked by straight paths to its k-nearest neighbors
-
Probabilistic Roadmap Motion Planning (PRM)
The collision-free links are retained to form the PRM
-
Probabilistic Roadmap Motion Planning (PRM)
Finding paths in the map.
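The PRM steps walked through on the preceding slides (sample, test for collision, keep milestones, link to the k nearest neighbors, keep collision-free links, search the map) can be sketched for a toy 2D problem. This is a minimal sketch under stated assumptions: the configuration space is the unit square, obstacles are circles given as (cx, cy, radius), links are checked by testing points along the segment, and the obstacles must leave some free space or sampling never finishes. All function names are hypothetical.

```python
import math
import random
from collections import deque

def point_is_free(q, obstacles):
    """Collision test: q must lie outside every circular obstacle (cx, cy, radius)."""
    return all(math.dist(q, (cx, cy)) > r for cx, cy, r in obstacles)

def segment_is_free(a, b, obstacles, step=0.01):
    """Approximate test of a straight link by checking points spaced at most `step` apart."""
    n = max(1, math.ceil(math.dist(a, b) / step))
    for i in range(n + 1):
        t = i / n
        if not point_is_free((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])), obstacles):
            return False
    return True

def build_roadmap(n_milestones, k, obstacles, rng):
    """Sample collision-free milestones and link each one to its k nearest neighbors."""
    milestones = []
    while len(milestones) < n_milestones:
        q = (rng.random(), rng.random())          # pick coordinates at random
        if point_is_free(q, obstacles):           # retain only collision-free samples
            milestones.append(q)
    neighbors = {i: set() for i in range(n_milestones)}
    for i, q in enumerate(milestones):
        others = sorted((j for j in range(n_milestones) if j != i),
                        key=lambda j: math.dist(q, milestones[j]))
        for j in others[:k]:
            if segment_is_free(q, milestones[j], obstacles):  # retain collision-free links
                neighbors[i].add(j)
                neighbors[j].add(i)
    return milestones, neighbors

def find_path(neighbors, start, goal):
    """Breadth-first search in the roadmap; returns a list of milestone indices or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in neighbors[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None
```

For example, `build_roadmap(150, 6, [(0.5, 0.5, 0.2)], random.Random(1))` builds a roadmap around one disc-shaped obstacle; `find_path` returns `None` when the two milestones fall in different connected components.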
-
Application of PRM to Protein-Ligand Docking
Protein is assumed to be rigid
A fixed coordinate system P is attached to the protein
Ligand is a small flexible molecule
A moving coordinate system L is defined using three bonded atoms in the ligand
A conformation of the ligand is defined by the position and orientation of L relative to P and the torsional angles of the ligand
[Figure: the coordinate frames P and L, each with x, y, z axes]
A.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th ISMB, pp. 252-261, 1999
-
Roadmap Construction: Node Generation
The nodes of the roadmap are generated by sampling conformations of the ligand uniformly at random in the parameter space (around the protein)
The energy of each sampled conformation is $E = E_{\text{interaction}}\,(\text{electrostatic}) + E_{\text{internal}}\,(\text{vdW})$
A sampled conformation is retained with probability:
\[
p = \begin{cases}
0 & \text{if } E > E_{\max} \\
\dfrac{E_{\max} - E}{E_{\max} - E_{\min}} & \text{if } E_{\min} \le E \le E_{\max} \\
1 & \text{if } E < E_{\min}
\end{cases}
\]
Results in denser distribution of nodes in low-energy regions of conformational space
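The retention rule can be written directly as a small function. The sketch below assumes the caller supplies a sampler and an energy function (both hypothetical placeholders here), and that E_min < E_max.

```python
import random

def retention_probability(energy, e_min, e_max):
    """Probability of keeping a sampled conformation, following the piecewise rule above."""
    if e_min >= e_max:
        raise ValueError("e_min must be smaller than e_max")
    if energy > e_max:
        return 0.0
    if energy < e_min:
        return 1.0
    return (e_max - energy) / (e_max - e_min)

def sample_nodes(sample_conformation, energy_of, e_min, e_max, n_nodes, rng):
    """Rejection sampling of roadmap nodes.

    `sample_conformation(rng)` and `energy_of(q)` are hypothetical callables
    supplied by the caller; they stand in for the ligand sampler and energy model.
    """
    nodes = []
    while len(nodes) < n_nodes:
        q = sample_conformation(rng)
        if rng.random() < retention_probability(energy_of(q), e_min, e_max):
            nodes.append(q)
    return nodes
```

Because low-energy samples are kept more often, the retained nodes concentrate in low-energy regions even though the proposals are uniform.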
-
Roadmap Construction: Edge Generation
[Figure: the edge between nodes q and q′, discretized into intermediate conformations q_i, q_{i+1}]
Each node is connected to its closest neighbors by straight edges
Each edge is discretized so that between q_i and q_{i+1} no atom moves by more than some ε = 1 Å.
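One simple way to obtain such a discretization is recursive bisection of the straight segment in parameter space. The sketch below is an assumption, not necessarily the scheme used in the cited work: `atom_positions` is a hypothetical callable mapping a conformation (a tuple of parameters) to a list of atom coordinates, parameters are interpolated linearly without angle wrap-around, and the test only compares the two ends of each sub-segment, so it presumes atoms do not swing out and back within a sub-segment.

```python
import math

def max_atom_displacement(pos_a, pos_b):
    """Largest distance any single atom moves between two sets of coordinates."""
    return max(math.dist(p, q) for p, q in zip(pos_a, pos_b))

def discretize_edge(q_a, q_b, atom_positions, eps=1.0):
    """Return conformations q_a = q_0, ..., q_m = q_b along the straight edge such that
    no atom moves more than eps between consecutive conformations."""
    if max_atom_displacement(atom_positions(q_a), atom_positions(q_b)) <= eps:
        return [q_a, q_b]
    mid = tuple((x + y) / 2 for x, y in zip(q_a, q_b))   # midpoint in parameter space
    left = discretize_edge(q_a, mid, atom_positions, eps)
    right = discretize_edge(mid, q_b, atom_positions, eps)
    return left + right[1:]                               # avoid repeating the midpoint
```

Each intermediate conformation can then be scored with the energy function to assign a weight to the edge.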
-
Querying the Roadmap
For a given goal node q_g (e.g., binding conformation), Dijkstra's single-source shortest-path algorithm computes the lowest-weight paths from q_g to each node (in either direction) in O(N log N) time, where N = number of nodes. This bound relies on each node having only a bounded number of neighbors, so that the number of edges is O(N).
Various quantities can then be easily computed in O(N) time, e.g., average weights of all paths entering q_g and of all paths leaving q_g (binding and dissociation rates K_on and K_off)
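A minimal sketch of the query step, assuming the roadmap is stored as a dictionary of directed, non-negative edge weights (`graph[u][v]` is the weight of the edge from u to v; this representation is an assumption of the example). Paths leaving q_g come from running Dijkstra on the graph itself, and paths entering q_g come from running it on the reversed graph.

```python
import heapq
import math

def dijkstra(graph, source):
    """Lowest path weight from `source` to every node, using a binary heap."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def reverse_graph(graph):
    """Flip every directed edge, so shortest paths *into* a node can be computed."""
    rev = {node: {} for node in graph}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            rev[v][u] = w
    return rev

def average_path_weight(dist, goal):
    """O(N) average over all nodes that are connected to the goal."""
    finite = [d for node, d in dist.items() if node != goal and d < math.inf]
    return sum(finite) / len(finite) if finite else math.inf
```

With `leaving = dijkstra(graph, q_g)` and `entering = dijkstra(reverse_graph(graph), q_g)`, the two averages are obtained from one pass over each result.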
-
Computing Binding Conformations
Sample many (several thousand) ligand conformations at random around the protein
Repeat several times:
Select the lowest-energy conformations that are close to the protein surface
Re-sample around them
Retain k (approx. 10) lowest-energy conformations whose centers of mass are at least 5 Å apart
[Figure: lactate dehydrogenase, with the location of the active site marked as unknown]
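The final retention step can be sketched as a greedy filter: walk through the conformations from lowest to highest energy and keep one only if its center of mass is far enough from all those already kept. The greedy order is an assumption of this sketch; the slide does not say how ties or conflicts are resolved.

```python
import math

def select_binding_candidates(conformations, k=10, min_separation=5.0):
    """Pick up to k low-energy conformations whose centers of mass are well separated.

    `conformations` is an iterable of (energy, center_of_mass) pairs, with the
    center of mass given as an (x, y, z) tuple in angstroms.
    """
    chosen = []
    for energy, center in sorted(conformations, key=lambda c: c[0]):
        if all(math.dist(center, other) >= min_separation for _, other in chosen):
            chosen.append((energy, center))
            if len(chosen) == k:
                break
    return chosen
```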
-
Testing on Three Complexes
PDB ID: 1ldm Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues) Ligand: Oxamate (6 atoms, 7 DOFs)
PDB ID: 4ts1 Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues) Ligand: L-leucyl-hydroxylamine (13 atoms, 9 DOFs)
PDB ID: 1stp Receptor: Streptavidin (901 atoms, 121 residues) Ligand: Biotin (16 atoms, 11 DOFs)
-
Finding Folding Pathways Using PRM
Degrees of freedom: the number of rotatable backbone dihedral angles (approx. 2N, where N is the number of amino acids)
Nodes are generated in a similar manner to the docking scheme above.
Sampling cannot be done uniformly at random due to the high dimensionality; instead, sampling is done from a set of distributions around the native state.
Edges connect neighboring nodes in a similar manner to the one described above.
Can be used to discover folding pathways, intermediate structures and other folding events.
G. Song, N. Amato, RECOMB 2001
-
From Flexible Ligand to Flexible Receptor?
Modeling full receptor flexibility is very difficult!
For flexible-receptor docking to become efficient, we must find a representation for protein flexibility that avoids the direct search of a solution space comprised of thousands of degrees of freedom.
There are several methods available, and the accuracy of the results usually grows with the computational complexity of the representation.
-
From Flexible Ligand to Flexible Receptor?
The dimensionality of the protein conformational space is much larger than that of a small ligand
PRM-based methods that sample thousands of conformations to get a good view of the ligand conformational space are not sufficient
Challenge: from 7-10 DOFs to thousands of DOFs
Goal: Model protein flexibility to capture relevant conformations of the flexible receptor
-
Receptor Flexibility – Soft Receptor
Soft receptors can be easily generated by relaxing the high VdW energy penalty
The rationale is that the receptor structure has some inherent flexibility which allows it to adapt to slightly differently shaped ligands.
If the change in the receptor conformation is small enough, it is assumed that the receptor is capable of such a conformational change.
It is also assumed that the change in protein conformation does not incur a sufficiently high energetic penalty to offset the improved interaction energy between the ligand and the receptor.
It is also quite easy to implement (relax the collision component).
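One possible way to relax the collision component is to cap the repulsive part of a 12-6 Lennard-Jones term, so that small overlaps cost a bounded energy instead of a steeply rising one. The sketch below is only an illustration: capping is one of several softening schemes, and the parameter values (`epsilon`, `sigma`, `cap`) are hypothetical, not taken from the lecture or from any particular force field.

```python
def lennard_jones(r, epsilon=0.2, sigma=3.5):
    """Standard 12-6 Lennard-Jones energy at distance r (hypothetical parameters)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def soft_lennard_jones(r, epsilon=0.2, sigma=3.5, cap=1.0):
    """Softened variant: identical to Lennard-Jones except the penalty never exceeds `cap`."""
    return min(lennard_jones(r, epsilon, sigma), cap)
```

At long range the two functions agree; at short range the soft version tolerates a slight clash, which stands in for the small receptor adjustment that is not modeled explicitly.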
-
Receptor Flexibility – Selecting Specific DOFs
It is possible to select only a few degrees of freedom to model explicitly.
They usually correspond to rotations around single bonds.
These degrees of freedom are usually considered the natural degrees of freedom in molecules.
Rotations around bonds lead to deviations from ideal geometry that result in a small energy penalty when compared to deviations from ideality in bond lengths and bond angles.
Selection of which torsional degrees of freedom to model is usually the most difficult part of this method because it requires a considerable amount of a priori knowledge.
The torsions chosen are usually rotations of side chains in the binding site of the receptor protein.
It is also common to further reduce the search space by using rotamer libraries.
-
Receptor Flexibility – Ensemble Docking
One possible way to represent a flexible receptor for drug design applications is the use of multiple static receptor structures
The best description for a protein structure is that of a conformational ensemble of slightly different protein structures coexisting in a low energy region of the potential energy surface.
The structures can be determined experimentally either from X-ray crystallography or NMR, or generated via computational methods such as Monte Carlo or MD simulations.
-
Modeling Limited Receptor Flexibility
Selection of specific degrees of freedom, such as those of designated amino acids in the binding site
Shown here: acetylcholinesterase, where the flexible Phe330 acts as a swinging gate
-
Modeling Limited Receptor Flexibility
Moving a larger number of amino acids (illustration on acetylcholinesterase)
-
Receptor Flexibility – Collective DOF
Collective DOF allows the representation of full protein flexibility without a dramatic increase in computational cost.
One method is the calculation of normal modes for the receptor.
Alternatively, we can use dimensionality reduction methods.
The most commonly used method for the study of protein motions is principal component analysis (PCA).
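To make the idea of a collective DOF concrete, the sketch below extracts the first principal component from a set of conformations by power iteration on the covariance matrix. It assumes each conformation has already been superimposed on a reference and flattened into a coordinate vector; power iteration is used here only to keep the example dependency-free, and a real analysis would typically use a linear algebra library. The function names are hypothetical.

```python
import math

def first_principal_component(samples, iterations=200):
    """Return (mean, direction) of the dominant mode of variation in `samples`.

    `samples` is a list of equal-length coordinate vectors, one per conformation.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d                  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                           # no variance along the current direction
            break
        v = [x / norm for x in w]
    return mean, v

def project(conformation, mean, direction):
    """Coordinate of a conformation along the collective DOF."""
    return sum((x - m) * u for x, m, u in zip(conformation, mean, direction))
```

Moving along `direction` from `mean` changes all atom coordinates at once, which is what lets a single parameter stand in for many individual DOFs.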
-
Inverse Kinematics (IK)
Inverse kinematics is the problem of finding the right values for the underlying degrees of freedom of a chain so that the chain satisfies certain spatial constraints.
In the case of a protein chain, these degrees of freedom are the dihedral angles.
For example, in some applications, it is necessary to find rotations that can steer certain atoms to desired locations in space.
The applications of inverse kinematics to protein structure include mainly loop modeling and generating ensembles of structures.
In loop modeling, the rotational degrees of freedom of a loop region are manipulated to find possible loop conformations that attach to the rest of the protein.
-
Modeling Loops Using Inverse Kinematics
Goal: Model the ensemble of conformations of a protein.
It is known that proteins are not rigid but fluctuate about an ensemble of structures under equilibrium conditions.
Focus mostly on loop regions, as they are the most flexible ones.
-
Modeling Loops Using Inverse Kinematics
Inverse kinematics: Manipulate the degrees of freedom of an articulated chain to satisfy some end-constraints.
In this case, manipulate the rotational degrees of freedom of a loop region to find possible loop conformations that attach to the rest of the protein.
Cyclic Coordinate Descent (CCD): solve for and rotate one dihedral at a time.
Canutescu A. A., and Dunbrack R. L. Protein Science 12, 2003
-
CCD for Inverse Kinematics
Goal: find optimal values to simultaneously steer the three backbone atoms of the end of the fragment to their target positions.
Current positions before rotation: $M^0$; after rotation: $M$; target positions: $F$.
$S$ is the sum of squared distances between current positions and target positions
Steering these three atoms to their target positions requires minimizing $S$.
-
CCD for Inverse Kinematics
S is defined as:
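\[ S = |\vec{F_1M_1}|^2 + |\vec{F_2M_2}|^2 + |\vec{F_3M_3}|^2 \]
where
\[ \vec{F_1M_1} = \vec{O_1M_1} - \vec{O_1F_1} \]
Here $O_1$ lies on the rotation axis and is the center of the circle that atom 1 traces as $\theta$ varies.
Notice that it is a 2D rotation in the plane defined by the $\hat{r}$ and $\hat{s}$ local axes.
The vector $M - F$ (denoted $\vec{FM}$) has this form for each of the three atoms, so we can sum the three squared norms to obtain $S$.
We can express the rotation with respect to the $\hat{r}$, $\hat{s}$ plane as:
\[ \vec{O_1M_1} = r_1\cos\theta\,\hat{r}_1 + r_1\sin\theta\,\hat{s}_1 \]
$r_1$ is the length of the vector between $O_1$ and $M_1^0$, which we want to rotate by $\theta$, and $\hat{r}_1$ is the unit vector along it.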
-
CCD for Inverse Kinematics
From the previous equations it follows that:
\[ \vec{F_iM_i} = r_i\cos\theta\,\hat{r}_i + r_i\sin\theta\,\hat{s}_i - \vec{f}_i \equiv \vec{d}_i, \quad i = 1, 2, 3 \]
where $\vec{f}_i = \vec{O_iF_i}$.
Calculating the squared distances between the moving atoms and the fixed target atoms, we obtain:
\[ |\vec{d}_i|^2 = r_i^2 + f_i^2 - 2 r_i\cos\theta\,(\vec{f}_i\cdot\hat{r}_i) - 2 r_i\sin\theta\,(\vec{f}_i\cdot\hat{s}_i) \]
Putting it all together, we can express $S$ as the sum of the squared distances above.
Differentiating with respect to $\theta$ gives us:
\[ \frac{dS}{d\theta} = \frac{d|\vec{d}_1|^2}{d\theta} + \frac{d|\vec{d}_2|^2}{d\theta} + \frac{d|\vec{d}_3|^2}{d\theta} \]
where
\[ \frac{d|\vec{d}_i|^2}{d\theta} = 2 r_i\sin\theta\,(\vec{f}_i\cdot\hat{r}_i) - 2 r_i\cos\theta\,(\vec{f}_i\cdot\hat{s}_i) \]
-
CCD for Inverse Kinematics
After a little bit of math, $S$ can be written as:
\[ S = a - \sqrt{b^2 + c^2}\,\cos(\theta - \alpha) \]
$S$ is minimum when $\theta = \alpha$, since the cosine then takes its maximum value of 1. Now we have explicit values for the sine and cosine of the optimal angle.
Notice that the time complexity of solving once for all dihedrals of a chain is linear in the number of DOFs.
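The "little bit of math" can be filled in as follows. The definitions of $a$, $b$, $c$ and $\alpha$ below are a reconstruction obtained by summing the squared distances $|\vec{d}_i|^2$ given earlier; they are consistent with those formulas but are not copied from the slides.

```latex
\begin{align*}
S &= \sum_{i=1}^{3} |\vec{d}_i|^2
   = \underbrace{\sum_{i=1}^{3} \left(r_i^2 + f_i^2\right)}_{a}
   - \cos\theta \underbrace{\sum_{i=1}^{3} 2 r_i \,(\vec{f}_i \cdot \hat{r}_i)}_{b}
   - \sin\theta \underbrace{\sum_{i=1}^{3} 2 r_i \,(\vec{f}_i \cdot \hat{s}_i)}_{c} \\
  &= a - b\cos\theta - c\sin\theta . \\[4pt]
\text{Let } \cos\alpha &= \frac{b}{\sqrt{b^2 + c^2}}, \qquad
            \sin\alpha = \frac{c}{\sqrt{b^2 + c^2}} . \\[4pt]
\text{Then } b\cos\theta + c\sin\theta
  &= \sqrt{b^2 + c^2}\,(\cos\alpha\cos\theta + \sin\alpha\sin\theta)
   = \sqrt{b^2 + c^2}\,\cos(\theta - \alpha), \\
S &= a - \sqrt{b^2 + c^2}\,\cos(\theta - \alpha).
\end{align*}
```

Setting $dS/d\theta = b\sin\theta - c\cos\theta = 0$ gives the same condition, $\tan\theta = c/b$, and the choice $\theta = \alpha$ is the minimum rather than the maximum because there $S = a - \sqrt{b^2 + c^2}$.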
-
Modeling Loops Using Inverse Kinematics
Cyclic Coordinate Descent: solve for and rotate one dihedral at a time
Given: atom at current position $M$, target position $F$
Goal: Solve for dihedral $\theta$ s.t. $|F - M|^2 = S(\theta) < \varepsilon_{\text{threshold}}$
Time complexity: linear in the number of DOFs to solve for all dihedrals of a chain
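The procedure can be sketched for an unbranched chain of 3D points, using the closed-form optimal angle $\theta = \operatorname{atan2}(c, b)$ that follows from writing $S$ as $a - b\cos\theta - c\sin\theta$. This is a minimal sketch under stated assumptions: every bond is treated as rotatable, the last three atoms are the ones steered, and $\hat{s}$ is chosen as axis × $\hat{r}$ so that a positive angle matches the right-handed rotation used in `rotate`. All names are hypothetical.

```python
import math

def sub(u, v): return (u[0] - v[0], u[1] - v[1], u[2] - v[2])
def add(u, v): return (u[0] + v[0], u[1] + v[1], u[2] + v[2])
def scale(u, s): return (u[0] * s, u[1] * s, u[2] * s)
def dot(u, v): return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])

def rotate(p, a, k, theta):
    """Right-handed rotation of p by theta about the unit axis k through point a."""
    v = sub(p, a)
    c, s = math.cos(theta), math.sin(theta)
    return add(a, add(add(scale(v, c), scale(cross(k, v), s)), scale(k, dot(k, v) * (1 - c))))

def squared_deviation(atoms, targets):
    """S: sum of squared distances between the last three atoms and their targets."""
    return sum(dot(sub(m, f), sub(m, f)) for m, f in zip(atoms[-3:], targets))

def optimal_angle(atoms, targets, a, k):
    """Angle minimizing S for one axis: accumulate b and c, then theta = atan2(c, b)."""
    b = c = 0.0
    for m0, f in zip(atoms[-3:], targets):
        o = add(a, scale(k, dot(sub(m0, a), k)))   # foot of the perpendicular on the axis
        r_vec = sub(m0, o)
        r = math.sqrt(dot(r_vec, r_vec))
        if r < 1e-12:
            continue                                # atom on the axis: rotation cannot move it
        r_hat = scale(r_vec, 1.0 / r)
        s_hat = cross(k, r_hat)
        f_vec = sub(f, o)
        b += 2.0 * r * dot(f_vec, r_hat)
        c += 2.0 * r * dot(f_vec, s_hat)
    return math.atan2(c, b)

def ccd(atoms, targets, tol=1e-4, max_sweeps=100):
    """Cyclic coordinate descent: sweep over the bonds, rotating one dihedral at a time."""
    atoms = [tuple(p) for p in atoms]
    n = len(atoms)
    for _ in range(max_sweeps):
        for i in range(n - 4):                      # bonds that move all three end atoms
            a = atoms[i]
            axis = sub(atoms[i + 1], a)
            k = scale(axis, 1.0 / math.sqrt(dot(axis, axis)))
            theta = optimal_angle(atoms, targets, a, k)
            for j in range(i + 2, n):
                atoms[j] = rotate(atoms[j], a, k, theta)
        if squared_deviation(atoms, targets) < tol:
            break
    return atoms, squared_deviation(atoms, targets)
```

Each single-dihedral step can only lower S or leave it unchanged, since θ = 0 is always among the candidates; whether the sweep reaches the threshold depends on whether the targets are reachable for the given chain.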
-
Modeling Loops Using Inverse Kinematics
Since there is redundancy, many solutions are feasible.
Find rotations to satisfy spatial constraints on atoms
Combine with energy minimization to obtain physical structures
Example: Chymotrypsin inhibitor 2
-
Equilibrium Fluctuations
More DOFs than spatial constraints can be exploited to generate fragment fluctuations
Example: Chymotrypsin inhibitor 2
-
Equilibrium Fluctuations
Sample equilibrium fluctuations:
Spatially constrained through Cyclic Coordinate Descent
Energetically constrained to be feasible
[Figure: local fluctuations in α-Lactalbumin]
Boltzmann ensemble average:
\[ \mathrm{RMSD}_x = \sum_{C \in \text{Confs}} \frac{\mathrm{RMSD}(C, C_{\text{native}})\, e^{-\beta \Delta E_C}}{Q} \]
\[ \Delta E_C = E_C - E_{\text{native}}, \qquad Q = \sum_{C \in \text{Confs}} e^{-\beta \Delta E_C} \]
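The ensemble average is a weighted mean with Boltzmann weights, as in the short sketch below. It assumes the RMSD and energy of each sampled conformation are already available as plain lists, and that β and the energies use consistent units; the function name is hypothetical.

```python
import math

def boltzmann_average_rmsd(rmsds, energies, e_native, beta):
    """Boltzmann-weighted average of per-conformation RMSD values.

    Each conformation gets the weight exp(-beta * (E_C - E_native)); the sum of
    the weights is the normalization Q.
    """
    weights = [math.exp(-beta * (e - e_native)) for e in energies]
    q = sum(weights)
    return sum(r * w for r, w in zip(rmsds, weights)) / q
```

Conformations far above the native energy contribute almost nothing, so the average is dominated by low-energy fluctuations.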
-
Equilibrium Fluctuations
α-Lactalbumin (α-Lac)
123 residues
Hydrogen exchange protection factors available
Ubiquitin
76 residues
NMR information on fluctuations available