Alignment of Flexible Molecular Structures
Motivation
• Proteins are flexible. One would like to align proteins modulo the flexibility.
• Hinge and shear protein domain motions (Gerstein, Lesk, Chothia).
• Conformational flexibility in drugs.
Flexible protein alignment without prior hinge knowledge
FlexProt - algorithm
– automatically detects flexible regions
– exploits amino acid sequence order
• Task: largest flexible alignment by decomposing the two molecules into a minimal number of rigid fragment pairs having similar 3-D structure.
FlexProt Main Steps
• Detection of Congruent Rigid Fragment Pairs
• Joining Rigid Fragment Pairs
• Rigid Structural Comparison
• Clustering (removing ins/dels)
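Detection of Congruent Rigid Fragment Pairs
[Figure: consecutive residues v_{i-1}, v_i, v_{i+1} of one molecule matched to w_{j-1}, w_j, w_{j+1} of the other; a fragment pair spans indices k … k+l-1 and t … t+l-1.]
Frag_kt(l) = (v_k … v_i … v_{k+l-1} ; w_t … w_j … w_{t+l-1})
RMSD(Frag_kt(l)) < threshold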
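The fragment-pair condition can be illustrated with a short sketch. It assumes NumPy is available, uses the standard Kabsch superposition to compute the RMSD, and scans every start pair (k, t) for one fixed length l. The function names and the `max_rmsd` parameter are hypothetical, and the exhaustive scan is a naive illustration rather than the FlexProt implementation.

```python
import numpy as np

def superposed_rmsd(P, Q):
    """RMSD of two equal-length 3-D point lists after optimal rigid superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)              # centre both point sets
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)  # SVD of the covariance matrix
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc                 # rotate P onto Q and compare
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def congruent_fragment_pairs(V, W, l, max_rmsd):
    """Return all (k, t, l) such that V[k:k+l] and W[t:t+l] superimpose with RMSD < max_rmsd."""
    V = np.asarray(V, dtype=float)
    W = np.asarray(W, dtype=float)
    hits = []
    for k in range(len(V) - l + 1):
        for t in range(len(W) - l + 1):
            if superposed_rmsd(V[k:k + l], W[t:t + l]) < max_rmsd:
                hits.append((k, t, l))
    return hits
```

A practical implementation would more likely extend matching fragments residue by residue than rescan every window, but the acceptance test per fragment pair is the same.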
Graph Representation
• The fragments are in ascending order.
• The gaps (ins/dels) are limited.
• Some overlap is allowed.
Weight W of an edge a → b:
+ Size of the rigid fragment pair (node b)
- Gaps (ins/dels) (penalty)
- Overlapping (penalty)
[Figure: graph of fragment pairs with edge weights W_i, W_k, W_t, W_m, W_n.]
• "Single-source shortest paths": O(|E|+|V|)
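A minimal sketch of the joining step, under stated assumptions: each fragment pair is a tuple (k, t, l), an edge a → b exists when b follows a in both molecules with gaps of at most `max_gap`, and the edge weight is the size of b minus a linear gap penalty. Overlaps are disallowed here for brevity, and `best_chain`, `max_gap` and `gap_penalty` are hypothetical names and values. Because the graph is acyclic, the best path is found in one pass over the nodes in topological order; maximizing this weight is equivalent to a single-source shortest path with negated weights.

```python
def best_chain(frags, max_gap=10, gap_penalty=0.5):
    """Return (score, chain) for the highest-scoring ordered chain of fragment pairs."""
    frags = sorted(frags)                   # sorting by k gives a topological order
    n = len(frags)
    score = [float(l) for _, _, l in frags]  # a chain may start at any node
    prev = [-1] * n
    for b in range(n):
        kb, tb, lb = frags[b]
        for a in range(b):
            ka, ta, la = frags[a]
            gap1 = kb - (ka + la)           # gap in molecule 1
            gap2 = tb - (ta + la)           # gap in molecule 2
            if gap1 < 0 or gap2 < 0:        # overlap or wrong order: no edge
                continue
            if gap1 > max_gap or gap2 > max_gap:
                continue
            w = lb - gap_penalty * (gap1 + gap2)
            if score[a] + w > score[b]:
                score[b] = score[a] + w
                prev[b] = a
    if n == 0:
        return 0.0, []
    end = max(range(n), key=lambda i: score[i])
    chain, i = [], end
    while i != -1:                          # walk back along the best predecessors
        chain.append(frags[i])
        i = prev[i]
    return score[end], chain[::-1]

example = [(0, 0, 10), (12, 15, 8), (5, 30, 6), (25, 26, 12)]
best_score, best = best_chain(example)
```

Each edge is examined once, so the running time is linear in the number of nodes plus candidate edges, matching the O(|E|+|V|) bound on the slide.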
Clustering (removing ins/dels)
[Figure: two fragment pairs with transformations T1 and T2.]
If joining two fragment pairs gives small RMSD (T1 ~ T2) then put them into one cluster.
Multiple Structural Alignment Schemes
• Linear progressive. Starts with one object and successively compares the other objects to the results.
• Tree progressive. The alignment is created according to a similarity tree. The alignment direction is from the leaves to the tree root.
• Gerstein and Levitt 1998.
• Orengo and Taylor 1994. SSAPm method.
• Sali and Blundell 1990.
• Russell and Barton 1992.
• Ding et al. 1994.
Multiple Structural Alignment Schemes
•Pivot. Uses one object as the pivot and compares it to all other objects. The results are then analyzed to find the common similarities.
• Leibowitz, Fligelman, Nussinov, and Wolfson 1999. Geometric Hashing technique.
• Escalier, Pothier, Soldano, Viari 1998. Exploits all common substructures.
Multiple Structural Alignment Schemes
•Optimization Techniques.
•Guda, Scheeff, Bourne, Shindyalov. Monte Carlo optimization.
Previous Work – Multiple Structural Alignment
Disadvantages:
• Most methods do not detect partial solutions.
• The methods that do detect partial solutions are not efficient for a large number of molecules.
Partial Solutions
[Figure: molecules sharing local structural patterns labeled A and B.]
B is harder to detect than A.
• Detection of local similarities.
• Detection of a subset of molecules that share some local structural pattern.
Largest Common Point Set (LCP)
Given two point sets, detect the largest common subset (under exact congruence or ε-congruence).
• Multiple-LCP is NP-hard even in one-dimensional space for the case of exact congruence (Akutsu 2000).
• In 3-D with ε-congruence the problem is more complex.
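To make the pairwise LCP definition concrete, here is a toy sketch for the simplest setting: two sets of integer points on a line, exact congruence, and translations only (reflections are ignored). These restrictions and the name `lcp_1d` are assumptions for illustration. The pairwise case is easy; the hardness result above concerns the multiple-set version.

```python
from collections import Counter

def lcp_1d(A, B):
    """Largest common point set of two 1-D integer point sets under translation.

    Returns (shift, points_of_A) such that every returned point a has a + shift in B.
    """
    A, B = set(A), set(B)
    if not A or not B:
        return 0, []
    # Every pair (a, b) votes for the translation that maps a onto b.
    votes = Counter(b - a for a in A for b in B)
    shift, _ = votes.most_common(1)[0]
    common = sorted(a for a in A if a + shift in B)
    return shift, common

shift, common = lcp_1d([0, 1, 3, 7], [10, 12, 16, 20])
```

The voting idea carries over to 3-D, where a transformation is defined by matching small point tuples instead of single points, and ε-congruence replaces exact equality with a distance tolerance.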
Solution Space
• The number of solutions that meet the minimal criteria could be exponential.
[Figure: helices α-1, α-2, α-3 matched across molecules; figure labels: 3•2•3, kM.]
Partial Multiple-LCP
Detect the t largest alignments between exactly k molecules. We are interested in these solutions for each k, 2 ≤ k ≤ m.
MultiProt
• Non-predefined pattern detection.
• Partial solutions.
• Time efficient:
– 5 proteins in 14 seconds
– 20 proteins (~500 a.a.) in 10 minutes
– 50 proteins (~200 a.a.) in 19 minutes
[Pentium II 500 MHz, 512 MB memory]
/home/silly6/mol/demos/MultiProt/
Algorithm Features
• Assumption: any multiple alignment of proteins should align at least some short contiguous fragments (minimum 3 points) of the input points.
• Reduction of solution space: the aligned contiguous fragments are of maximal length.
• Almost all possible solutions (transformations) are detected ("almost" because of ε-congruence); optimal solutions are ‘hard’ to select.
Multiple Alignment with Pivot
Input:
Pivot Molecule: Mp (participates in all solutions)
Set of Molecules: S' = S \ {Mp}
Error Threshold: ε
• Detect all possibly aligned fragments of maximal length between the input molecules (chance to detect subtle similarities).
• Select solutions that give high-scoring global structural similarity.
• Iterate over all possible pivots, Mp = M1 … Mm.
Bio-Core Detection
Geom. + Bio. Constraints
Classification:
• hydrophobic (Ala, Val, Ile, Leu, Met, Cys)
• polar/charged (Ser, Thr, Pro, Asn, Gln, Lys, Arg, His, Asp, Glu)
• aromatic (Phe, Tyr, Trp)
• glycine (Gly)
Or any other scoring matrix!
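One way to read the biological constraint is as a filter on geometrically matched positions: a matched column is kept only if all its residues fall into the same class. The sketch below assumes that reading; the column representation and the name `bio_core` are hypothetical, while the class table is copied from the slide.

```python
CLASSES = {
    "hydrophobic": ["Ala", "Val", "Ile", "Leu", "Met", "Cys"],
    "polar/charged": ["Ser", "Thr", "Pro", "Asn", "Gln", "Lys", "Arg", "His", "Asp", "Glu"],
    "aromatic": ["Phe", "Tyr", "Trp"],
    "glycine": ["Gly"],
}
# Invert the table: residue name -> class name.
RESIDUE_CLASS = {res: cls for cls, members in CLASSES.items() for res in members}

def bio_core(columns):
    """Keep the matched columns whose residues all belong to one class.

    `columns` is a list of tuples, one residue name per aligned molecule.
    """
    core = []
    for column in columns:
        classes = {RESIDUE_CLASS[res] for res in column}
        if len(classes) == 1:
            core.append(column)
    return core

columns = [("Ala", "Val", "Leu"), ("Ser", "Phe", "Thr"),
           ("Gly", "Gly", "Gly"), ("Tyr", "Trp", "Phe")]
core = bio_core(columns)
```

Swapping the class test for a threshold on a substitution-matrix score would give the "any other scoring matrix" variant.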
Substrate assisted catalysis – application to G proteins
Mickey Kosloff and Zvi Selinger, TRENDS in Biochemical Sciences, Vol. 26, No. 3, March 2001, p. 161.
Aspects of Structural Comparison
• A large number of structures (hundreds) – Molecular Dynamics.
• Structural flexibility – proteins are not rigid structures.
• Structure representation – C-alpha atoms are suitable for comparisons of folds. Detection of similar function requires a different representation. This brings another problem – side chain flexibility.
• Sequence order in structural alignment. Detection of active sites might require a different approach. Proteins with different folds might provide the same function.
• Statistical significance.
• Measure of geometrical similarity (RMSD, bottleneck, …), biological scoring function.
Motivation
• Prediction of biomolecular recognition.
• Detection of drug binding ‘cavities’.
• Molecular Graphics.
Connolly’s MS algorithm
• A ‘water’ probe ball (1.4-1.8 Å radius) is rolled over the van der Waals surface.
• Smoothes the surface and bridges narrow ‘inaccessible’ crevices.
Connolly’s MS algorithm - cont.
• Convex, concave and saddle patches according to the no. of contact points between the surface atoms and the probe ball.
• Outputs points + normals according to the required sampling density (e.g. 10 pts/Å²).
Critical points based on Connolly rep. (Lin, Wolfson, Nussinov)
• Define a single point+normal for each patch.
• Convex: caps; concave: pits; saddle: belts.
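The patch and critical-point naming can be summarized as a small lookup. The contact counts used below (one atom for convex, two for saddle, three for concave) follow the usual description of Connolly surfaces and are not stated on the slides; `classify_patch` is a hypothetical helper.

```python
# Number of atoms the probe touches -> (patch type, critical-point name)
PATCH_TYPES = {
    1: ("convex", "cap"),
    2: ("saddle", "belt"),
    3: ("concave", "pit"),
}

def classify_patch(n_contacts):
    """Return (patch type, critical-point name) for a probe position."""
    try:
        return PATCH_TYPES[n_contacts]
    except KeyError:
        raise ValueError("expected 1, 2 or 3 contact atoms, got %r" % (n_contacts,))
```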
Geometric Docking Algorithms
• Based on the assumption of shape complementarity between the participating molecules.
• Molecular surface complementarity - protein-protein, protein-ligand, (protein - drug).
• Hydrogen donor/acceptor complementarity - protein-drug.
Remark: usually “protein” here can be replaced by “DNA” or “RNA” as well.
Issues to be examined when evaluating docking methods
• Rigid docking vs. flexible docking. If the method allows flexibility:
– Is flexibility allowed for the ligand only, the receptor only, or both?
– Number of flexible bonds allowed and the cost of adding additional flexibility.
• Does the method require prior knowledge of the active site?
• Performance in “unbound” docking experiments.
• Speed – ability to explore large libraries.
General Algorithm outline
• Calculate the molecular surface of the receptor and the ligands and their interest points (+ normals).
• Match the interest points and recover candidate transformations.
• Check for inter-molecule and intra-molecule penetrations and score the amount of contact.
• Rank by geom-score/energies.
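The last two steps of the outline can be sketched as follows, assuming candidate transformations have already been applied to the ligand surface points. Each point is scored by its signed distance to the nearest receptor atom sphere: points well inside an atom count as penetrations, points near the surface count as contacts. The tolerances, the data layout and the function names are hypothetical, and only inter-molecule penetration is checked.

```python
import math

def score_pose(ligand_points, receptor_atoms, contact_tol=0.5, clash_tol=0.5):
    """Return (contacts, penetrations) for one placed ligand.

    ligand_points: list of (x, y, z); receptor_atoms: list of (x, y, z, radius).
    """
    contacts = penetrations = 0
    for p in ligand_points:
        # Signed distance to the receptor: negative means inside an atom sphere.
        d = min(math.dist(p, (x, y, z)) - r for x, y, z, r in receptor_atoms)
        if d < -clash_tol:
            penetrations += 1
        elif abs(d) <= contact_tol:
            contacts += 1
    return contacts, penetrations

def rank_poses(poses, receptor_atoms, max_penetrations=0):
    """Drop poses with too many penetrations, then rank the rest by contact count."""
    scored = []
    for name, points in poses.items():
        contacts, penetrations = score_pose(points, receptor_atoms)
        if penetrations <= max_penetrations:
            scored.append((contacts, name))
    return [name for _, name in sorted(scored, reverse=True)]
```

A purely geometric score like this is typically followed by an energy-based re-ranking, as the final bullet suggests.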