Alignment of Flexible Molecular Structures

Motivation

• Proteins are flexible. One would like to align proteins modulo the flexibility.

• Hinge and shear protein domain motions (Gerstein, Lesk, Chothia).

• Conformational flexibility in drugs.

http://molmovdb.mbb.yale.edu/MolMovDB/

Motivation

Flexible protein alignment without prior hinge knowledge

FlexProt - algorithm

– detects automatically flexibility regions

– exploits amino acid sequence order

Examples

Experimental Results

• Task: find the largest flexible alignment by decomposing the two molecules into a minimal number of rigid fragment pairs having similar 3-D structure.

Detection of Congruent Rigid Fragment Pairs

Joining Rigid Fragment Pairs

Rigid Structural Comparison

Clustering (removing ins/dels)

FlexProt Main Steps

Structural Similarity Matrix

Congruent Rigid Fragment Pair

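[Figure: structural similarity matrix with matched residues v_{i-1}, v_i, v_{i+1} and w_{j-1}, w_j, w_{j+1}.]

$$\mathrm{Frag}_{kt}(l) = \begin{pmatrix} v_k & \dots & v_i & \dots & v_{k+l-1} \\ w_t & \dots & w_j & \dots & w_{t+l-1} \end{pmatrix}$$

$\mathrm{RMSD}(\mathrm{Frag}_{kt}(l)) <$ threshold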

Detection of Congruent Rigid Fragment Pairs

[Figure: fragment pair spanning positions k … k+l-1 in one molecule and t … t+l-1 in the other.]
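The congruence test for a fragment pair can be sketched as follows. This is a minimal illustration, not the FlexProt implementation: to stay dependency-free it swaps the superposition RMSD for a distance RMSD (dRMSD), which compares intra-fragment distances and so needs no optimal rotation. Unlike a superposition RMSD, dRMSD cannot tell a fragment from its mirror image. The function names and the threshold value are assumptions.

```python
import math
from itertools import combinations


def drmsd(frag_v, frag_w):
    """Distance RMSD between two equally long lists of 3-D points."""
    if len(frag_v) != len(frag_w) or len(frag_v) < 2:
        raise ValueError("fragments must have equal length >= 2")
    squared = [
        (math.dist(frag_v[a], frag_v[b]) - math.dist(frag_w[a], frag_w[b])) ** 2
        for a, b in combinations(range(len(frag_v)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


def is_congruent(mol_v, mol_w, k, t, length, threshold):
    """True if v_k..v_{k+l-1} and w_t..w_{t+l-1} are similar under dRMSD."""
    return drmsd(mol_v[k:k + length], mol_w[t:t + length]) < threshold


# Toy C-alpha traces (hypothetical coordinates, 3.8 A spacing).
bent = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (7.6, 3.8, 0)]
# Same shape, rotated 90 degrees about z and translated.
bent_moved = [(1, 2, 3), (1, 5.8, 3), (-2.8, 5.8, 3), (-2.8, 9.6, 3)]
# A different shape: fully extended.
straight = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
```

A rigidly moved copy passes the test, while the extended trace does not, because its end-to-end distances differ.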





FlexProt Main Steps

How to Join Rigid Fragment Pairs ?

Graph Representation

Graph Node: a congruent rigid fragment pair.

Graph Edge: joins two nodes when:

• The fragments are in ascending order.

• The gaps (ins/dels) are limited.

• Some overlapping is allowed.

Edge weight W (edge from node a to node b):

+ Size of the rigid fragment pair (node b)

Penalties:

- Gaps (ins/dels)

- Overlapping


• DAG (directed acyclic graph)

[Figure: DAG whose edges carry the weights W_i, W_k, W_t, W_m, W_n.]

• "Single-source shortest paths": O(|E|+|V|)
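The joining step can be sketched as a best-path computation over this DAG. The sketch below maximizes the total edge weight by dynamic programming in topological order, which is equivalent to a single-source shortest path on negated weights. The `FragPair` type, the limits, and the penalty constants are hypothetical, as is the exact way gaps and overlaps are measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragPair:
    i: int       # start index in molecule 1
    j: int       # start index in molecule 2
    length: int  # number of matched residues


MAX_GAP = 30           # assumed limit on gaps (ins/dels)
MAX_OVERLAP = 2        # assumed limit on overlap
GAP_PENALTY = 0.1      # assumed cost per gap residue
OVERLAP_PENALTY = 1.0  # assumed cost per overlapping residue


def edge_weight(a, b):
    """Weight of edge a -> b, or None if the edge is not allowed."""
    if b.i <= a.i or b.j <= a.j:
        return None  # fragments must be in ascending order
    gap1 = b.i - (a.i + a.length)  # negative value means overlap
    gap2 = b.j - (a.j + a.length)
    overlap = max(0, -gap1, -gap2)
    gap = max(0, gap1) + max(0, gap2)
    if overlap > MAX_OVERLAP or gap > MAX_GAP:
        return None
    return b.length - GAP_PENALTY * gap - OVERLAP_PENALTY * overlap


def best_chain(pairs):
    """Return (score, chain) for the highest-scoring path in the DAG."""
    if not pairs:
        return 0.0, []
    # Edges only go to larger i, so sorting by i is a topological order.
    order = sorted(pairs, key=lambda p: (p.i, p.j))
    best = [float(p.length) for p in order]  # a chain may start anywhere
    prev = [None] * len(order)
    for b_idx, b in enumerate(order):
        for a_idx in range(b_idx):
            w = edge_weight(order[a_idx], b)
            if w is not None and best[a_idx] + w > best[b_idx]:
                best[b_idx] = best[a_idx] + w
                prev[b_idx] = a_idx
    end = max(range(len(order)), key=best.__getitem__)
    score, chain = best[end], []
    while end is not None:
        chain.append(order[end])
        end = prev[end]
    return score, chain[::-1]


a = FragPair(0, 0, 10)
b = FragPair(12, 15, 8)
c = FragPair(5, 40, 6)   # overlaps a too much and is out of order with b
score, chain = best_chain([c, b, a])
```

Here the chain a, b wins: b adds its size (8) minus a gap penalty for the 2 + 5 unmatched residues, while c cannot be joined to either node.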





FlexProt Main Steps

Clustering (removing ins/dels)

T1

T2

If joining two fragment pairs gives small RMSD (T1 ~ T2) then put them into one cluster.
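One way to read the condition T1 ~ T2 is that the two rigid transformations move the same points to nearly the same place. The sketch below clusters fragment pairs on that basis. It is an assumption-laden proxy for the joint RMSD named above, and the greedy scheme, the function names, and the threshold are hypothetical.

```python
import math


def apply(transform, point):
    """Apply (rotation matrix, translation vector) to a 3-D point."""
    rotation, translation = transform
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def transform_rmsd(t1, t2, points):
    """RMSD between the images of the same points under t1 and t2."""
    squared = [math.dist(apply(t1, p), apply(t2, p)) ** 2 for p in points]
    return math.sqrt(sum(squared) / len(squared))


def cluster(transforms, points, threshold):
    """Greedy clustering: join the first cluster whose representative is close."""
    clusters = []
    for idx, t in enumerate(transforms):
        for members in clusters:
            representative = transforms[members[0]]
            if transform_rmsd(representative, t, points) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
ROT_Z_90 = ((0, -1, 0), (1, 0, 0), (0, 0, 1))

transforms = [
    (IDENTITY, (0.0, 0.0, 0.0)),   # T1
    (IDENTITY, (0.1, 0.0, 0.0)),   # T2, almost the same as T1
    (ROT_Z_90, (0.0, 0.0, 0.0)),   # a different rigid part
]
points = [(3.8 * n, 0.0, 0.0) for n in range(4)]
clusters = cluster(transforms, points, threshold=1.0)
```

The first two transformations fall into one cluster, and the rotated one starts a second cluster.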





FlexProt Main Steps

Multiple Structural Alignment

Multiple Structural Alignment Schemes

• Linear progressive. Starts with one object and successively compares the other objects to the results.

• Tree progressive. The alignment is created according to a similarity tree. The alignment direction is from the leaves to the tree root.

• Gerstein and Levitt 1998.
• Orengo and Taylor 1994. SSAPm method.
• Sali and Blundell 1990.
• Russell and Barton 1992.
• Ding et al. 1994.


•Pivot. Uses one object as the pivot and compares it to all other objects. The results are then analyzed to find the common similarities.

• Leibowitz, Fligelman, Nussinov, and Wolfson 1999. Geometric Hashing technique.
• Escalier, Pothier, Soldano, Viari 1998. Exploits all common substructures.


•Optimization Techniques.

•Guda, Scheeff, Bourne, Shindyalov. Monte Carlo optimization.

http://cl.sdsc.edu/mc/mc.html








Previous Work – Multiple Structural Alignment

Disadvantages:
• Most methods do not detect partial solutions.
• The methods which detect partial solutions are not efficient for a large number of molecules.

Partial Solutions

[Figure: molecules labeled with shared regions A and B.]

B is harder to detect than A

• Detection of local similarities.
• Detection of a subset of molecules that share some local structural pattern.

Largest Common Point Set (LCP)

• Multiple-LCP is NP-hard even in one dimensional space for the case of exact congruence (Akutsu 2000).

• In 3-D with ε-congruence the problem is more complex.

Given two point sets, detect the largest common subset [exact congruence or ε-congruence].

Solution Space

• The number of solutions that meet the minimal criteria could be exponential.

[Figure: three molecules with helices α-1 α-2 α-3, α-1 α-2 α-3 and α-1 α-2; annotations 3•2•3 and kM.]

Partial Multiple-LCP

Detect the t largest alignments between exactly k molecules. We are interested in these solutions for each k, 2 ≤ k ≤ m.

MultiProt

• Non-predefined pattern detection.
• Partial solutions.
• Time efficient:
  5 proteins in 14 seconds
  20 proteins (~500 a.a.) in 10 minutes
  50 proteins (~200 a.a.) in 19 minutes
  [Pentium II 500 MHz, 512 MB memory]

/home/silly6/mol/demos/MultiProt/

[Figure: helices α-1, α-2, α-3 of the input molecules.]

Algorithm Features

• Assumption: any multiple alignment of proteins should align at least short contiguous fragments (minimum 3 points) of the input points.

•Reduction of solution space: The aligned contiguous fragments are of maximal length.

•All (almost, because of ε-congruence) possible solutions (transformations) are detected (optimal solutions are ‘hard’ to select).

Input:
Pivot Molecule: Mp (participates in all solutions)
Set of Molecules: S' = S \ {Mp}
Error Threshold: ε

Multiple Alignment with Pivot

• Detect all possibly aligned fragments of maximal length between the input molecules (chance to detect subtle similarities).
• Select solutions that give high-scoring global structural similarity.
• Iterate over all possible pivots, Mp = M1 … Mm.

Bio-Core Detection

Geom. + Bio. Constraints

Classification:
• hydrophobic (Ala, Val, Ile, Leu, Met, Cys)
• polar/charged (Ser, Thr, Pro, Asn, Gln, Lys, Arg, His, Asp, Glu)
• aromatic (Phe, Tyr, Trp)
• glycine (Gly)

Or any other scoring matrix!
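The four-class scheme can be written down as a lookup table. The sketch below is only an illustration of how such a biological constraint might filter geometrically matched residue pairs. The names `CLASS_OF`, `same_class`, and `filter_matches` are hypothetical, and a scoring matrix could replace the equality test.

```python
RESIDUE_CLASSES = {
    "hydrophobic": ("Ala", "Val", "Ile", "Leu", "Met", "Cys"),
    "polar/charged": ("Ser", "Thr", "Pro", "Asn", "Gln",
                      "Lys", "Arg", "His", "Asp", "Glu"),
    "aromatic": ("Phe", "Tyr", "Trp"),
    "glycine": ("Gly",),
}

# Invert the table: three-letter residue code -> class name.
CLASS_OF = {
    residue: name
    for name, members in RESIDUE_CLASSES.items()
    for residue in members
}


def same_class(res_a, res_b):
    """True if both residues fall into the same class."""
    return CLASS_OF[res_a] == CLASS_OF[res_b]


def filter_matches(matched_pairs):
    """Keep only the geometrically matched pairs that also agree in class."""
    return [(a, b) for a, b in matched_pairs if same_class(a, b)]


kept = filter_matches([("Ala", "Val"), ("Phe", "Gly"), ("Lys", "Asp")])
```

Only the Ala/Val and Lys/Asp pairs survive, since Phe and Gly belong to different classes.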

Experimental Results

Superhelix, 5 molecules.

Concanavalin, 6 molecules.

Partial Solution Detection

Proteins: 1adj, 1hc7, 1qf6, 1ati

[Figure: domain layout of the four proteins, with regions labeled A, B, x, y, z.]

Task: detect A and B

• Domain A ranked first (142 matched atoms)

• Domain B ranked eighth (85 matched atoms)

4 proteins aligned based on detected domain A

Multiple Alignment of domain A

Multiple Alignment of domain A (enlarged)

4 proteins aligned based on domain B

Multiple Alignment of domain B

Multiple Alignment of domain B (enlarged)

Application to G proteins


Substrate assisted catalysis – application to G proteins

Mickey Kosloff and Zvi Selinger. Substrate assisted catalysis – application to G proteins. TRENDS in Biochemical Sciences, Vol. 26, No. 3, March 2001, p. 161.

Aspects of Structural Comparison

• A large number of structures (hundreds): Molecular Dynamics.
• Structural flexibility: proteins are not rigid structures.
• Structure representation: C-alpha atoms are suitable for comparisons of folds. Detection of similar function requires a different representation. This brings another problem: side chain flexibility.
• Sequence order in structural alignment. Detection of active sites might require a different approach. Proteins with different folds might provide the same function.
• Statistical significance.
• Measure of geometrical similarity (RMSD, bottleneck, …), biological scoring function.

Molecular Surface Representation

Applications to docking

Motivation

• Prediction of biomolecular recognition.

• Detection of drug binding ‘cavities’.

• Molecular Graphics.

Rasmol Spacefill display

1. Solvent Accessible Surface (SAS)
2. Connolly Surface

Connolly’s MS algorithm

• A ‘water’ probe ball (1.4-1.8 Å radius) is rolled over the van der Waals surface.

• Smoothes the surface and bridges narrow ‘inaccessible’ crevices.

Connolly’s MS algorithm - cont.

• Convex, concave and saddle patches are defined by the number of contact points between the probe ball and the surface atoms: one atom gives a convex patch, two a saddle patch, three a concave patch.

Outputs points + normals at the required sampling density (e.g. 10 pts/Å²).

Example - the surface of crambin

Critical points based on the Connolly representation (Lin, Wolfson, Nussinov)

• Define a single point+normal for each patch.

• Convex patches give caps, concave patches give pits, saddle patches give belts.

Critical point definition

Connolly => Shou Lin

Solid Angle local extrema

knob

hole

Chymotrypsin surface colored by solid angle (yellow: convex, blue: concave)

Protein-protein and Protein-ligand Docking

The geometric filtering

Shape Complementarity

Geometric Docking Algorithms

• Based on the assumption of shape complementarity between the participating molecules.

• Molecular surface complementarity - protein-protein, protein-ligand, (protein - drug).

• Hydrogen donor/acceptor complementarity - protein-drug.

Remark: usually “protein” here can be replaced by “DNA” or “RNA” as well.

Issues to be examined when evaluating docking methods

• Rigid docking vs. flexible docking:
  – If the method allows flexibility:
    • Is flexibility allowed for ligand only, receptor only, or both?
    • Number of flexible bonds allowed and the cost of adding additional flexibility.
• Does the method require prior knowledge of the active site?
• Performance in “unbound” docking experiments.
• Speed: ability to explore large libraries.

General Algorithm outline

• Calculate the molecular surface of the receptor and the ligands and their interest points (+ normals).

• Match the interest points and recover candidate transformations.

• Check for inter-molecule and intra-molecule penetrations and score the amount of contact.

• Rank by geom-score/energies.
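The last two steps of the outline can be sketched as a filter-and-rank pass over candidate poses. This is a toy geometric score under stated assumptions, not the scoring used by any particular docking program: it works on atom centers rather than surface points, checks only inter-molecule penetration, and the distance cutoffs and function names are hypothetical.

```python
import math

CONTACT_DIST = 4.0  # assumed: ligand atom counts as "in contact" within this range (A)
CLASH_DIST = 2.5    # assumed: anything closer than this is a penetration (A)


def score_pose(receptor_atoms, ligand_atoms):
    """Return (contacts, clashes) for one transformed ligand pose."""
    contacts = clashes = 0
    for lig in ligand_atoms:
        nearest = min(math.dist(lig, rec) for rec in receptor_atoms)
        if nearest < CLASH_DIST:
            clashes += 1
        elif nearest <= CONTACT_DIST:
            contacts += 1
    return contacts, clashes


def rank_poses(receptor_atoms, poses, max_clashes=0):
    """Drop poses with too many penetrations, rank the rest by contact count."""
    scored = []
    for name, ligand_atoms in poses.items():
        contacts, clashes = score_pose(receptor_atoms, ligand_atoms)
        if clashes <= max_clashes:
            scored.append((contacts, name))
    return [name for contacts, name in sorted(scored, reverse=True)]


receptor = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
poses = {
    "good": [(0.0, 3.5, 0.0), (4.0, 3.5, 0.0)],   # two contacts, no clash
    "clash": [(0.0, 1.0, 0.0), (4.0, 3.5, 0.0)],  # one atom penetrates
    "far": [(0.0, 10.0, 0.0), (4.0, 3.5, 0.0)],   # only one contact
}
ranking = rank_poses(receptor, poses)
```

The penetrating pose is discarded and the remaining poses are ordered by how much contact they make. A real pipeline would follow this geometric filter with an energy-based re-ranking.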

Shape feature and signature (Norel et al.)

Unbound docking examples

GGH based flexible docking

Applies either to flexible ligands or to flexible receptors.

Flexible Docking: Calmodulin with M13 ligand

Flexible Docking: HIV Protease Inhibitor
