Binding site analysis: Applications in pharma research
28 June 2011, TU München
Andrea Schafferhans
Types of protein similarity
• Function
• Sequence
– Paralogs: within species
– Orthologs: across species
• Binding sites / interaction patterns
20 January 2011 2 Introduction
Similar proteins have similar interaction partners
(?)
Evidence: Analysing target relationships
Nodes: proteins; edges: similar binding (within a factor of 10^3)
Paolini,G.V. et al. (2006) Global mapping of pharmacological space. Nature biotechnology, 24, 805-15.
Evidence (2): Analysing target relationships
Paolini,G.V. et al. (2006) Global mapping of pharmacological space. Nature biotechnology, 24, 805-15.
Applications
• Function prediction
• Drug development
– “Target Class” approach
– Side effects
– “Polypharmacology” / “Network pharmacology”
Hopkins,A.L. (2008) Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol, 4, 682-690.
Contents
1. Introduction
2. Protein comparison
– Computational binding site identification
– Binding site comparison
3. Application examples
What is a binding site?
• Function
– Binding other proteins (e.g. signal transduction)
– Binding substrates (enzymes)
– Binding co-factors (e.g. heme)
– …
• Form
– Cavity in the protein
– Caution: induced fit / conformational selection is more realistic
• Pragmatic
– Around all HETATM records in the PDB file (caution: this includes e.g. metals…)
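The pragmatic definition can be sketched in a few lines. This is a minimal illustration, assuming fixed-column PDB text; the 4.5 Å cutoff, the water filter and the function names are assumptions made for this example, not part of the lecture.

```python
import math

def parse_atoms(pdb_text):
    """Return (record, resname, chain, resseq, xyz) for each ATOM/HETATM line."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        # Fixed PDB columns: x 31-38, y 39-46, z 47-54 (1-based)
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((record, line[17:20].strip(), line[21], int(line[22:26]), xyz))
    return atoms

def site_residues(pdb_text, cutoff=4.5, skip=("HOH",)):
    """Residues with at least one ATOM within `cutoff` Å of any HETATM atom.

    `cutoff` and `skip` are assumed defaults; metals or buffer molecules
    would have to be added to `skip` by hand.
    """
    atoms = parse_atoms(pdb_text)
    ligand = [a for a in atoms if a[0] == "HETATM" and a[1] not in skip]
    site = set()
    for record, resname, chain, resseq, xyz in atoms:
        if record != "ATOM":
            continue
        if any(math.dist(xyz, lig[4]) <= cutoff for lig in ligand):
            site.add((chain, resseq, resname))
    return sorted(site)
```

Calling `site_residues(open("structure.pdb").read())` on a hypothetical file would list the residues lining every non-water hetero group, which is exactly where the caveat about metals becomes visible.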
Binding site characteristics
• Usually a pocket or cleft in the protein
• Less hydrophobic than the interior of a protein
• Specific through complementarity of
– Form
– Electrostatic interactions
– Hydrogen bonds
– Hydrophobic interactions
Henrich S, Salo-Ahen OM, Huang B, et al.: Computational approaches to identifying and characterizing protein binding sites for ligand design. Journal of Molecular Recognition 2010, 23:209-219
Binding site analysis – Applications
• Automated drug target annotation
– E.g. estimation of druggability (binding site size, hydrophobicity, etc.)
• Virtual screening
– Restrict the search space for docking experiments
• Function prediction
• Prediction of drug side effects
Finding binding sites – geometrically
Observation: binding sites are usually the largest pockets, e.g. 83% of enzyme active sites are found in the largest pocket
(Laskowski RA, et al. Protein clefts in molecular recognition and function. Protein Sci. 1996; 5:2438-2452.)
POCKET
• Fill the protein with a grid (3 Å spacing)
• Mark grid points as “protein” (within 3 Å of an atom) or “solvent”
• Go along the grid and mark “solvent” points that lie between “protein” points as potential pocket points
• Find the largest “clusters” of “pocket” points
Levitt D, Banaszak L. POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. J. Mol. Graph 1992, 10:229-234.
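The grid scan behind POCKET can be sketched as follows. This is a simplified pure-Python illustration under stated assumptions: atoms are plain (x, y, z) tuples, the grid is padded by one cell, and clusters are joined through face neighbours only. The function names are hypothetical and the published program differs in detail.

```python
import itertools
import math

def pocket_points(atoms, spacing=3.0, radius=3.0, pad=1):
    """Grid indices of solvent points enclosed by protein points along x, y or z."""
    lo = [min(a[k] for a in atoms) for k in range(3)]
    hi = [max(a[k] for a in atoms) for k in range(3)]
    n = [int((hi[k] - lo[k]) / spacing) + 1 + 2 * pad for k in range(3)]

    def coord(idx):
        return tuple(lo[k] + (idx[k] - pad) * spacing for k in range(3))

    # A grid point is "protein" if any atom lies within `radius`
    protein = {idx for idx in itertools.product(*(range(m) for m in n))
               if any(math.dist(coord(idx), a) <= radius for a in atoms)}

    pocket = set()
    for axis in range(3):
        o1, o2 = [k for k in range(3) if k != axis]
        for u, v in itertools.product(range(n[o1]), range(n[o2])):
            line = []
            for t in range(n[axis]):
                idx = [0, 0, 0]
                idx[axis], idx[o1], idx[o2] = t, u, v
                line.append(tuple(idx))
            hits = [t for t, idx in enumerate(line) if idx in protein]
            if len(hits) >= 2:
                # Solvent points between the first and last protein point
                for t in range(hits[0], hits[-1] + 1):
                    if line[t] not in protein:
                        pocket.add(line[t])
    return pocket

def largest_cluster(points):
    """Largest face-connected cluster of grid indices."""
    remaining, best = set(points), set()
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while remaining:
        start = remaining.pop()
        cluster, stack = {start}, [start]
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in steps:
                nb = (x + dx, y + dy, z + dz)
                if nb in remaining:
                    remaining.remove(nb)
                    cluster.add(nb)
                    stack.append(nb)
        if len(cluster) > len(best):
            best = cluster
    return best
```

Because only the three axis directions are scanned, the result depends on how the protein is oriented in the grid.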
LIGSITE
Differences to POCKET
• More efficient searching for neighbour atoms
• Cubic diagonals also used for finding pockets → less dependent on orientation
• Grid points scored by the number of times they are found (between 0 and 7) → adjustable “buriedness”
• Smaller and adjustable grid spacing (best: 0.5 to 0.75 Å)
Hendlich M, et al.: LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J. Mol. Graph. Mod. 1997, 15:359-363
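The 0 to 7 score can be illustrated with a small sketch. It assumes the protein is already given as a set of occupied grid indices and counts, for every solvent point, in how many of the seven scan directions (three axes plus four cubic diagonals) the point is enclosed by protein on both sides. The function name and data layout are assumptions for this example.

```python
import itertools

# Three axis directions plus the four cubic diagonals
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]

def buriedness(protein, shape):
    """Map each solvent grid index to its enclosure count (0..7).

    protein: set of (i, j, k) indices occupied by protein
    shape:   grid dimensions (nx, ny, nz)
    """
    def hits_protein(idx, d):
        i, j, k = idx
        while True:
            i, j, k = i + d[0], j + d[1], k + d[2]
            if not (0 <= i < shape[0] and 0 <= j < shape[1] and 0 <= k < shape[2]):
                return False  # left the grid without meeting protein
            if (i, j, k) in protein:
                return True

    scores = {}
    for idx in itertools.product(*(range(n) for n in shape)):
        if idx in protein:
            continue
        scores[idx] = sum(
            hits_protein(idx, d) and hits_protein(idx, (-d[0], -d[1], -d[2]))
            for d in DIRECTIONS
        )
    return scores
```

Keeping only points whose score reaches a chosen threshold is what makes the buriedness adjustable.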
Finding binding sites – energetically
Binding sites interact with the bound molecules → find locations of favourable interaction energies
GRID
• Calculates interaction energies of probe molecules
• Uses three terms:
– Lennard-Jones (attraction + repulsion)
– Electrostatic
– Directional hydrogen bond
Goodford, P.J. A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem. 1985 28:849-857
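To make the energy terms concrete, here is a deliberately simplified sketch of a probe energy at one grid point. It uses a generic 12-6 Lennard-Jones term and a Coulomb term with a constant dielectric; the directional hydrogen-bond term is left out. The parameter values and function names are illustrative assumptions and not GRID's actual functional forms or parameters.

```python
import math

COULOMB = 332.06  # kcal*Å/(mol*e^2), conversion constant for point charges

def lennard_jones(d, epsilon, r_min):
    """12-6 Lennard-Jones energy with its minimum of -epsilon at d = r_min."""
    ratio = (r_min / d) ** 6
    return epsilon * (ratio * ratio - 2.0 * ratio)

def coulomb(d, q1, q2, dielectric=4.0):
    """Point-charge electrostatics with a constant dielectric (an assumption)."""
    return COULOMB * q1 * q2 / (dielectric * d)

def probe_energy(probe_xyz, probe_charge, atoms, epsilon=0.1, r_min=3.8):
    """Sum pairwise energies between one probe and a list of (xyz, charge) atoms.

    epsilon and r_min are placeholder values shared by all atom pairs.
    """
    total = 0.0
    for xyz, charge in atoms:
        d = math.dist(probe_xyz, xyz)
        total += lennard_jones(d, epsilon, r_min) + coulomb(d, probe_charge, charge)
    return total
```

Evaluating `probe_energy` at every grid point and keeping the most negative values gives the kind of energy map that the clustering step then has to interpret.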
GRID application
• Cluster energy minima → binding site
• BUT:
– Hard to cluster
– Computationally intensive
• Good for binding site characterisation
Picture from: Henrich S, Salo-Ahen OM, Huang B, et al. JMR 2010, 23:209-19.
Q-SiteFinder
• GRID methyl probe (0.9 Å grid)
• Cluster: adjacent grid points that meet the energy criterion
→ Success: > 70% for the first predicted binding site, > 90% within the first three
→ 68% average precision (precision: overlap between ligand and predicted binding site)
Laurie AT, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 2005, 21:1908-16
i-Site
Variation of Q-SiteFinder:
• Better probe distribution (denser grid)
• Two energy limits
– Low value for cluster seeds
– Higher value for extension
→ filtering out meaningful clusters
• AMBER force field
Morita M, Nakamura S, Shimizu K: Highly accurate method for ligand-binding site prediction in unbound state (apo) protein structures. Proteins 2008, 73:468-479
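One way to read the two energy limits is as a seeded region growing: clusters may only start at grid points below the strict limit and may then spread through neighbours that pass the looser limit. The sketch below follows that reading; it is an interpretation of the slide rather than the published procedure, and the function name, thresholds and face-neighbour connectivity are assumptions.

```python
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def two_threshold_clusters(energies, seed_max, extend_max):
    """Grow clusters from seed points through face neighbours.

    energies:   {(i, j, k): interaction energy} on a grid
    seed_max:   strict limit, a cluster must contain a point at or below it
    extend_max: looser limit (>= seed_max) for points added during growth
    """
    allowed = {p for p, e in energies.items() if e <= extend_max}
    seeds = sorted(p for p in allowed if energies[p] <= seed_max)
    seen, clusters = set(), []
    for seed in seeds:
        if seed in seen:
            continue
        seen.add(seed)
        cluster, stack = {seed}, [seed]
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in allowed and nb not in seen:
                    seen.add(nb)
                    cluster.add(nb)
                    stack.append(nb)
        clusters.append(cluster)
    return clusters
```

Regions that pass only the looser limit never form a cluster on their own, which is the filtering effect named in the list above.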
Challenges in binding site identification
• Protein flexibility can “hide” binding sites
→ Use multiple experimental conformations
→ Use molecular dynamics to generate conformations
• Dimerisation has to be considered
→ Carefully look at PDB unit cell
→ Carefully look at information about the protein
Characterising binding sites
Properties to characterise:
• Geometry
• Amino acid composition
• Solvation
• Hydrophobicity
• Electrostatics
• Interactions with functional groups
Hydrophobicity
Measured by logP (partitioning between water and octanol)
• Map atom / residue based contributions
• Calculate interaction energies of hydrophobic probes (e.g. GRID)
Electrostatics
• Map electrostatic potential onto surface (e.g. using DelPhi, see http://structure.usc.edu/howto/delphi-surface-pymol.html)
• Caution: the result depends on protonation!
Functional groups
• SuperStar
– Analyse the spatial distribution of functional groups in CSD density maps
– Break the protein into fragments found in CSD
– Map the observed distribution of interaction partners onto the protein
Verdonk ML, Cole JC, Taylor R: SuperStar: a knowledge-based approach for identifying interaction sites in proteins. Journal of molecular biology 1999, 289:1093-108.
Binding site comparison
• Align structures in 3D
• Analyse differences and similarities of
– Amino acid composition
– Local conformation
– Pocket size
– Presence of interaction partners
• Straightforward in case of
– Sequence similarity or
– Structural similarity
RELIBASE
• Stores binding sites from PDB structures
• Allows superposition of related binding sites
• Computes differences between binding sites
Hendlich M, Bergner A, Günther J, Klebe G: Relibase: Design and Development of a Database for Comprehensive Analysis of Protein-Ligand Interactions. Journal of Molecular Biology 2003, 326:607-620.
http://relibase.ccdc.cam.ac
Similar but not homologous binding sites
• cAMP-dependent protein kinase (1cdk) with adenyl-imido-triphosphate
• trypanothione reductase (1aog) with flavin-adenine-dinucleotide
Graphics from www.ebi.ac.uk/pdbsum/
Graphics from Schmitt S, Kuhn D, Klebe G. Journal of molecular biology 2002, 323:387-406
Problems in binding site comparison
• Automatically locate binding site
• Capture important features in efficient representation
• Search efficiently across all structures
– Find best superimposition
– Score the alignment
Binding site comparison methods
• Representation by
– Coordinate set with physico-chemical or evolutionary properties (atoms, chemical groups, surface points)
– 3D shape descriptors
• Superimposition by
– Geometric hashing
– Graph theory, clique search
• Similarity measurement by
– RMSD
– Residue conservation
– Physico-chemical property similarity
CavBase – Structure representation
• Cavity detection with LIGSITE (stored in Relibase)
• Cavity-flanking residues represented as pseudo-centers (several per residue if necessary):
– Donor
– Acceptor
– Donor-Acceptor
– Aliphatic
– Pi
• Create graph:
– Nodes: pseudo-centers
– Edges: distances between the pseudo-centers
Graphics from Schmitt S, Kuhn D, Klebe G. Journal of molecular biology 2002, 323:387-406
CavBase – Alignment
Create associated graph:
• Node: node from protein A and node from protein B with similar interaction properties
• Edge: member nodes in protein A and B are connected
– Member node distance < 12 Å
– Distance difference < 2 Å
Find maximal common subgraph (Bron-Kerbosch) → similar arrangement of pseudo-centers in original graphs
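The associated-graph construction and the clique search can be sketched together. This is a minimal illustration, assuming each site is a list of (type, xyz) pseudo-centers and that "similar interaction properties" means identical type labels; the basic Bron-Kerbosch recursion without pivoting is used, and all function names are hypothetical.

```python
import itertools
import math

def associated_graph(site_a, site_b, max_dist=12.0, max_diff=2.0):
    """Adjacency dict of the associated graph of two pseudo-center sets."""
    nodes = [(i, j)
             for i, (type_a, _) in enumerate(site_a)
             for j, (type_b, _) in enumerate(site_b)
             if type_a == type_b]
    adj = {node: set() for node in nodes}
    for (i1, j1), (i2, j2) in itertools.combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # a pseudo-center may be matched only once
        d_a = math.dist(site_a[i1][1], site_a[i2][1])
        d_b = math.dist(site_b[j1][1], site_b[j2][1])
        if d_a < max_dist and d_b < max_dist and abs(d_a - d_b) < max_diff:
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj

def bron_kerbosch(adj, r=frozenset(), p=None, x=frozenset()):
    """Yield all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    if p is None:
        p = frozenset(adj)
    if not p and not x:
        yield r
        return
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}

def best_match(site_a, site_b):
    """Largest clique, i.e. the largest set of mutually consistent pairings."""
    adj = associated_graph(site_a, site_b)
    return max(bron_kerbosch(adj), key=len, default=frozenset())
```

Each pair (i, j) in the returned clique matches pseudo-center i of site A with pseudo-center j of site B, so the clique can be used directly to superimpose the two cavities.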
CavBase – Scoring
• Scoring based on overlap of similarly typed surface patches
Kuhn D, Weskamp N, Schmitt S, Hüllermeier E, Klebe G: From the Similarity Analysis of Protein Cavities to the Functional Classification of Protein Families Using Cavbase. Journal of Molecular Biology 2006, 359:1023-1044
SOIPPA – Structure representation
• Delaunay tessellation of Cα atoms -> 1 tetrahedron/Cα
• Environmental boundary (red) and protein boundary (blue)
Xie L, Bourne PE: A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites. BMC Bioinformatics 2007, 8:S9.
Xie L, Bourne PE: A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics 2009, 25:i305-312.
SOIPPA – Structure representation (2)
• Each Cα characterized by
– Vector with distance and direction of boundaries
– Substitution matrix
• Graph: node: Cα; edge: connection of tetrahedra
Xie L., Bourne PE. Bioinformatics 2009, 25:i305-312.
SOIPPA – Alignment
Create associated graph:
• Node: node(A) + node(B) with similar geometric potential; weight: amino acid frequency profile similarity
• Edge: member nodes in protein A and B are connected
– Distance difference < 2 Å
– Surface normal difference < 30°
Find maximum-weight common subgraph (MWCS)
Xie L., Bourne PE. Bioinformatics 2009, 25:i305-312.
SOIPPA – Scoring
• Sum over aligned residue pairs: residue similarity weighted by distance and normal vector angle
$$S = \sum_{i,j} M_{ij} \cdot p^{a}_{ij} \cdot p^{d}_{ij}$$
• Statistical significance of score
Background score distribution:
– Compare unrelated structures with random sequences
– Fit resulting score distribution to extreme value distribution function → gives probability of randomness dependent on score
Xie L., Bourne PE. Bioinformatics 2009, 25:i305-312.
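The significance step can be illustrated with a type I (Gumbel) extreme value distribution. This is a sketch under assumptions: the slide only says "extreme value distribution", so the Gumbel form, the parameter names `mu` and `sigma`, and the function name are choices made for this example; in practice both parameters would come from fitting the background scores.

```python
import math

def evd_p_value(score, mu, sigma):
    """P(S >= score) under a Gumbel distribution with location mu and scale sigma.

    mu and sigma are assumed to have been fitted to scores obtained from
    comparisons of unrelated structures.
    """
    z = (score - mu) / sigma
    return 1.0 - math.exp(-math.exp(-z))
```

A score well above `mu` gives a small probability of arising by chance, while a score near `mu` is typical of the background.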
Isocleft
• Structure representation: Cα / atoms within 5 Å of ligand
• Alignment: Bron-Kerbosch of associated graph
• Scoring:
$$S = \frac{N_C}{N_A + N_B - N_C}$$
Najmanovich R, Kurbatova N, Thornton J: Detection of 3D atomic similarities and their use in the discrimination of small molecule protein-binding sites. Bioinformatics 2008, 24:i105
http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/icfdb/StartPage.pl
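The Isocleft score has the form of a Tanimoto coefficient: the size of the common part divided by the size of the union. A minimal sketch, assuming N_C counts the atoms matched between the two sites and N_A, N_B count the atoms in each site (the function and argument names are hypothetical):

```python
def tanimoto(n_common, n_a, n_b):
    """Tanimoto-style similarity N_C / (N_A + N_B - N_C)."""
    union = n_a + n_b - n_common
    if union <= 0:
        raise ValueError("sites must contain at least one atom")
    return n_common / union
```

For example, two sites with 20 and 30 atoms that share 10 matched atoms score 10 / (20 + 30 - 10) = 0.25, and two identical sites score 1.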
Isocleft - innovations
• Two iterations of alignment:
1. Nodes: Cα atoms; edges: distance difference < 3.5 Å, minimal residue similarity. Superimpose based on the graph found.
2. Nodes: all heavy atoms; edges: distance < 4 Å, similar atom type (hydrophilic, acceptor, donor, hydrophobic, aromatic, neutral, neutral-donor and neutral-acceptor)
• Use first result of Bron-Kerbosch, then terminate
Najmanovich R, Kurbatova N, Thornton J: Detection of 3D atomic similarities and their use in the discrimination of small molecule protein-binding sites. Bioinformatics 2008, 24:i105
Example 1: Explaining side effects
Problem: side effects of ERα modulators (SERMs)
Finding “off target” effects:
• Map sequences to structures (BLAST)
• Limit to “druggable” proteins (?)
• Search with SOIPPA => SERCA (Sarcoplasmic Reticulum Ca2+ channel ATPase)
Xie L, Wang J, Bourne PE (2007) In silico elucidation of the molecular mechanism defining the adverse effect of selective estrogen receptor modulators. PLoS Comput Biol 3(11)
Example 1: Validating results
• Inverse search
• Docking
– SERM
– Similar compounds, correlate (?)
Example 2: Repositioning known drug
Problem: new tuberculosis drugs needed, but many parameters to optimise
Finding a compound to reuse against InhA:
• Search for other structures binding adenine (ATP, ADP, NAD, FAD, ...)
• Compare binding sites with SOIPPA => SAM-dependent methyltransferases
Kinnings SL, Liu N, Buchmeier N, Tonge PJ, Xie L, et al. (2009) Drug Discovery Using Chemical Systems Biology: Repositioning the Safe Medicine Comtan to Treat Multi-Drug and Extensively Drug Resistant Tuberculosis. PLoS Comput Biol 5(7)
Example 2: Structure match
• catechol-O-methyltransferase (COMT), SAM, inhibitor
• InhA, NAD, ligand
Summary
The focus of pharma research is moving from individual interactions alone to system-oriented research
Challenges:
• How to compare?
• Computational overhead