Molecular Similarity & Molecular Descriptors for Drug Design N. Sukumar Center for Biotechnology & Interdisciplinary Studies Rensselaer Polytechnic Institute

Upload: robert-neal

Post on 21-Jan-2016


Page 1: Molecular Similarity & Molecular Descriptors for Drug Design N. Sukumar Center for Biotechnology & Interdisciplinary Studies Rensselaer Polytechnic Institute

Molecular Similarity & Molecular Descriptors for Drug Design

N. Sukumar

Center for Biotechnology & Interdisciplinary Studies, Rensselaer Polytechnic Institute

Page 2

The Informatics Process: Sifting Sand

UNDERSTANDING

WISDOM

DATA

INFORMATION

KNOWLEDGE

Page 3

Traditional Hypothesis Driven Research Paradigm

Hypothesis

Experiment

Data

Result

Design

Data analysis

Page 4

Cheminformatics/Bioinformatics: A Statement of the Problem

Experiment

Assay Screening or Gene Data (the more data the better)

Data

No Prior Hypothesis

Page 5

Structure-Activity Relationships

MOLECULAR STRUCTURE

CHEMICAL/BIOLOGICAL ACTIVITY

MOLECULAR DESCRIPTOR REPRESENTATION

Statistical or Pattern Recognition Methods

X

Computational Chemistry

Page 6

Quantitative Structure Activity Relationship (QSAR) & Quantitative Structure Property Relationship (QSPR)

• The role of data mining in chemistry is to evaluate "hidden" information in a set of chemical data.

• A typical application is the retrieval of structures with defined biological activity (for drug development) from a database.

• Finding the adequate descriptor for the representation of chemical structures is one of the basic problems in chemical data mining.

• Molecules are normally represented as 2-D formulas or 3-D molecular models.

• While the 3-D coordinates of atoms in a molecule are sufficient to describe the spatial arrangement of atoms, they lack two features:
– they are not independent of the size of a molecule;
– they do not describe additional properties.

http://www.terena.nl/conferences/archive/tnc2000/proceedings/10B/10b5.html

Page 7

Molecular Similarity

– “Similarity” can have quite different meanings in chemical approaches.

– Molecular Similarity does not just mean similarity of structural features.

– Similarity in a chemical context must include additional properties.

Page 8

It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.

The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
“God bless me! but the Elephant
Is very like a wall!”

The Second, feeling of the tusk,
Cried, “Ho! what have we here
So very round and smooth and sharp?
To me ’tis mighty clear
This wonder of an Elephant
Is very like a spear!”

The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
“I see,” quoth he, “the Elephant
Is very like a snake!”

The Fourth reached out an eager hand,
And felt about the knee.
“What most this wondrous beast is like
Is mighty plain,” quoth he;
“’Tis clear enough the Elephant
Is very like a tree!”

The Fifth, who chanced to touch the ear,
Said: “E’en the blindest man
Can tell what this resembles most;
Deny the fact who can
This marvel of an Elephant
Is very like a fan!”

The Sixth no sooner had begun
About the beast to grope,
Than, seizing on the swinging tail
That fell within his scope,
“I see,” quoth he, “the Elephant
Is very like a rope!”

And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!

- John Godfrey Saxe (1816-1887)

Page 9

An example of Classification: Macrocycles – musky odor or not?

(C. Davidson and B. Lavine)

musk / non-musk

• 139 compounds: 103 musks, 36 non-musks.

• 264 molecular descriptors.

Page 10

Nitroaromatic Musk Candidates

(C. Davidson and B. Lavine)

musk / non-musk

Page 11

GA/PCA Results with TAE descriptors

(C. Davidson and B. Lavine)

7 selected features

Page 12

Nitroaromatics and Macrocycles: Results with PEST Descriptors

(C. Davidson and B. Lavine)

[Figure: 3D PC Plot, Dim(30); axes PC1 and PC2. Points are labeled by class:]

• 1 Macro Non-Musk

• 2 Macro Musk

• 1 Nitro Non-Musk

• 2 Nitro Musk

DGNAVGN, DGNH7, DGNW6, DGNW19, DGNW22, DGNB05, DGNB14, DGNB22, DGNB33, DKNAVGN, DKNH3, DKNW4, DKNW6, DKNB00,

DKNB24, DRNH4, DRNW3, DRNW5, DRNW15, DRNW28, GW16, GW21, GW28, KW11, KW27, FUKW21, PIPB14, PIPB30, BNPW27, BNPB44

Page 13

ADMET Property Prediction: Challenges in Medicinal Chemistry

• Other parameters: patent position, chemical synthesis
• The greatest hurdle: ADMET properties.

Multiple-parameter optimization of lead structures

Page 14

Different barriers

Drugs

Mucus Gel Layer

Intestinal Epithelial Cells

Lamina Propria

Endothelium of Capillaries

A series of separate barriers (epithelial layer is the most dominant barrier)

Be absorbed

Page 15

Motivation

• Introduction of a new drug into the market is often the culmination of a long and arduous process of laboratory experimentation, lead compound discovery, animal testing and pre-clinical and clinical trials.

• This process, from hit to lead to marketable drug, typically takes 10-15 years.

• In silico drug discovery:
– find a correlation between molecular structure and biological activity
– any number of compounds, including those not yet synthesized, can then be virtually screened on the computer to select structures with the desired properties.

• Virtual ADME/Toxicological screening can weed out compounds with adverse side effects, identifying the “losers” early on in the game.

• The most promising compounds can then be chosen for laboratory synthesis and pre-clinical testing
– conserving resources, hence cheaper medicines
– accelerating the process of drug discovery.

Page 16

Traditional Drug Discovery Scheme

Potency

Absorption

Distribution

Metabolism

Excretion

Toxicity

Lead

Drug

Page 17

In silico prediction of ADME properties

Potency

Absorption

Distribution

Metabolism

Excretion

Toxicity

Lead

Drug

Page 18

Computational ADME-Tox models for drug discovery

• Solubility
• Absorption
• Mutagenicity
• Bioavailability
• Metabolic stability
• Blood-brain barrier permeability
• Cardiac toxicity (hERG)
• Plasma protein binding

Page 19

The figure depicts a cartoon representation of the relationship between the continuum of chemical space (light blue) and the discrete areas of chemical space that are occupied by compounds with specific affinity for biological molecules. Examples of such molecules are those from major gene families (shown in brown, with specific gene families colour-coded as proteases (purple), lipophilic GPCRs (blue) and kinases (red)). The independent intersection of compounds with drug-like properties, that is those in a region of chemical space defined by the possession of absorption, distribution, metabolism and excretion properties consistent with orally administered drugs — ADME space — is shown in green.

Christopher Lipinski & Andrew Hopkins, Nature, Vol. 432, 16 December 2004, pp. 855-861

Page 20

Descriptors from Molecular Electronic Properties

[Figure: structural formula of a molecule (atom labels O, N, CH3)]

Page 21

Molecular Representations

[Figure: structural formula of a molecule (atom labels O, N, CH3)]

Page 22

Linear Free Energy Relationships

• Originally developed by Hammett, then by Taft

• Intended purely to quantify the effect of substituents and leaving groups on ester hydrolysis

• Demonstrated the usefulness of parametric procedures in describing an empirical property (equilibrium constant, rate constant) in terms of a parameter describing molecular structure.

• This relationship provides the thermodynamic basis for most implementations of QSAR by the relations:
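The relations referred to here are not reproduced on this page. As a hedged reminder, the standard textbook forms behind an LFER are the Hammett equation and the link between an equilibrium constant and the standard free energy change. The symbols are the usual textbook ones and are assumed here, not taken from the slide: K and K_0 are the constants for the substituted and unsubstituted compound, σ is the substituent constant and ρ the reaction constant.

```latex
\log_{10}\frac{K}{K_0} = \rho\,\sigma,
\qquad
\Delta G^{\circ} = -RT \ln K
```

Because log K is proportional to ΔG°, a relation that is linear in σ is linear in free energy, which is where the name comes from.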

http://www.netsci.org/Science/Compchem/feature08.html

Page 23

Quantitative Structure-Activity Relationships (QSAR)

• QSAR was a natural extension of the LFER approach, with a biological activity correlated against a series of parameters that described the structure of a molecule.

• The best-known and most widely used descriptor in QSAR has been the log of the octanol/water partition coefficient (usually referred to as LOG P or LOG P[o/w]). LOG P has been very useful in correlating a wide range of activities because it models transport across the blood/brain barrier well.

• Unfortunately, many regressions do not work well for LOG P, usually because other effects are important, such as steric and electronic effects.

• Therefore, many other descriptors have been used in QSAR in addition to LOG P to incorporate these additional effects.

Page 24

• “2-D” Molecular Descriptors can be calculated from the connection table (with no dependence on conformation):

– Physical Properties

– Subdivided Surface Area Descriptors

– Atom Counts and Bond Counts

– Connectivity and Shape Indices

– Adjacency and Distance Matrix Descriptors

– Pharmacophore Feature Descriptors

– Partial Charge Descriptors

• “3-D” Descriptors depend on molecular coordinates:

– Potential Energy Descriptors

– Surface Area, Volume and Shape Descriptors

– Conformation Dependent Charge Descriptors

MOE Descriptors® Chemical Computing Group Inc.

• Sum of the atomic polarizabilities
• Molecular mass density
• Total charge of the molecule
• Molecular refractivity
• Molecular weight
• Log of the octanol/water partition coefficient

• Number of aromatic atoms
• Number of atoms
• Number of heavy atoms
• Number of hydrogen atoms
• Number of boron atoms
• Number of carbon atoms
• Number of nitrogen atoms
• Number of oxygen atoms
• Number of fluorine atoms
• Number of phosphorus atoms
• Number of sulfur atoms
• Number of chlorine atoms
• Number of bromine atoms
• Number of iodine atoms
• Number of rotatable single bonds
• Number of aromatic bonds
• Number of bonds
• Number of double bonds
• Number of rotatable bonds
• Fraction of rotatable bonds
• Number of single bonds
• Number of triple bonds
• Number of chiral centers
• Number of O and N atoms
• Number of OH and NH groups
• Number of rings

• Water accessible surface area of all atoms with positive partial charge
• Water accessible surface area of all atoms with negative partial charge
• Water accessible surface area of all hydrophobic atoms
• Water accessible surface area of all polar atoms
• Positive charge weighted surface area
• Negative charge weighted surface area

• Water accessible surface area
• Globularity
• Principal moment of inertia
• Radius of gyration
• van der Waals surface area

• Angle bend potential energy
• Electrostatic component of the potential energy
• Out-of-plane potential energy
• Solvation energy
• Bond stretch potential energy
• Local strain energy
• Torsion potential energy

• Number of hydrogen bond acceptor atoms
• Number of acidic atoms
• Number of basic atoms
• Number of hydrogen bond donor atoms
• Number of hydrophobic atoms

• Total positive partial charge
• Total negative partial charge
• Total positive van der Waals surface area
• Total negative van der Waals surface area
• Fractional positive polar van der Waals surface area
• Fractional negative polar van der Waals surface area

Page 25

Some Topological Descriptors

• Wiener number W is the total distance between all carbon atoms (the sum of the distances between each pair of carbon atoms in the molecule, counted in carbon-carbon bonds).

• The smaller this number, the more compact the molecule.

• Method of calculation (acyclic structures): for each bond, multiply the number of carbon atoms on one side of the bond by the number on the other side; W is the sum of these products over all bonds.

• W can also be obtained by simply adding all the elements of the graph distance matrix above the main diagonal.

• Hosoya topological index Z is obtained by counting the ways of choosing k mutually disjoint edges in a graph (for k = 0, 1, 2, 3, ...) and summing these counts over k.

• Z counts all sets of non-adjacent bonds in a structure.
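The definitions above can be checked on small carbon skeletons. The following is a minimal sketch, not the code used for the slides: the function names, the hydrogen-suppressed bond-list representation and the two test molecules are all assumptions made for illustration. It computes W both ways (distance matrix and bond products) and Z by brute-force enumeration, which is only practical for small molecules.

```python
from collections import deque
from itertools import combinations

def distance_matrix(n, bonds):
    """All-pairs shortest path lengths (in bonds), by BFS from each atom.
    Assumes a connected graph with atoms numbered 0..n-1."""
    adj = {i: [] for i in range(n)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[start][v] = d
    return dist

def wiener_from_distances(n, bonds):
    """Sum of the distance-matrix elements above the main diagonal."""
    d = distance_matrix(n, bonds)
    return sum(d[i][j] for i in range(n) for j in range(i + 1, n))

def wiener_from_bond_products(n, bonds):
    """Acyclic graphs only: sum over bonds of (atoms on one side) x (atoms on the other)."""
    total = 0
    for cut in bonds:
        remaining = [b for b in bonds if b != cut]
        side = {cut[0]}          # atoms still reachable from one end of the cut bond
        stack = [cut[0]]
        while stack:
            u = stack.pop()
            for a, b in remaining:
                for x, y in ((a, b), (b, a)):
                    if x == u and y not in side:
                        side.add(y)
                        stack.append(y)
        total += len(side) * (n - len(side))
    return total

def hosoya_index(bonds):
    """Number of sets of mutually non-adjacent bonds, including the empty set (k = 0)."""
    z = 0
    for k in range(len(bonds) + 1):
        for subset in combinations(bonds, k):
            atoms = [atom for bond in subset for atom in bond]
            if len(atoms) == len(set(atoms)):   # no atom shared by two chosen bonds
                z += 1
    return z

# Hydrogen-suppressed carbon skeletons (illustrative inputs)
n_butane = [(0, 1), (1, 2), (2, 3)]    # C-C-C-C
isobutane = [(0, 1), (0, 2), (0, 3)]   # central carbon 0 with three methyl groups

print(wiener_from_distances(4, n_butane), wiener_from_distances(4, isobutane))
print(hosoya_index(n_butane), hosoya_index(isobutane))
```

The branched isomer gives the smaller W, consistent with the compactness statement above, and the two ways of computing W agree for these acyclic skeletons.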

Page 26

Wiener number W, Hosoya index Z and connectivity index $\chi$

• Connectivity index $\chi$ (Milan Randic, A.T. Balaban):

$\chi = \sum_{\text{bonds } (i,j)} (R_i R_j)^{-1/2}$

• $\chi$ is constructed from the row sums $R_i$ and $R_j$ of the adjacency matrix, using $(R_i R_j)^{-1/2}$ as the contribution of each bond (i,j).

• $\chi$ is a bond-additive quantity in which terminal CC bonds are given greater weight than inner CC bonds.
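As a small illustration of the bond-additive construction, here is a sketch that computes the connectivity index from a bond list. The function name and the two example skeletons are assumptions for illustration. The row sum of the adjacency matrix for an atom is simply its number of bonded neighbours in the hydrogen-suppressed graph.

```python
from math import sqrt

def randic_index(bonds):
    """Sum over bonds (i, j) of (R_i * R_j) ** -0.5, where R_i is the
    adjacency-matrix row sum (number of neighbours) of atom i."""
    degree = {}
    for a, b in bonds:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1.0 / sqrt(degree[a] * degree[b]) for a, b in bonds)

# Hydrogen-suppressed carbon skeletons (illustrative inputs)
n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (0, 2), (0, 3)]

print(round(randic_index(n_butane), 4))
print(round(randic_index(isobutane), 4))
```

In n-butane each terminal bond contributes 1/sqrt(1*2), about 0.707, while the inner bond contributes 1/sqrt(2*2) = 0.5, which is the greater weight on terminal bonds noted above.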

Page 27

– The wave function $\Psi$ given by solution of the Schrödinger equation $\hat{H}\Psi = E\Psi$ contains all information about the molecule.

– “All science is either physics or stamp collecting” — Ernest Rutherford (Nobel Prize in Chemistry, 1908)

– BUT: $\Psi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)$ is a function of the coordinates of all the electrons (and nuclei) in the molecule!

– “The fundamental laws necessary for the mathematical treatment of a large part of physics and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved.” — Paul Dirac (1902 - 1984)

Quantum chemical Electron Density Derived descriptors

Page 28

Hohenberg-Kohn theorem (Density Functional Theory)

– The electron density $\rho(\mathbf{r})$

$\rho(\mathbf{r}) = \int \Psi^*(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)\,\Psi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)\,d\mathbf{r}_2\,d\mathbf{r}_3 \ldots$

contains all information about the ground state. $\rho(\mathbf{r})$ is a function of only (x,y,z)

– BUT: the electron density $\rho(\mathbf{r})$ is not a very sensitive descriptor of chemistry (“near-sightedness of the electron density”)

• Disadvantage: Difficult to use $\rho(\mathbf{r})$ directly as descriptor

• Advantage: Can use $\rho(\mathbf{r})$ to simplify descriptor computations: TAE-RECON method

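Page 29

Electron Density Derived Molecular Surface Properties

– Electrostatic Potential: $EP(\mathbf{r}) = \sum_\alpha \frac{Z_\alpha}{|\mathbf{r} - \mathbf{R}_\alpha|} - \int \frac{\rho(\mathbf{r}')\,d\mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|}$

– Electronic Kinetic Energy Density (standard forms, atomic units): $K(\mathbf{r}) = -\tfrac{1}{4}\left(\psi^*\nabla^2\psi + \psi\nabla^2\psi^*\right)$ and $G(\mathbf{r}) = \tfrac{1}{2}\,\nabla\psi^* \cdot \nabla\psi$

– Electron Density Gradients: $\nabla\rho \cdot \mathbf{N}$

– Laplacian of the Electron Density: $L(\mathbf{r}) = -\tfrac{1}{4}\nabla^2\rho(\mathbf{r}) = K(\mathbf{r}) - G(\mathbf{r})$

– Local Average Ionization Potential: $PIP(\mathbf{r}) = \sum_i \frac{\rho_i(\mathbf{r})\,|\epsilon_i|}{\rho(\mathbf{r})}$

– Bare Nuclear Potential (BNP): first term of EP

– Fukui function: $F^+(\mathbf{r}) = \rho_{HOMO}(\mathbf{r})$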

Page 30

Reconstruction Method

• Algorithm for rapid reconstruction of molecular charge densities and molecular electronic properties

• Based on the topological quantum theory of Atoms In Molecules

• Employs a library of atomic charge density fragments corresponding to structurally distinct atom types

• Associated with each atomic charge density fragment in the library is a data file which contains atomic charge density-based descriptors encoding electronic and structural information relevant to the chemistry of intermolecular interactions.

http://www.drugmining.com/

Page 31

Topological Theory of Atoms in Molecules

Definition of an Atom in a Molecule:

• An atom is the union of an attractor and its basin

• Each atom contains one (and only one) nucleus, which is the attractor of its electron density distribution $\rho(\mathbf{r})$

• Every atom is bounded by an atomic surface of zero flux: $\nabla\rho(\mathbf{r}) \cdot \hat{\mathbf{n}} = 0$

• Atoms defined in this way satisfy the virial theorem

• They have properties that are approximately additive and transferable from one molecule to another.

Page 32

For each atom in the molecule, determine atom types and assign closest match from atom type library

Combine densities of atomic fragments

Compute predicted molecular properties

Reconstruction Method

http://www.drugmining.com/

Page 33

Surface Property Distribution Histogram (TAE) Descriptors

Surface histograms can represent property distributions with 80-85% accuracy when 10-20 histogram bins are used.

PIP (Local Ionization Potential) surface property for a member of the Lombardo blood-brain barrier dataset.
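The idea of turning a surface property distribution into a fixed-length descriptor vector can be sketched as follows. This is an illustration under stated assumptions, not the TAE implementation: the function name, the bin range and the eight sample values are made up for the example. A real calculation would use property values sampled at many points on the molecular surface, with 10-20 bins as stated above.

```python
def histogram_descriptors(values, lo, hi, n_bins):
    """Fraction of surface points whose property value falls in each of
    n_bins equal-width bins spanning [lo, hi]."""
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)   # clamp out-of-range values (and v == hi)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins

# Made-up property values at eight surface points (illustration only)
surface_values = [9.1, 9.8, 10.4, 10.9, 11.2, 11.3, 12.7, 13.5]
descriptors = histogram_descriptors(surface_values, lo=9.0, hi=14.0, n_bins=5)
print(descriptors)
```

Each bin fraction becomes one descriptor, so every molecule yields a vector of the same length regardless of its size.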

Page 34

Molecular Surface Properties: Wavelet Coefficient Descriptors (WCD)

Wavelet Surface Property Reconstruction:

16 coefficients from S7 and D7 portions of the WCD vector represent surface property densities with >95% accuracy.

1024 raw wavelet coefficients capture PIP distribution on molecular surface.

Wavelet Decomposition:

– Creates a set of coefficients that represent a waveform.

– Small coefficients may be omitted to compress data.
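The two points about wavelet decomposition can be illustrated with the simplest wavelet, the Haar wavelet. This is a swapped-in choice for brevity: the slide does not say which wavelet family the WCD method uses, and all names and the sample signal below are assumptions. The sketch decomposes a signal, zeroes all but the largest coefficients, and reconstructs.

```python
from math import sqrt

def haar_decompose(signal):
    """Full orthonormal Haar decomposition; len(signal) must be a power of two.
    Returns [smooth, coarsest detail, ..., finest details]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    smooth = list(signal)
    details = []
    while len(smooth) > 1:
        pairs = list(zip(smooth[0::2], smooth[1::2]))
        # prepend, so that coarser detail levels end up before finer ones
        details = [(a - b) / sqrt(2) for a, b in pairs] + details
        smooth = [(a + b) / sqrt(2) for a, b in pairs]
    return smooth + details

def haar_reconstruct(coeffs):
    """Inverse of haar_decompose."""
    smooth = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(smooth)]
        pos += len(smooth)
        finer = []
        for s, d in zip(smooth, detail):
            finer += [(s + d) / sqrt(2), (s - d) / sqrt(2)]
        smooth = finer
    return smooth

def compress(coeffs, keep):
    """Zero all but the `keep` largest-magnitude coefficients."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(ranked[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

# Illustrative piecewise-constant "waveform" of 8 samples
signal = [4.0, 4.0, 4.0, 4.0, 2.0, 2.0, 2.0, 2.0]
coeffs = haar_decompose(signal)
approx = haar_reconstruct(compress(coeffs, keep=2))
print([round(x, 6) for x in approx])
```

For this signal two of the eight coefficients carry all the information, so the compressed reconstruction matches the original to rounding error. The slide's numbers are consistent with the same bookkeeping: for 1024 samples, a level-7 smooth band and a level-7 detail band each hold 1024 / 2^7 = 8 coefficients, 16 in total.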

Page 35

• A TAE property-encoded surface is subjected to internal ray reflection analysis.

• A ray is initialized with a random location and direction within the molecular surface and reflected repeatedly inside the electron density isosurface until the molecular surface is adequately sampled.

• Molecular shape information is obtained by recording the ray-path information, including segment lengths, reflection angles and property values at each point of incidence.

PEST Shape/Property Hybrid descriptors

Isosurface (portion removed) with 750 segments

Page 36

PEST Hybrid Shape/Property Descriptors

• Surface properties and shape information are encoded into alignment-free descriptors

PIP vs Segment Length

• Segment length and point-of-incidence value form 2D-histogram

• Each bin of 2D-histogram becomes a hybrid descriptor
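The binning step described above can be sketched as follows. The ray tracing itself is not shown: the (segment length, property value) pairs, the bin edges and the function name are illustrative assumptions standing in for the output of a real ray-reflection run.

```python
def hybrid_descriptors(segments, length_edges, property_edges):
    """Bin (segment_length, property_at_point_of_incidence) pairs into a 2D
    histogram and flatten it row by row: one descriptor per bin."""
    def bin_index(x, edges):
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                return i
        return None   # value lies outside the histogram range

    n_len = len(length_edges) - 1
    n_prop = len(property_edges) - 1
    grid = [[0] * n_prop for _ in range(n_len)]
    for length, prop in segments:
        i = bin_index(length, length_edges)
        j = bin_index(prop, property_edges)
        if i is not None and j is not None:
            grid[i][j] += 1
    return [count for row in grid for count in row]

# Made-up ray segments: (length, property value where the ray hits the surface)
segments = [(1.2, 9.5), (1.8, 9.9), (3.4, 11.0), (5.1, 12.5), (3.9, 9.2)]
descriptors = hybrid_descriptors(
    segments,
    length_edges=[0.0, 2.0, 4.0, 6.0],
    property_edges=[9.0, 10.5, 12.0, 13.5],
)
print(descriptors)
```

Because only lengths and property values are recorded, not positions or orientations, the resulting vector does not depend on how the molecule is aligned, which is the alignment-free property noted above.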

Page 37

PEST Property-Encoded Rays

Ray-tracing algorithm converges quickly and provides good coverage of internal volume of molecules

Morphine – electronic kinetic energy density

Zoomed graphics (l-r)

Page 38

PEST Property-Encoded Rays

Page 39

Property-Encoded Surface Translation: Shape/Property Hybrid Distribution: EP

Morphine

Page 40

Morphine

Property-Encoded Surface Translation: Shape/Property Hybrid Distribution: BNP

Page 41

Tessellated Protein Surface using Delaunay Tessellation for Surface Definition

Page 42

Sliced Surface For 1A42

Page 43

Protein PEST (PPEST) Descriptors using MOE Surface as locus for TAE surface properties

Page 44

Protein “PEST” Descriptors for Hydrophobic Interaction Chromatography

[Figure: MLP2 and EP surfaces for two proteins. Panels: 1BLF (lactoferrin) MLP2 surface, 1BLF EP; 135L (lysozyme) MLP2 surface, 135L EP.]

Page 45

Hierarchical Structure of Proteins

1. Primary: linear sequence

2. Secondary: local, repetitive spatial arrangements

3. Tertiary: 3-D structure of native fold

4. Quaternary: non-covalent oligomerization of subunits (single polypeptides) into protein complexes

REENVYMAKLAEQAERYEEMVEFMEKVSNSLGSEELTVEERNLLSVAYKNVIGARRASWRIISSIEQKEESRGNEEHVNSIREYRSKIENELSKICDGILKLLDAKLIPSAASGDSKVFYLKMKGDYHRYLAEFKTGAERKEAAESTLTAYKAAQDIATTELAPTHPIRLGLALNFSVFYYEILNSPDRACNLAKQAFDEAIAELDTLGEESYKDSTLIMQLLRDNLTLWTSDMQDDGADEIKE

Page 46

In a polypeptide the main-chain N-Cα and Cα-C bonds are relatively free to rotate. These rotations are represented by the torsion angles φ and ψ, respectively. G. N. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations.

Ramachandran Map
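A torsion angle is computed from the positions of four consecutive atoms. For a residue i, φ uses C(i-1), N(i), Cα(i), C(i) and ψ uses N(i), Cα(i), C(i), N(i+1). The sketch below uses the common atan2 formula. The function names and the test coordinates are illustrative assumptions, not taken from the slides.

```python
from math import atan2, degrees, sqrt

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond
    (0 for cis/eclipsed, +/-180 for trans)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal to the plane p0, p1, p2
    n2 = _cross(b2, b3)          # normal to the plane p1, p2, p3
    y = sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return degrees(atan2(y, x))

# Four points with the central bond along the z axis (illustrative geometry)
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(torsion_angle(p0, p1, p2, (1.0, 0.0, 1.0)))    # cis arrangement
print(torsion_angle(p0, p1, p2, (0.0, 1.0, 1.0)))    # quarter turn
```

Evaluating φ and ψ for every residue of a structure and plotting one against the other gives the points of a Ramachandran map.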

Page 47

Copyright ©2005 by the National Academy of Sciences

Sims, Gregory E. et al. (2005) Proc. Natl. Acad. Sci. USA 102, 618-621

Higher order φ-ψ maps and representative conformations

Page 48

Protein fingerprint — Mihaly Mezei

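$FP0_{ij} = \mathrm{sign}\{[\mathbf{r}(O_i) - \mathbf{r}(C_i)] \cdot [\mathbf{r}(C_j) - \mathbf{r}(C_i)]\}$

$FP1_{ij} = \mathrm{sign}\{[\mathbf{r}(N_i) - \mathbf{r}(C_i)] \cdot [\mathbf{r}(C_j) - \mathbf{r}(C_i)]\}$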
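Read literally, each fingerprint is a matrix of signs over residue pairs (i, j). The sketch below evaluates that expression for a list of per-residue coordinates. It is an illustration only: the function names and the toy coordinates are assumptions, and the slide does not say which carbon atom "C" denotes, so the caller decides which coordinates to pass.

```python
def sign(x):
    """Return -1, 0 or +1."""
    return (x > 0) - (x < 0)

def fingerprint(x_atoms, c_atoms):
    """FP[i][j] = sign{[r(X_i) - r(C_i)] . [r(C_j) - r(C_i)]}.
    Pass O coordinates as x_atoms for FP0, N coordinates for FP1."""
    n = len(c_atoms)
    fp = [[0] * n for _ in range(n)]
    for i in range(n):
        u = [a - b for a, b in zip(x_atoms[i], c_atoms[i])]
        for j in range(n):
            v = [a - b for a, b in zip(c_atoms[j], c_atoms[i])]
            fp[i][j] = sign(sum(p * q for p, q in zip(u, v)))
    return fp

# Toy coordinates for three residues (not a real structure)
c_atoms = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
o_atoms = [(1.0, 1.0, 0.0), (2.8, 1.0, 0.0), (8.6, 1.0, 0.0)]
fp0 = fingerprint(o_atoms, c_atoms)
for row in fp0:
    print(row)
```

Each entry records whether residue j lies on the side of residue i that its C-to-O (or C-to-N) vector points toward. The diagonal is zero because the second vector vanishes when i = j.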

Page 49

QSAR assumptions

• The properties of a chemical are implicit in its molecular structure
– What about effects of the environment? All other factors should be held constant in the assay; don’t compare apples to oranges.

• Molecular structure can be measured and represented with a set of numbers (descriptors or other numerical representation)
– But which set of numbers? What descriptors to use? Feature selection.

• Compounds with similar structure exhibit similar properties; compounds with dissimilar structure exhibit dissimilar properties
– Similar in what way?

Page 50

Machine Learning Methods

Statistics?

“If your experiment needs statistics, you ought to have done a better experiment”

- Ernest Rutherford