Molecular Similarity and Molecular Structure

N. Sukumar, ISPC, San Francisco, Aug. 2007


Why should molecules have Structure?

“The idea that molecules are microscopic, material bodies with more or less well-defined shapes has been fundamental to the development of our understanding of the physicochemical properties of matter, and it is now so familiar and deeply ingrained in our thinking that it is usually taken for granted - it is the central dogma of chemistry.”

— R. G. Woolley (Woolley 1980)

• The notion that a molecule has structure is fundamental to much of chemistry as practiced today. But what do we really mean by this term?

• There are several ways to envision molecular structure, some more general and others fairly concrete but rather restrictive.

• As an example of the latter, we can think of molecular structure in terms of familiar ball-and-stick molecular models. Such models are simple to visualize and are intuitively appealing. But by confining our conception to such models, we risk imposing a classical, mechanical vision upon an intrinsically microscopic quantum world.

• From a philosophical perspective, we can define structure as that property of a molecule by virtue of which it occupies space in the real world.

• From a statistical perspective, we can define structure as that which distinguishes an object from a heap of its parts: in this case, a molecule from a collection of its constituent atoms.

What do we mean by Molecular Structure?

• This statistical definition generalizes the concept of molecular structure to situations where the relative spatial locations of the constituent atoms may not be known, and makes the link to the fundamental statistics of the constituent particles.

• Most modern molecular structure determinations are indirect, utilizing a transformation from momentum space or the frequency domain.

• Mathematically, structure is measured by the inter-particle distribution function. Thus an ideal gas of atoms has minimal structure, a hydrogen-bonded liquid is more structured, and a crystal or molecular solid even more so.

• The familiar ball-and-stick molecular models are thus the rigid limit of a hierarchy of structures.
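The inter-particle distribution function can be made concrete with a short sketch. The Python below (NumPy assumed available) computes the radial distribution function g(r) for an ideal gas of uncorrelated points and for a simple cubic lattice in a periodic box. The `structure_score` helper, a mean squared deviation of g(r) from 1, is a hypothetical single-number summary chosen for illustration, not a standard measure taken from the talk.

```python
import numpy as np

def pair_distribution(positions, box, r_max, n_bins):
    """g(r) for points in a periodic cubic box of side `box` (requires r_max <= box / 2)."""
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]
    diff -= box * np.round(diff / box)                       # minimum-image convention
    dist = np.sqrt((diff ** 2).sum(axis=-1))[np.triu_indices(n, k=1)]  # each pair counted once
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    expected = 0.5 * n * (n - 1) * shell_vol / box ** 3      # pair counts for a uniform (ideal-gas) distribution
    r = 0.5 * (edges[1:] + edges[:-1])
    return r, counts / expected

def structure_score(g):
    """Hypothetical summary: mean squared deviation of g(r) from 1 (zero for perfect uniformity)."""
    return float(np.mean((g - 1.0) ** 2))

box, side = 8.0, 8
rng = np.random.default_rng(0)
gas = rng.uniform(0.0, box, size=(side ** 3, 3))            # ideal gas: uncorrelated positions
grid = np.arange(side, dtype=float)
crystal = np.array(np.meshgrid(grid, grid, grid, indexing="ij")).reshape(3, -1).T  # simple cubic lattice

r, g_gas = pair_distribution(gas, box, r_max=4.0, n_bins=40)
_, g_crystal = pair_distribution(crystal, box, r_max=4.0, n_bins=40)
gas_structure, crystal_structure = structure_score(g_gas), structure_score(g_crystal)
```

For the ideal gas, g(r) fluctuates around 1 at all distances; for the lattice it is zero except for sharp peaks at the neighbor distances 1, √2, √3, 2 and so on, so its score is far larger. A hydrogen-bonded liquid would fall between the two.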

Molecular Structure

Hierarchy of Molecular Structure Representations

Molecular Structure and Shannon information entropy

• The Shannon information entropy is a maximum for a uniform distribution.

• Deviations from this uniformity may be attributed to structure.

• Electron-nuclear forces add structure to an electron distribution, thereby lowering the entropy.

• Electron repulsion forces broaden the distribution and hence raise the entropy.

• A net decrease of the Shannon information entropy therefore reflects the dominant role of the attractive forces exerted by the nuclei in imparting structure to the electron distribution in a molecular system.
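For a density normalized to unity (the shape function σ(r) = ρ(r)/N), the Shannon information entropy is S = -∫σ(r) ln σ(r) dr. A minimal discretized sketch, assuming the distribution is binned on a fixed uniform grid, illustrates the first and third points: a uniform distribution attains the maximum, ln n for n bins, and concentrating the same weight near one point (a crude stand-in for nuclear attraction) lowers S. The exponential profile is an arbitrary illustrative choice.

```python
import math

def shannon_entropy(weights):
    """S = -sum p ln p for a binned distribution; the weights are normalized first."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

n_bins = 100
uniform = [1.0] * n_bins
# Weight pulled toward the centre of the grid, mimicking density concentrated near a nucleus
peaked = [math.exp(-abs(i - n_bins / 2) / 5.0) for i in range(n_bins)]

s_uniform = shannon_entropy(uniform)   # ln(100) for 100 equally weighted bins
s_peaked = shannon_entropy(peaked)     # smaller: the distribution has "structure"
```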

Molecular structure in the Born-Oppenheimer approximation

• The BO separation of electronic and nuclear motions in molecules shows that there must exist molecular states which can be approximately represented as products of electronic and nuclear functions.

• The electronic structure problem then involves solving for the eigenfunctions of an electronic Hamiltonian, while the nuclear function satisfies an equation of motion, with the eigenvalues of the electronic Hamiltonian forming an effective potential energy surface upon which the nuclei may be envisioned to move.

• The distinct concepts of electronic structure and molecular structure are thus intimately related.

• This is, of course, not accidental: as Hohenberg and Kohn showed in 1964, there exists a unique mapping between the potential v(r) due to the nuclei and the ground-state electron density ρ(r).

• Since ρ(r) fixes v(r) (up to an additive constant) as well as the number of electrons, N = ∫ρ(r) dr, it fixes the electronic Hamiltonian; ρ(r) therefore also uniquely determines the ground-state wave function ψ, the ground-state electronic energy and the molecular structure.
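The normalization N = ∫ρ(r) dr can be checked numerically. This sketch assumes a spherically symmetric density, so that the integral reduces to ∫4πr²ρ(r) dr, and applies Simpson's rule on a truncated radial grid. The hydrogen 1s density e^(-2r)/π (atomic units) should integrate to one electron, and a helium-like density built from two electrons in a Slater 1s orbital with exponent ζ = 27/16 (its variational value for He) to two.

```python
import math

def electron_count(rho, r_max=30.0, n=3000):
    """N = ∫ρ d³r for a spherical density, i.e. ∫ 4πr²ρ(r) dr on [0, r_max], by Simpson's rule."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = r_max / n

    def radial(r):
        return 4.0 * math.pi * r * r * rho(r)

    total = radial(0.0) + radial(r_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * radial(i * h)   # Simpson weights 4, 2, 4, ...
    return total * h / 3.0

def hydrogen_1s(r):
    """Hydrogen ground-state density in atomic units: exp(-2r)/π."""
    return math.exp(-2.0 * r) / math.pi

def helium_like(r, zeta=27.0 / 16.0):
    """Two electrons in a Slater 1s orbital: 2ζ³ exp(-2ζr)/π."""
    return 2.0 * zeta ** 3 * math.exp(-2.0 * zeta * r) / math.pi

n_hydrogen = electron_count(hydrogen_1s)
n_helium = electron_count(helium_like)
```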

Electron density envelopes for Ethylene

(Figure panels: ρ = 0.36 e/Bohr³, ρ = 0.002 e/Bohr³, ρ = 0.20 e/Bohr³)

Electron density profiles of ethylene

Molecular structure and bond paths

“Will you reflect for a moment on some of the things that I have been saying? I described a bond, a normal simple chemical bond; and I gave many details of its character (and could have given many more). Sometimes it seems to me that a bond between two atoms has become so real, so tangible, so friendly that I can almost see it. And then I awake with a little shock; for a chemical bond is not a real thing: it does not exist: no one has ever seen it, no one ever can. It is a figment of our own imagination.”

— C. A. Coulson (Coulson 1951; Coulson 1955)

Molecular structure in the Quantum Theory of Atoms in Molecules

• The virial partitioning of molecular systems into roughly neutral subsystems forms the basis of the Quantum Theory of Atoms in Molecules, providing a rigorous and unambiguous recipe for partitioning a molecule into atomic subsystems.

• In this formulation, the nuclei function as attractors of the electron density field ρ(r), the atom being defined as the union of an attractor and its basin of attraction.

• Each atom thus contains one and only one nucleus, with the gradient paths of the electron density (∇ρ) being employed to define the bonds between atoms as well as the interatomic boundaries: the bond path between two bonded atoms is the line of maximum electron density linking their nuclei, made up of the pair of gradient paths of ρ that originate at the bond critical point between them, while the interatomic surface is defined through the zero-flux criterion:

∇ρ(r)·n(r) = 0 for every point r on the surface, where n(r) is the unit normal to the surface.
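Along the line joining two nuclei, the zero-flux condition reduces to dρ/dx = 0: the interatomic surface crosses the bond path at the point of minimum density on it, the bond critical point. The sketch below locates that point by bisection for a hypothetical one-dimensional "promolecule" density built from two exponentials, ρ(x) = Σ exp(-a|x - c|). The model and its exponents are illustrative assumptions, not a real molecular density.

```python
import math

def density_slope(x, centers, exponents):
    """dρ/dx for the toy density ρ(x) = Σ exp(-a|x - c|) (hypothetical model)."""
    return sum(-a * math.copysign(1.0, x - c) * math.exp(-a * abs(x - c))
               for c, a in zip(centers, exponents))

def bond_critical_point(centers, exponents, tol=1e-12):
    """Point between the two nuclei where dρ/dx = 0, i.e. where the zero-flux surface cuts the axis."""
    lo, hi = centers[0], centers[1]
    # Between the nuclei dρ/dx increases monotonically from negative to positive, so bisection converges.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if density_slope(mid, centers, exponents) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

homonuclear = bond_critical_point([0.0, 2.0], [1.0, 1.0])
heteronuclear = bond_critical_point([0.0, 2.0], [2.0, 1.0])
```

For two identical atoms the boundary lies at the midpoint, x = 1. Making one atom's density more compact (a larger exponent) moves the boundary toward that atom: for the second call the analytic solution is x = (2 + ln 2)/3 ≈ 0.898.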

Chemical topology & Molecular graphs

• This partitioning scheme has a sound theoretical underpinning: the zero-flux criterion ensures that each atomic subsystem satisfies the virial theorem and thereby ensures the spatial additivity of the action W = ∫L(t) dt (where L is the Lagrangian), and of its variation, in accordance with Schwinger's principle of stationary action.

• It is through this principle that we are able to extend the formulation of quantum mechanics to an open quantum subsystem, such as an atom in a molecule.

• Through bond paths, we also recover the concept of chemical bonds: the topology of the bond paths completely specifies the molecular graph.

• This molecular graph is commonly referred to as the 2-D structure of the molecule.
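Reading the molecular graph off the bond paths amounts to building an adjacency list with one edge per bond path. A minimal sketch, with illustrative atom labels for ethylene: the C=C double bond appears as a single edge, since a pair of bonded atoms is linked by one bond path and the graph records connectivity, not bond order.

```python
from collections import defaultdict, deque

def molecular_graph(bond_paths):
    """Adjacency sets built from (atom, atom) pairs, one pair per bond path."""
    graph = defaultdict(set)
    for a, b in bond_paths:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def is_connected(graph):
    """Breadth-first search: True if every atom is reachable from the first one."""
    start = next(iter(graph))
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in graph[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(graph)

# Bond paths of ethylene, C2H4 (atom labels are illustrative)
ethylene = molecular_graph([("C1", "C2"), ("C1", "H1"), ("C1", "H2"), ("C2", "H3"), ("C2", "H4")])
degrees = {atom: len(nbrs) for atom, nbrs in ethylene.items()}
```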

http://www.chemistry.mcmaster.ca/faculty/bader/aim/aim_1.html

Electron density contours, gradient paths and bond paths of ethylene

Bond paths and non-nuclear attractors

(Figure panels: Li2, Li4 and Li6, annotated "No direct Li-Li bonds", with a legend marking the non-nuclear attractors; further panels labeled "Umbilic catastrophe" and "Water")

Quantum Topology of Molecular Structure and Change

• Conformational flexibility is a critical link between structure, stability and function.

• Enzymes must be flexible enough to mediate a reaction pathway, yet rigid enough to achieve molecular recognition.

Structure and Conformation

Schonbrun, Jack and Dill, Ken A. (2003) Proc. Natl. Acad. Sci. USA 100, 12678-12682

Transition-state theory involves a rate-limiting step, shown as an obligatory thermodynamic barrier.

Theory and simulations show that energy landscapes for protein folding are funnel-shaped and have no apparent microscopic energetic or entropic barriers.

Protein folding landscape
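Transition-state theory expresses the effect of an obligatory barrier through the Eyring equation, k = κ(k_BT/h) exp(-ΔG‡/RT). The sketch below evaluates it with a transmission coefficient κ = 1 (an assumption) and an arbitrary 60 kJ/mol barrier, showing that each additional RT ln 10 (about 5.7 kJ/mol at 298 K) of barrier height slows the rate tenfold. A funnel-shaped landscape without a single dominant barrier is not described by one such ΔG‡.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K (exact SI value)
PLANCK = 6.62607015e-34 # Planck constant, J s (exact SI value)
R_GAS = 8.314462618     # molar gas constant, J/(mol K)

def eyring_rate(dg_act, temperature=298.15, kappa=1.0):
    """Eyring rate constant in 1/s: kappa * (k_B T / h) * exp(-dG_act / (R T)), with dG_act in J/mol."""
    return kappa * K_B * temperature / PLANCK * math.exp(-dg_act / (R_GAS * temperature))

k_low = eyring_rate(60e3)
k_high = eyring_rate(60e3 + R_GAS * 298.15 * math.log(10))   # barrier raised by RT ln 10
ratio = k_low / k_high
```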

Encoding Structure: Descriptors

(Figure: Molecular Structures → Descriptors → Model → Property. Example inputs include a chemical structure drawing and a DNA sequence, AAACCTCATAGGAAGCATACCAGGAATTACATCA…; descriptor classes shown: structural, physicochemical, topological and geometrical descriptors.)
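As one example of a topological descriptor, the Wiener index sums the shortest-path (bond-count) distances between all pairs of atoms in the hydrogen-suppressed molecular graph. The sketch below is a minimal breadth-first-search implementation; the talk does not say which topological descriptors were used, so treat this as an illustration of the class. It distinguishes the two butane isomers, which share a molecular formula but not a graph.

```python
from collections import deque

def distances_from(graph, source):
    """Topological (bond-count) distances from `source`, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for nbr in graph[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist

def wiener_index(graph):
    """Sum of shortest-path distances over all unordered pairs of atoms."""
    total = sum(d for atom in graph for d in distances_from(graph, atom).values())
    return total // 2   # every pair was counted once from each end

# Hydrogen-suppressed carbon skeletons
n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
isobutane = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
```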

Molecular Representations

(Figure: structure drawing of an example molecule)

Chemistry space and Molecular Similarity

The figure depicts a cartoon representation of the relationship between the continuum of chemical space (light blue) and the discrete areas of chemical space that are occupied by compounds with specific affinity for biological molecules. Examples of such molecules are those from major gene families (shown in brown, with specific gene families colour-coded as proteases (purple), lipophilic GPCRs (blue) and kinases (red)). The independent intersection of compounds with drug-like properties, that is those in a region of chemical space defined by the possession of absorption, distribution, metabolism and excretion properties consistent with orally administered drugs — ADME space — is shown in green.

Christopher Lipinski & Andrew Hopkins, Nature 432, 855-861 (16 December 2004)

Chemistry space and Molecular Similarity

Molecular Similarity Assessment: Motivation…

The Drug Discovery Pipeline

Distribution of drug potencies

The Interface of NIH and Drug Development

(Figure, "Current Public Sector Science": probability of success and cumulative cost across the drug discovery pipeline. Stage labels: target identification; assay development; screening (HTS or otherwise); hit-to-probe; dedicated MedChem begins; lead optimization, toxicology; compound accepted into development; Ph I (safety); Ph II (dose finding, initial efficacy in patient population); Ph III (efficacy and safety in large populations); regulatory review; Ph IV-V (additional indications, safety monitoring). Duration labels: 1 yr, 1 yr, 1 yr, ~3 yrs, 1 yr, 2 yrs, ~3 yrs, indefinite, indefinite, 1.5 yrs.)

The Interface of NIH and Drug Development

(Figure, "Proposed Public Sector Science": the same probability-of-success and cumulative-cost chart, with the same stage and duration labels as the "Current Public Sector Science" slide.)

Model Applicability Domain Analysis

(Figure panels: Poor Model Applicability; Good Model Applicability)
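One common way to delimit a model's applicability domain, offered here as an assumption rather than the method behind the figure, is a distance test in descriptor space: a query compound is in the domain if its mean distance to its k nearest training compounds does not exceed the mean plus z standard deviations of the same quantity over the training set. In this NumPy sketch, k = 5 and z = 2 are arbitrary defaults and the training data are synthetic.

```python
import numpy as np

def mean_knn_distance(query, train, k, exclude_self=False):
    """Mean Euclidean distance from each query row to its k nearest rows of `train`."""
    d = np.sort(np.linalg.norm(query[:, None, :] - train[None, :, :], axis=-1), axis=1)
    start = 1 if exclude_self else 0      # when query is train, column 0 is the zero self-distance
    return d[:, start:start + k].mean(axis=1)

def in_domain(query, train, k=5, z=2.0):
    """True where a query is no farther from the training set than mean + z*std of training kNN distances."""
    d_train = mean_knn_distance(train, train, k, exclude_self=True)
    threshold = d_train.mean() + z * d_train.std()
    return mean_knn_distance(query, train, k) <= threshold

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 7))                       # synthetic descriptor matrix
queries = np.vstack([np.zeros(7), np.full(7, 10.0)])    # one central compound, one far outlier
flags = in_domain(queries, train)
```

The central query falls inside the domain and the outlier does not; a model's predictions for the latter would be extrapolations.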

Macrocycles: musky odor or not? (C. Davidson and B. Lavine)

(Figure panels: musk, non-musk)

Nitroaromatics: musk or non-musk? (C. Davidson and B. Lavine)

(Figure panels: musk, non-musk)

Descriptor Selection

(Figure: Molecular Structures → Descriptors → Model → Property)

• What features of a molecule are related to the property of interest?

• What descriptors can capture that information?

GA/PCA Results with TAE descriptors (C. Davidson and B. Lavine)

7 selected features

Legend: 1 = nonmusk, 2 = musk
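The PCA step can be sketched as follows, assuming the descriptors are autoscaled before projection (a common but not universal choice); the genetic-algorithm feature selection that picked the 7 features is not shown. The data are synthetic: two hypothetical classes of 50 compounds each, described by 7 descriptors, with the second class shifted in three of them.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Scores of the rows of X on the leading principal components, plus explained-variance ratios."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)     # autoscale: zero mean, unit variance per descriptor
    _, s, vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return Xs @ vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(1)
class_1 = rng.normal(size=(50, 7))                # synthetic "nonmusk" compounds
class_2 = rng.normal(size=(50, 7))                # synthetic "musk" compounds...
class_2[:, :3] += 3.0                             # ...differing in three of the seven descriptors
scores, explained = pca_scores(np.vstack([class_1, class_2]))
separation = abs(scores[:50, 0].mean() - scores[50:, 0].mean())   # class gap along PC1
```

Because the class difference is the largest source of correlated variance, it dominates PC1, and a PC1 vs. PC2 score plot separates the two groups.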

Results with Wavelet and PEST Descriptors (C. Davidson and B. Lavine)

(Figure: PC1 vs. PC2 score plot, "3D PC Plot Dim(9)"; legend: 1 = nonmusk, 2 = musk)

Nitroaromatics and macrocycles (B. Lavine)

(Figure: PC1 vs. PC2 score plot, "3D PC Plot Dim(30)"; legend: 1 = non-musk, 2 = musk, shown separately for macrocycles and nitroaromatics)

Assessment of Molecular Similarity

Assessment of Similarity

It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.

The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
“God bless me! but the Elephant
Is very like a wall!”

The Second, feeling of the tusk,
Cried, “Ho! what have we here
So very round and smooth and sharp?
To me ’tis mighty clear
This wonder of an Elephant
Is very like a spear!”

The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
“I see,” quoth he, “the Elephant
Is very like a snake!”

The Fourth reached out an eager hand,
And felt about the knee.
“What most this wondrous beast is like
Is mighty plain,” quoth he;
“’Tis clear enough the Elephant
Is very like a tree!”

The Fifth, who chanced to touch the ear,
Said: “E’en the blindest man
Can tell what this resembles most;
Deny the fact who can
This marvel of an Elephant
Is very like a fan!”

The Sixth no sooner had begun
About the beast to grope,
Than, seizing on the swinging tail
That fell within his scope,
“I see,” quoth he, “the Elephant
Is very like a rope!”

And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!

- John Godfrey Saxe (1816-1887)
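The parable carries over to computed similarity: the score depends on which features one chooses to "touch". A minimal sketch using the Tanimoto coefficient, a widely used similarity measure for binary fingerprints, on two sets of made-up feature labels shows the same pair of molecules scoring 1.0 under one description and 0.2 under another. The feature names are hypothetical and not drawn from any real fingerprint scheme.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient |A ∩ B| / |A ∪ B| of two sets of binary features."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Two hypothetical molecules described by two different, equally hypothetical feature vocabularies
mol_1_rings, mol_2_rings = {"ring6", "ring5", "aromatic"}, {"ring6", "ring5", "aromatic"}
mol_1_groups, mol_2_groups = {"nitro", "methyl", "ketone"}, {"ether", "methyl", "ester"}

sim_rings = tanimoto(mol_1_rings, mol_2_rings)     # identical ring features
sim_groups = tanimoto(mol_1_groups, mol_2_groups)  # 1 shared feature out of 5 distinct ones
```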

What, precisely, is 'salt'? It is a certain white, solid, crystalline, material, also called sodium chloride. Does any of that solid white stuff exist in the sea? Clearly not. One can make salt from sea water easily enough, but that fact does not establish that salt, as such, is present in brine. (Paper and ink can be made into a novel – but no novel actually exists in a stack of blank paper with a vial of ink close by.) When salt dissolves in water, what is present is no longer 'salt' but rather a collection of hydrated sodium cations and chloride anions, neither of which is precisely salt, nor is the collection. The aqueous material in brine is also significantly different from pure water. Salt may be considered to be present in seawater, but only in a more or less vague 'potential' way. Actually, there is no salt in the sea.

Joseph E. Earley, "Why there is No Salt in the Sea", Foundations of Chemistry (Springer Netherlands), Volume 7, Number 1, Pages 85-102, January 2005

What about water in proteins?

• Our bodies are an aqueous environment: liquid water constitutes one of the essential components of biological systems, and it is difficult to overstate the role of water in biological structure and function.

• Proteins crystallize with several H2O molecules weakly bound to the rest of the protein.

• H2O provides the thermodynamic driving force for proteins to fold and self-assemble.

• It mediates not only tertiary and quaternary interactions, but also interactions between different biomolecules, and between biomolecules and ligands or surfaces.

• H2O molecules are also known to take part in specific enzymatic reactions.

• Protein conformational dynamics appear to be linked (or slaved) to the dynamics of vicinal H2O, thereby affecting protein function.

• H2O in the vicinity of proteins and other biomolecules critically influences protein structure, dynamics, function and other thermodynamic and kinetic properties.

pH-Sensitive Protein Surface Electrostatic Potential Maps

(Figure panels: electrostatic potential (EP) maps of 1POC at pH 3.0, 4.0, 5.0, 6.0, 7.0 and 8.0)

DNA Binding Complex with 1CGP

Representations of DNA Structure

Can we improve on ATCG?

• Most bioinformatic methods represent DNA as a sequence of letters

• DNA bases are assumed to act independently

• This representation of DNA has little to do with the energetics of binding of protein to DNA
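The letter-based representation is commonly turned into numbers by one-hot encoding, one indicator vector per base. This minimal sketch makes the independence assumption explicit: the vector for a base depends only on that base, so the central C of ACA and of GCT are encoded identically regardless of their neighbors. The example sequence is the first eight letters of the one shown on the descriptors slide.

```python
BASES = "ACGT"

def one_hot(sequence):
    """One indicator vector [A, C, G, T] per position; any other character gives an all-zero row."""
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]

encoded = one_hot("AAACCTCA")
```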

Dixel approach

• Characterization of DNA through features of the electron densities on the surfaces of the major and minor grooves of the DNA

• The central base pair resides in the specific electronic environment generated by the flanking base pairs

DNA Nucleotide Triplets as DIXELS

Ab initio properties of a base pair and its two flanking base pairs (end-capped) are computed.

The central base pair is encoded and stored as a “DIXEL” object.

A “basis set” of all possible nucleotide base pairs with all possible neighbors results in a set of base pair “triplets”.
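The size of such a triplet set follows from simple counting. There are 4³ = 64 ways to write three consecutive bases on one strand, but if a triplet and its reverse complement are taken to describe the same stretch of double helix (an assumption of this sketch, not a statement about how DIXEL objects are indexed), the number of distinct base-pair triplets halves to 32, since no odd-length sequence is its own reverse complement. The hypothetical helper `triplets_of` also splits a sequence into overlapping triplets, each central base pair with its two flanking pairs.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """The same stretch of duplex read 5'->3' along the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical(triplet):
    """Choose one spelling per base-pair triplet: the lexicographically smaller of the two strands."""
    return min(triplet, reverse_complement(triplet))

all_triplets = ["".join(p) for p in product("ACGT", repeat=3)]   # 64 single-strand triplets
basis = sorted({canonical(t) for t in all_triplets})             # 32 distinct base-pair triplets

def triplets_of(sequence):
    """Overlapping triplets: each interior base pair with its two flanking base pairs."""
    return [canonical(sequence[i - 1:i + 2]) for i in range(1, len(sequence) - 1)]
```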

Base pair properties perturbed by flanking base pairs

“First there are the known knowns”—These are the things that we know we know

“Then there are the known unknowns”—These are the things that we now know we do not know

“Finally there are also the unknown unknowns”—These are the things that we do not yet know we do not know

“And each day brings us a few more unknown unknowns”—Donald Rumsfeld, 2003

Challenges in Molecular Similarity Assessment