Chemical descriptors and molecular graphs
Alessandra Roncaglioni - IRFMN
Problems and approaches in computational chemistry
Outline
Descriptors definition
Structure Descriptors
  Descriptors classification (bi- or tri-dimensional)
  Pros & Cons
Overview of common descriptor classes (mainly 2D)
Applications
Software resources
Further reading
Problems and approaches in computational chemistry – 21 April 2008 – DEI – Milano
Introduction
Molecular descriptors are numerical values that characterize properties of molecules
Examples:
  Physicochemical properties (empirical)
  Values from algorithms, such as 2D fingerprints
They vary in the complexity of the encoded information and in compute time
Theoretical descriptors
“A molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment”
www.moleculardescriptors.eu
Desirable descriptor characteristics
Invariance with respect to the labelling and numbering of the molecule's atoms
Invariance with respect to roto-translation of the molecule
An unambiguous, computable definition
Values in a suitable numerical range
  allowing structural interpretation
  no trivial correlation with other molecular descriptors
  gradual change in value with gradual changes in the molecular structure
  widely applicable
  preferably, allowing reversible decoding (back from the descriptor value to the structure)
From chemical compounds to descriptors
CAS RN 145131-25-5
N-(2,6-Bis(1-methylethyl)phenyl)-N'-((1-(1-methyl-1H-indol-3-yl)cyclohexyl)methyl)urea
SMILES: CC(C)C1=CC=CC(C(C)C)=C1NC(=O)NCC2(CCCCC2)C3=CN(C)C4=C3C=CC=C4
Descriptors classification
Depending on the structural dimensionality:
Up to 2D (0D-2D)
  Derived from the atomic composition and connectivity of molecules
3D
  Encoding energetic and spatial information
Molecular interaction fields (MIF)
  Encoding electrostatic and steric variation
2D Descriptors (I)
Many groups accounting for different characteristics
May require explicit H (check the file format)
Fast to calculate (almost all expert systems rely on 2D descriptors)
More reproducible (do not require a 3D structure)
but ...
Might focus on local contributions, neglecting intramolecular interactions
Ignore conformational flexibility
[Figure: structural formula of the example compound]
2D Descriptors (II)
but ...
Ignore stereo configuration
Not invariant to tautomerism
[Figure: structural formula of the example compound]
3D Descriptors (I)
Invariant to roto-translation
Require a conformational search (sampling)
Followed by QM/MM optimization (minimization)
3D Descriptors (II)
More complete and realistic description of relevant molecular characteristics
Can discriminate among isomers and provide hints to select the most stable tautomer
but ...
Computationally more demanding
Involve stochastic steps: non-deterministic result
Results depend on the QM/MM theory used for the optimization
Reference structure: the minimum-energy conformation in vacuum is not necessarily the bioactive one
MIF (I)
Requires 3D conformations aligned in Euclidean space
Relates variation in the field to variation in the activity (3D-QSAR)
Descriptor table: one row per molecule, with steric (St) and electrostatic (El) field values at grid points 1 to m
        St1  St2  …  Stm  El1  El2  …  Elm
Mol 1   …    …    …  …    …    …    …  …
Mol 2   …    …    …  …    …    …    …  …
…
Mol n   …    …    …  …    …    …    …  …
Probes:
  N3+   sp3 Amine NH3 cation
  N2+   sp3 Amine NH2 cation
  N2:   sp3 NH2 with lone pair
  N2=   sp2 Amine NH2 cation
  N2    Neutral flat NH2 eg amide
  N1+   sp3 Amine NH cation
  N1:   sp3 NH with lone pair
  N1=   sp2 Amine NH cation
  N1    Neutral flat NH eg amide
  NH=   sp2 NH with lone pair
  N1#   sp NH with one hydrogen
  N:    sp3 N with lone pair
  N:=   sp2 N with lone pair
  N:#   sp N with lone pair
  N-:   Anionic tetrazole N
  NM3   Trimethyl-ammonium cation
  O     sp2 carbonyl oxygen
  O::   sp2 Carboxy oxygen atom
  O-    sp2 phenolate oxygen
  O=    O of SO4 or sulfonamide
  OH    Phenol or carboxy OH
  O1    Alkyl hydroxy OH group
  OC2   Ether oxygen
  OES   sp3 ester oxygen atom
  ON    Oxygen of nitro group
  OS    O of sulfone / sulfoxide
  OH2   Water
  OFU   Furan oxygen atom
  C3    Methyl CH3 group
  C1=   sp2 CH aromatic or vinyl
  ....  ............
  BOTH  The amphipathic Probe
  DRY   The hydrophobic Probe
Steric interaction (van der Waals energy calculated by a Lennard-Jones function)
Electrostatic interaction (calculated by a Coulomb-type function)
... ... ...
Hydrogen bonding energy
Solvation energy
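As a rough illustration of how the steric and electrostatic columns of a MIF table could be filled, the sketch below evaluates a 12-6 Lennard-Jones term and a Coulomb term between a probe and the atoms of a molecule at a few grid points. The atom coordinates, partial charges and the `epsilon` and `sigma` values are made-up placeholders, not parameters of a real force field or of any specific MIF program; 332.06 is the usual factor converting e²/Å to kcal/mol.

```python
import math

COULOMB_KCAL = 332.06  # kcal*Angstrom/(mol*e^2), usual conversion factor

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy for one probe-atom pair at distance r."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)

def coulomb(r, q_probe, q_atom, dielectric=1.0):
    """Coulomb energy for one probe-atom pair at distance r."""
    return COULOMB_KCAL * q_probe * q_atom / (dielectric * r)

def field_at(point, atoms, q_probe, epsilon, sigma):
    """Return (steric, electrostatic) energies of the probe at one grid point."""
    steric = electrostatic = 0.0
    for xyz, charge in atoms:
        r = math.dist(point, xyz)
        steric += lennard_jones(r, epsilon, sigma)
        electrostatic += coulomb(r, q_probe, charge)
    return steric, electrostatic

# Hypothetical two-atom "molecule": (coordinates in Angstrom, partial charge)
atoms = [((0.0, 0.0, 0.0), -0.4), ((1.2, 0.0, 0.0), 0.4)]
# Hypothetical grid points, 1 Angstrom apart along a line
grid = [(x, 3.0, 0.0) for x in (-1.0, 0.0, 1.0, 2.0)]

# One row of the descriptor table: St1..Stm followed by El1..Elm
fields = [field_at(p, atoms, q_probe=1.0, epsilon=0.1, sigma=3.4) for p in grid]
row = [s for s, _ in fields] + [e for _, e in fields]
```

Repeating this for every aligned molecule gives the n x 2m matrix used in 3D-QSAR; a real calculation would use a 3D grid and probe-specific parameters.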
MIF (II)
Contour map
Green = steric +; Yellow = steric -; Red = charge -; Blue = charge +
MIF (III)
More biologically plausible (receptor interactions)
Identifies areas responsible for the variation of the activity
but …
Very sensitive to conformation selection and to the chosen alignment
Requires proper selection of force fields
Large number of grid point contributions
QSAR modelling complexity
Types of descriptors
Constitutional descriptors
Topological descriptors (topological indexes, connectivity indexes, information contents)
Atom centred fragments
Functional groups
Fingerprints
Electrostatic descriptors(*) (charge descriptors)
Geometric descriptors*
Physico-chemical properties
Quantum-chemical descriptors*
Thermodynamic descriptors(*)
Pharmacophores
WHIM & GETAWAY*
BCUT (or Burden eigenvalues)
Autocorrelation descriptors
EVA descriptors*
* 3D descriptors
Constitutional descriptors
The simplest and most commonly used descriptors
Reflect the molecular composition of a compound without any information about its molecular geometry
Examples
  Molecular weight
  Count of atoms and bonds
  Count of rings
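A minimal sketch of constitutional counts, applied to the SMILES of the example compound (CAS RN 145131-25-5). It assumes a SMILES restricted to unbracketed organic-subset atoms with single-digit ring-closure labels, which holds for this string but not in general; a cheminformatics toolkit would be the robust choice. The function name `constitutional` is illustrative.

```python
import re
from collections import Counter

# SMILES of the example compound (CAS RN 145131-25-5)
SMILES = "CC(C)C1=CC=CC(C(C)C)=C1NC(=O)NCC2(CCCCC2)C3=CN(C)C4=C3C=CC=C4"

# Two-letter symbols must be tried before one-letter ones.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

def constitutional(smiles):
    """Heavy-atom counts, ring count and heavy-atom bond count."""
    atoms = Counter(symbol.capitalize() for symbol in ATOM.findall(smiles))
    n_atoms = sum(atoms.values())
    # Every ring closure uses its digit twice (single-digit labels assumed).
    n_rings = sum(ch.isdigit() for ch in smiles) // 2
    # For a connected molecule: bonds = atoms - 1 + rings.
    n_bonds = n_atoms - 1 + n_rings
    return atoms, n_atoms, n_rings, n_bonds

atoms, n_atoms, n_rings, n_bonds = constitutional(SMILES)
```

Hydrogens are implicit in this SMILES, so the molecular weight would additionally need the hydrogen count, which this sketch does not derive.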
Molecular graph
A molecular graph or chemical graph is a representation of the structural formula of a chemical compound in terms of graph theory.
It is a convenient and natural way of representing relationships between objects: the objects (atoms) are represented by vertices and the relationships between them (bonds) by edges.
[Figure: example graph with a vertex and an edge labelled]
Topological descriptors
Calculated from the 2D graph of the molecule on the basis of connection tables or closely related formats
  e.g. the distance matrix: an N x N table showing the distance (in bonds) between each pair of atoms
Obtained by operations on the distance matrix; their values are independent of vertex numbering or labelling (graph invariants)
Characterize structures according to size, degree of branching, overall shape, symmetry and cyclicity
Connection table
[Figure: structure of the example molecule with atoms numbered 1 to 13]
Columns: atom number, element, number of attached H, then pairs of (neighbour atom, bond order)
1  O 1  2 1
2  C 0  1 1  3 2  4 1
3  O 0  2 2
4  C 1  2 1  5 1  6 1
5  N 2  4 1
6  C 2  4 1  7 1
7  C 0  6 1  8 2  12 1
8  C 1  7 2  9 1
9  C 1  8 1  10 2
10 C 0  9 2  11 1  13 1
11 C 1  10 1  12 2
12 C 1  11 2  7 1
13 O 1  10 1
Distance matrix
[Figure: structure of the example molecule with atoms numbered 1 to 13]
1 2 3 4 5 6 7 8 9 10 11 12 13
1 O 1 2 2 3 3 4 5 6 7 6 5 8
2 1 C 1 1 2 2 3 4 5 6 5 4 7
3 2 1 O 2 3 3 4 5 6 7 6 5 8
4 2 1 2 C 1 1 2 3 4 5 4 3 6
5 3 2 3 1 N 2 3 4 5 6 5 4 7
6 3 2 3 1 2 C 1 2 3 4 3 2 5
7 4 3 4 2 3 1 C 1 2 3 2 1 4
8 5 4 5 3 4 2 1 C 1 2 3 2 3
9 6 5 6 4 5 3 2 1 C 1 2 3 2
10 7 6 7 5 6 4 3 2 1 C 1 2 1
11 6 5 6 4 5 3 2 3 2 1 C 1 2
12 5 4 5 3 4 2 1 2 3 2 1 C 3
13 8 7 8 6 7 5 4 3 2 1 2 3 O
Wiener index
Counts the number of bonds between pairs of atoms and sums the distances over all pairs
Add up all the off-diagonal elements of the distance matrix and divide by 2 (because the matrix is symmetric)
W = 268
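The calculation can be reproduced with a short sketch: the bond list below is read from the connection table of the 13-atom example, topological distances are obtained by breadth-first search, and the Wiener index is half the sum of the matrix. The function names are illustrative, not taken from any package.

```python
from collections import deque

# Bonds of the 13-atom example molecule, read from the connection table
BONDS = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (6, 7), (7, 8), (8, 9),
         (9, 10), (10, 11), (11, 12), (12, 7), (10, 13)]

def distance_matrix(bonds):
    """Topological distances (in bonds) via breadth-first search."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    dist = {}
    for start in adj:
        seen = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in adj[atom]:
                if nxt not in seen:
                    seen[nxt] = seen[atom] + 1
                    queue.append(nxt)
        dist[start] = seen
    return dist

def wiener_index(dist):
    """Half the sum of all distance-matrix elements (the diagonal is zero)."""
    return sum(d for row in dist.values() for d in row.values()) // 2

dist = distance_matrix(BONDS)
print(wiener_index(dist))  # 268, the value quoted on the slide
```

Renumbering the atoms changes the order of rows and columns but not the sum, which is why W is a graph invariant.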
Molecular connectivity indexes
A whole series of indexes, developed by Kier & Hall in the late ‘70s, following earlier work by Randić
Identify all possible subgraphs of different sizes in the molecule
The size of the subgraph determines the order of the index
  0-bond subgraph gives a zero order index
  1-bond subgraph gives a 1st order index
  2-bond subgraph gives a 2nd order index
  3-bond subgraph gives a 3rd order index
  ...
Randić index
Calculated from the H-depleted molecular graph, where each vertex is weighted by the vertex degree, i.e. the number of connected non-hydrogen atoms
Example (a seven-vertex branched graph):
Valences at the vertices: 1, 1, 3, 3, 2, 1, 1
Bond values, as products of the valences of the two bonded vertices: 3, 3, 9, 3, 6, 2
Edge terms, as the reciprocal of the square root of each bond value: .577, .577, .333, .577, .408, .707
Randić index = sum of edge terms = 3.179
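A minimal sketch of the same calculation. The edge list is an assumption: it is one graph consistent with the vertex valences and bond values of the example (a five-atom chain with one-atom branches on atoms 2 and 3). Without intermediate rounding the sum is about 3.181, slightly above the 3.179 obtained by adding the three-decimal edge terms.

```python
from math import sqrt

# Assumed H-depleted graph: chain 1-2-3-4-5 with branches 6 (on 2) and 7 (on 3)
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]

def randic_index(edges):
    """Sum over bonds of 1/sqrt(degree_i * degree_j)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1.0 / sqrt(degree[a] * degree[b]) for a, b in edges)

chi = randic_index(EDGES)
```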
Kier & Hall indexes
Chi indexes introduce valence values to encode sigma, pi, and lone pair electrons
δi and δj (i ≠ j) = values of the atomic connectivity
Atomic connectivity δi is calculated from:
  Zi = total number of electrons in the i-th atom
  Ziυ = number of valence electrons
  Hi = number of H atoms attached to the i-th atom
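The formula itself is not in the transcript. The sketch below assumes the definition usually given for the Kier-Hall valence delta, δv = (Zv - H) / (Z - Zv - 1), and the first-order valence chi index as the sum over bonds of (δi δj)^(-1/2). Ethanol is a hand-built example and the function names are illustrative.

```python
from math import sqrt

def valence_delta(z, z_valence, n_h):
    """Assumed Kier-Hall valence delta: (Zv - H) / (Z - Zv - 1)."""
    return (z_valence - n_h) / (z - z_valence - 1)

def chi1_valence(deltas, bonds):
    """First-order valence chi: sum over bonds of (delta_i * delta_j) ** -0.5."""
    return sum(1.0 / sqrt(deltas[i] * deltas[j]) for i, j in bonds)

# Ethanol, H-depleted: C1 (CH3) - C2 (CH2) - O3 (OH)
deltas = {
    1: valence_delta(z=6, z_valence=4, n_h=3),  # CH3
    2: valence_delta(z=6, z_valence=4, n_h=2),  # CH2
    3: valence_delta(z=8, z_valence=6, n_h=1),  # OH
}
chi1v = chi1_valence(deltas, [(1, 2), (2, 3)])
```

For second-row atoms the denominator is 1, so the valence delta reduces to Zv - H; the denominator only matters for heavier atoms such as Cl or S.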
Kier Shape Indexes
Characterize aspects of molecular shape
Compare the molecule with the “extreme shapes” possible for that number of atoms
Based on the number of atoms (N) and the number of paths of m bonds ($^mP$) in the graph; $^1P$ is the number of bonds:
$^1\kappa = N (N-1)^2 / (^1P)^2$
$^2\kappa = (N-1)(N-2)^2 / (^2P)^2$
$^3\kappa = (N-1)(N-3)^2 / (^3P)^2$ (if N is odd)
$^3\kappa = (N-3)(N-2)^2 / (^3P)^2$ (if N is even)
Alpha-modified kappa indexes can be generated by taking into account the sizes of atoms relative to a C sp3 atom
A molecular flexibility index is derived from these:
$\Phi = {}^1\kappa_\alpha \, {}^2\kappa_\alpha / N$
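A minimal sketch of the unmodified kappa indexes, counting paths of one, two and three bonds by enumeration. The alpha correction for atom size is left out, so the flexibility value computed here uses plain kappa values as a simplification. The n-pentane skeleton and the function names are illustrative choices.

```python
def count_paths(adj, length):
    """Count simple paths with `length` bonds; each path is counted once."""
    def extend(path, remaining):
        if remaining == 0:
            return 1
        return sum(extend(path + [nxt], remaining - 1)
                   for nxt in adj[path[-1]] if nxt not in path)
    # Every path is found from both of its ends, hence the division by 2.
    return sum(extend([start], length) for start in adj) // 2

def kappa_indexes(adj):
    """Return (kappa1, kappa2, kappa3); needs at least one 3-bond path."""
    n = len(adj)
    p1, p2, p3 = (count_paths(adj, m) for m in (1, 2, 3))
    k1 = n * (n - 1) ** 2 / p1 ** 2
    k2 = (n - 1) * (n - 2) ** 2 / p2 ** 2
    if n % 2 == 1:
        k3 = (n - 1) * (n - 3) ** 2 / p3 ** 2
    else:
        k3 = (n - 3) * (n - 2) ** 2 / p3 ** 2
    return k1, k2, k3

# H-depleted graph of n-pentane as an adjacency list
pentane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
k1, k2, k3 = kappa_indexes(pentane)
flexibility = k1 * k2 / len(pentane)  # without the alpha modification
```

A linear chain is one of the extreme shapes, which is why its first-order kappa equals the number of atoms.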
Information content indexes
Defined on the basis of Shannon information theory
  ni = number of atoms in the i-th class
  n = total number of atoms in the molecule
Classes are determined by the coordination sphere taken into account, leading to indexes of different order k.
Other information content indices:
  SIC - structural IC
  CIC - complementary IC
  BIC - bonding IC
  q = number of edges
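The defining formula is not in the transcript; the sketch below assumes the standard Shannon form, IC = -Σ (ni/n) log2(ni/n), with the ni and n defined above. The SIC and CIC expressions (IC / log2 n and log2 n - IC) are the definitions usually quoted and are likewise assumptions here. The zero-order example groups the atoms of ethanol by element only.

```python
from collections import Counter
from math import log2

def information_content(classes):
    """Shannon information content (bits per atom) of an atom partition."""
    n = len(classes)
    counts = Counter(classes).values()
    return -sum((ni / n) * log2(ni / n) for ni in counts)

def structural_ic(classes):
    """Assumed definition: SIC = IC / log2(n)."""
    return information_content(classes) / log2(len(classes))

def complementary_ic(classes):
    """Assumed definition: CIC = log2(n) - IC."""
    return log2(len(classes)) - information_content(classes)

# Zero-order classes for ethanol (H included): atoms grouped by element
ethanol = ["C", "C", "O"] + ["H"] * 6
ic0 = information_content(ethanol)
```

Higher orders k refine the classes by also comparing each atom's neighbours out to k bonds, so the class labels passed in would change while the formula stays the same.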
Considerations about topological descriptors
Frequently used, easily calculated
It is often difficult to disclose the chemical meaning of the highest-order indexes
Topological indexes effectively encode the same information as fingerprint fragments, in a less obvious way, but can be processed numerically
Atom centred fragments & functional groups
Number of specific atom types in a molecule, calculated from the molecular composition and atom connectivities
Number of specific functional groups in a molecule, calculated from the molecular composition and atom connectivities
[Figure: example functional group structures]
2D Fingerprints
Two types:
One based on a fragment dictionary
  Each bit position corresponds to a specific substructure fragment
  Fragments that occur infrequently may be more useful
Another based on hashed methods
  Not dependent on a pre-defined dictionary
  Any fragment can be encoded
Originally designed for substructure searching, not as molecular descriptors
000101000101000100000000011010100110101000000101000000001000
000101000101000100000000011010100110101000000001000000001000
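The hashed variant can be sketched as follows: every fragment is turned into a bit position by hashing its text, so no dictionary is needed, at the price that two fragments may collide on the same bit. The fragment strings, the 60-bit length and the use of SHA-256 are illustrative assumptions, not the scheme of any particular fingerprint package.

```python
import hashlib

def hashed_fingerprint(fragments, n_bits=60):
    """Set one bit per fragment; the position comes from a hash of its text."""
    bits = [0] * n_bits
    for fragment in fragments:
        digest = hashlib.sha256(fragment.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % n_bits
        bits[position] = 1
    return "".join(map(str, bits))

# Hypothetical linear fragments written as plain strings
fragments_a = ["C-C", "C-N", "C=O", "N-C=O", "C-C-N"]
fragments_b = ["C-C", "C-N", "C=O", "N-C=O"]

fp_a = hashed_fingerprint(fragments_a)
fp_b = hashed_fingerprint(fragments_b)
```

Because the fragments of B are a subset of those of A, every bit set in `fp_b` is also set in `fp_a`, which is the property that made such bit strings useful for substructure screening.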
Fragment dictionaries
Pharmacophores
Used in drug design
Based on atoms or substructures thought to be relevant for receptor binding: specification of the spatial arrangement of a small number of atoms or functional groups
Typically include H-bond donors and acceptors, charged centers, aromatic ring centers and hydrophobic centers
With the model in hand, search databases for molecules that fit this spatial environment
Might be 3D
Creating a Pharmacophore
[Figure: example structures used to derive a pharmacophore]
Physico-chemical Properties
Will hear about them during the QSPR lesson
The key descriptor, widespread in QSAR, is hydrophobicity
  LogP: the logarithm of the partition coefficient between n-octanol and water
  LogD: LogP corrected on the basis of the dissociated fraction of the compound
  Experimentally assessed with the shake-flask method or reversed-phase HPLC
It is often useful to be able to calculate a physico-chemical property for a compound from its structure
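The LogD correction can be illustrated with the textbook relation for a monoprotic compound, assuming that only the neutral form partitions into octanol: logD = logP - log10(1 + 10^(pH - pKa)) for an acid, with the exponent sign reversed for a base. The compound values below are hypothetical.

```python
from math import log10

def log_d(log_p, pka, ph, acid=True):
    """logD of a monoprotic compound, assuming only the neutral form partitions."""
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - log10(1.0 + 10.0 ** exponent)

# Hypothetical monoprotic acid: logP = 3.0, pKa = 4.5, at pH 7.4
acid_log_d = log_d(3.0, 4.5, ph=7.4)
```

At pH equal to the pKa half of the compound is ionized and logD is lower than logP by log10(2), about 0.3 units.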
LogP calculation
Many methods have been proposed for calculating a good estimate of LogP
Fragment-based methods (ClogP)
  pioneered by Corwin Hansch and Al Leo (Pomona College)
  identify large fragments, whose contribution to the logP value is known from their occurrence in other compounds with measured logP
  large “training set” of compounds with accurately measured logP (the “Starlist”)
  work very well if the test compound has the right fragments
  problems arise if the test compound contains fragments that are “missing” from the training set
LogP calculation
Atom-based methods (AlogP, XlogP, SlogP)
  pioneered by Gordon Crippen (Univ. Michigan)
  based on identifying a series of “atom types” in the molecule
  essentially, small atom-centred fragments
  usually 60-200 such fragments are involved
  each atom type is assigned a numerical value
  logP is obtained by adding the values for the atom types present in the test molecule
  atom-type values are obtained by regression analysis, based on a set of compounds with measured logP
  sometimes some extra correction factors are used too
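The additive scheme reduces to a weighted count, as in the sketch below. The atom types and their contribution values are invented for illustration and are not the parameters of AlogP, XlogP or SlogP; the atom typing itself, which is the hard part in practice, is done by hand here.

```python
from collections import Counter

# Hypothetical atom-type contributions, for illustration only;
# real schemes fit 60-200 such values by regression on measured logP.
CONTRIBUTIONS = {
    "C_aliphatic": 0.30,
    "C_aromatic": 0.25,
    "O_hydroxyl": -0.90,
    "N_amine": -1.00,
}

def additive_log_p(atom_types, contributions, correction=0.0):
    """Sum the contribution of every typed atom, plus optional corrections."""
    counts = Counter(atom_types)
    return sum(contributions[t] * n for t, n in counts.items()) + correction

# Phenol typed by hand: six aromatic carbons and one hydroxyl oxygen
phenol = ["C_aromatic"] * 6 + ["O_hydroxyl"]
estimate = additive_log_p(phenol, CONTRIBUTIONS)
```

An atom type missing from the table raises a `KeyError`, the atom-based counterpart of the missing-fragment problem of fragment methods.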
Summary
Rognan D., British Journal of Pharmacology (2007) 152, 38–52
Quantitative Structure-Activity Relationships
Tomorrow … Lessons 4&5
Chemoinformatics
Molecular database management
Reverse engineering
Chemical similarity assessment
Molecular similarity
The descriptors of a molecule can be considered a vector of attributes (properties).
The attributes may be real numbers (continuous variables) or binary in nature (binary variables).
Tanimoto similarity coefficient
  For continuous variables: $S_{AB} = \frac{\sum_{i=1}^{N} x_{iA} x_{iB}}{\sum_{i=1}^{N} x_{iA}^2 + \sum_{i=1}^{N} x_{iB}^2 - \sum_{i=1}^{N} x_{iA} x_{iB}}$ (range -0.333 to +1)
  For binary variables: $S_{AB} = \frac{c}{a + b - c}$ (range 0 to 1)
Hodgkin index
  For continuous variables: $S_{AB} = \frac{2 \sum_{i=1}^{N} x_{iA} x_{iB}}{\sum_{i=1}^{N} x_{iA}^2 + \sum_{i=1}^{N} x_{iB}^2}$ (range -1 to +1)
  For binary variables: $S_{AB} = \frac{2c}{a + b}$ (range 0 to 1)
Euclidean distance
  For continuous variables: $D_{AB} = \sqrt{\sum_{i=1}^{N} (x_{iA} - x_{iB})^2}$ (range 0 to $\infty$)
  For binary variables: $D_{AB} = \sqrt{a + b - 2c}$ (range 0 to $\sqrt{N}$)
$x_A$ and $x_B$ are the descriptor vectors of molecules A and B, of length N
a = number of bits on for A
b = number of bits on for B
c = number of bits on for A AND B
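A short sketch of the binary forms, applied to the two 60-bit strings shown on the 2D fingerprints slide. The function names are illustrative.

```python
from math import sqrt

# The two 60-bit fingerprints shown on the 2D fingerprints slide
FP_A = "000101000101000100000000011010100110101000000101000000001000"
FP_B = "000101000101000100000000011010100110101000000001000000001000"

def bit_counts(fp_a, fp_b):
    """Return a, b, c: bits on in A, bits on in B, bits on in both."""
    a = fp_a.count("1")
    b = fp_b.count("1")
    c = sum(x == "1" and y == "1" for x, y in zip(fp_a, fp_b))
    return a, b, c

def tanimoto(a, b, c):
    """Binary Tanimoto coefficient c / (a + b - c)."""
    return c / (a + b - c)

def hodgkin(a, b, c):
    """Binary Hodgkin index 2c / (a + b)."""
    return 2 * c / (a + b)

def euclidean(a, b, c):
    """Binary Euclidean distance sqrt(a + b - 2c)."""
    return sqrt(a + b - 2 * c)

a, b, c = bit_counts(FP_A, FP_B)
print(a, b, c)            # 16 15 15
print(tanimoto(a, b, c))  # 0.9375
```

The two strings differ in a single bit, so the Euclidean distance is 1 and both similarity coefficients are close to 1.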
Drug design
High-throughput virtual screening
Software resources
Db of calculated descriptors
  MOLE db http://michem.disat.unimib.it/mole_db/
Commercial sw
  CODESSA, Dragon, MDL, TSAR, ....
Free sw
  Virtual Computational Chemistry Laboratory www.vcclab.org
  MODEL - Molecular Descriptor Lab http://jing.cz3.nus.edu.sg/cgi-bin/model/model.cgi
Open source sw/libraries
  Chemistry Development Kit (CDK) http://almost.cubic.uni-koeln.de/cdk/cdk_top
  Linux4Chemistry http://www.redbrick.dcu.ie/~noel/linux4chemistry/
Further reading
Web
  www.moleculardescriptors.eu
Book
  “Handbook of Molecular Descriptors”. Roberto Todeschini and Viviana Consonni, Wiley-VCH, 2000.
Papers
  Estrada, E., Molina, E. and Perdomo-López, I. (2001). Can 3D Structural Parameters Be Predicted from 2D (Topological) Molecular Descriptors? J. Chem. Inf. Comput. Sci., 41, 1015-1021.
  Katritzky, A.R. and Gordeeva, E.V. (1993). Traditional Topological Indices vs Electronic, Geometrical, and Combined Molecular Descriptors in QSAR/QSPR Research. J. Chem. Inf. Comput. Sci., 33, 835-857.
  Randić, M. (1990). The Nature of the Chemical Structure. J. Math. Chem., 4, 157-184.
  Tetko, I.V. (2003). The WWW as a Tool to Obtain Molecular Parameters. Mini Reviews in Medicinal Chemistry, 3, 809-820.
Concluding remarks
Depending on the application, define the preferred complexity level for the chemical description
Avoid using meaningless numbers: all descriptor types have advantages and limitations, but easily interpretable descriptors might be preferred
Examples
Tautomers (I)
Tautomers (II)
Lipophilicity descriptor variation
Predicted values for logBCF model
3D descriptors variability (I)
LUMO energy
Intra Lab.
        AM1    PM3    B3L    HF
HF      82.6   79.9   94.0   -
B3L     85.2   81.6   -
PM3     97.8   -
AM1     -
Inter Lab. (AM1), Inter Lab. (PM3)
        Lab1   Lab2   Lab3
Lab3    98.6   98.6   -
Lab2    99.5   -
Lab1    -
        Lab1   Lab2
Lab2    99.8   -
Lab1    -
3D descriptors variability (II)
Dipole moment
Intra Lab.
        AM1    PM3    B3L    HF
HF      85.5   80.2   97.6   -
B3L     82.8   77.2   -
PM3     91.2   -
AM1     -
Inter Lab. (PM3)
        Lab1   Lab2   Lab3
Lab3    58.4   72.2   -
Lab2    67.4   -
Lab1    -
[Figure legend: Lab 1, Lab 2, Lab 3]