Tautomerism in chemical information management systems · 2017-06-27
Wendy Warr & Associates
Tautomerism in chemical
information management
systems
Dr. Wendy A. Warr
http://www.warr.com
Tautomerism in chemical information
management systems
Author: Wendy A. Warr
DOI: 10.1007/s10822-010-9338-4
Perspectives Issue Devoted to
Tautomerism in Molecular Design
Edited by Yvonne Martin
“Chemical Information”
Aspects
• Registration procedures
• Storage of tautomers
• Exact and substructure search
• Depiction of results
Software and Database Vendors
• Accelrys
• ACD/Labs
• Beilstein/Reaxys
• CambridgeSoft
• CAS
• CCDC
• CCG
• ChemAxon
• ChemoSoft
• ChemSpider
• CWM Global Search
• Daylight
• Dialog
• IDBS
• InfoChem
• InhibOx
• John Wiley & Sons
• Molecular Networks
• NCI/CADD
• OpenEye
• Thieme
• PubChem
• Questel
• Schrödinger
• SciTouch
• Symyx
• Thomson Reuters
• Xemistry (CACTVS)
Not Included
• ABCD (J&J)
• BioRad (KnowItAll)
• CDK
• eMolecules
• SimBioSys
• Tripos
• ZINC
Chemical Structure
Representation
Morgan Algorithm
Morgan, H. L. The generation of
a unique machine description for
chemical structures - a
technique developed at
Chemical Abstracts Service. J.
Chem. Doc. 1965, 5(2), 107-113.
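A minimal sketch of the extended-connectivity stage of the Morgan algorithm, in plain Python. The adjacency dictionary and the atom numbering (heavy atoms of CC1=CC(Br)CCC1, numbered in the order they appear in that SMILES) are assumptions made for illustration. The full algorithm goes on to use these values to number the atoms, which is not shown here.

```python
def morgan_values(adjacency):
    """Return extended-connectivity values for a hydrogen-suppressed graph.

    adjacency maps each atom index to the list of its neighbour indices.
    """
    # Start from the simple connectivity: the number of neighbours.
    values = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    distinct = len(set(values.values()))
    while True:
        # New value of an atom = sum of its neighbours' current values.
        new_values = {
            atom: sum(values[n] for n in nbrs)
            for atom, nbrs in adjacency.items()
        }
        new_distinct = len(set(new_values.values()))
        # Stop once another round no longer separates more atoms,
        # and keep the values from the last round that did.
        if new_distinct <= distinct:
            return values
        values, distinct = new_values, new_distinct


# Heavy atoms of CC1=CC(Br)CCC1: 0=CH3, 1..3 and 5..7 ring carbons, 4=Br.
ADJACENCY = {
    0: [1], 1: [0, 2, 7], 2: [1, 3], 3: [2, 4, 5],
    4: [3], 5: [3, 6], 6: [5, 7], 7: [6, 1],
}
ec_values = morgan_values(ADJACENCY)
```

Atoms that still share a value after the last useful round (here, for example, the methyl carbon and the bromine) need further tie-breaking rules, which is where atom type and bond order come in.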
CTfile
SMILES
CC1=CC(Br)CCC1
SMILES
• OpenEye canonical SMILES
• Daylight canonical SMILES
• SciTouch canonical SMILES
• ChemAxon canonical SMILES
• …
IUPAC International Chemical
Identifier (InChI)
InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
InChIKey=RYYVLZVUVIJVGH-UHFFFAOYSA-N
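To make the layered structure of these identifiers concrete, here is a small sketch that splits the InChI above (caffeine) into its layers and the InChIKey into its three hyphen-separated blocks. The function names are illustrative only and are not part of any InChI toolkit.

```python
def split_inchi(inchi):
    """Split an InChI string into version, formula and (prefix, body) layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    version, formula, rest = parts[0], parts[1], parts[2:]
    # Each remaining layer begins with a one-letter prefix (c, h, q, p, t, ...).
    # A list is used because a prefix can occur more than once in some InChIs.
    layers = [(layer[0], layer[1:]) for layer in rest]
    return version, formula, layers


def split_inchikey(key):
    """Split an InChIKey into its 14-, 10- and 1-character blocks."""
    first, second, third = key.split("-")
    return first, second, third


CAFFEINE = "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3"
version, formula, layers = split_inchi(CAFFEINE)
key_blocks = split_inchikey("RYYVLZVUVIJVGH-UHFFFAOYSA-N")
```

The first InChIKey block is a hash of the connectivity-related part of the InChI, so two keys that agree in that block describe the same skeleton even if later layers differ.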
NCI/CADD Identifiers
(CACTVS Hashcodes)
9850FD9F9E2B4E25-FICTS-01-57
9850FD9F9E2B4E25-FICuS-01-78
9850FD9F9E2B4E25-uuuuu-01-27
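The three identifiers above share one pattern: a 16-digit hexadecimal hashcode, a five-letter type, a two-digit field and a two-character suffix. The sketch below pulls them apart with a regular expression. The reading of the five type letters (Fragments, Isotopes, Charges, Tautomers, Stereo, with a lowercase u meaning "not sensitive") is my understanding of the NCI/CADD naming scheme and should be treated as an assumption, as should the field names in the comments.

```python
import re

# hashcode - type - version (assumed) - checksum (assumed)
IDENTIFIER = re.compile(r"([0-9A-F]{16})-([A-Za-z]{5})-(\d{2})-([0-9A-Z]{2})")
FEATURES = ("fragments", "isotopes", "charges", "tautomers", "stereo")


def parse_identifiers(text):
    """Return (hashcode, type, version, checksum) tuples found in text."""
    return [m.groups() for m in IDENTIFIER.finditer(text)]


def sensitivity(id_type):
    """Map each feature to True if the identifier type is sensitive to it."""
    return {name: letter != "u" for name, letter in zip(FEATURES, id_type)}


TEXT = (
    "9850FD9F9E2B4E25-FICTS-01-57"
    "9850FD9F9E2B4E25-FICuS-01-78"
    "9850FD9F9E2B4E25-uuuuu-01-27"
)
parsed = parse_identifiers(TEXT)
hashcodes = {hashcode for hashcode, _, _, _ in parsed}
```

For this structure all three types give the same hashcode, which is what one would expect for a molecule where ignoring tautomerism, stereochemistry and the other features changes nothing.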
Definition of Tautomerism
Q = C, N, S, P, Sb, As, Se, Te, Br, Cl or I
M, Z = trivalent N, bivalent O, S, Se or Te
[Either M or Z = C]
H = H, D, T [or + or -]
Extended system, ring/chain, etc.
M=Q-ZH ⇌ HM-Q=Z
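The element rules in this definition can be written as a simple predicate. The sketch below checks only the element symbols of M, Q and Z. It reads the bracketed line as "one of M or Z may be carbon, but not both", which is an assumption about the intended meaning, and it does not check valence (trivalent N, bivalent O and so on).

```python
Q_ELEMENTS = {"C", "N", "S", "P", "Sb", "As", "Se", "Te", "Br", "Cl", "I"}
MZ_ELEMENTS = {"N", "O", "S", "Se", "Te"}


def is_tautomeric_triad(m, q, z):
    """True if the elements of M=Q-ZH fit the definition's element lists."""
    # Each end must be an allowed heteroatom or carbon.
    ends_ok = all(e in MZ_ELEMENTS or e == "C" for e in (m, z))
    # Assumed reading of "[Either M or Z = C]": at most one end is carbon.
    both_carbon = m == "C" and z == "C"
    return q in Q_ELEMENTS and ends_ok and not both_carbon


keto_enol = is_tautomeric_triad("O", "C", "C")   # O=C-CH <-> HO-C=C
amide = is_tautomeric_triad("O", "C", "N")       # O=C-NH <-> HO-C=N
all_carbon = is_tautomeric_triad("C", "C", "C")  # excluded
```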
Straightforward
1,3 shift, 1,5 shift, 1,7 shift
More Complex
(structures 1 to 4)
Degree of Unsaturation
Ring Opening
Fluxional structures
Mesomers
NEMA Key=6P1SUP7NENNHV4V61WRZP5S2ES8NZF
NEMA Key=CKGEHDBX4KZPW3VV6DXTVM5BB689GB
InChI=1S/C16H18N3S.ClH/c1-18(2)11-5-7-13-15(9-11)20-16-10-12(19(3)4)6-8-14(16)17-13;/h5-10H,1-4H3;1H/q+1;/p-1
InChI=1S/C16H18N3S.ClH/c1-18(2)11-5-7-13-15(9-11)20-16-10-12(19(3)4)6-8-14(16)17-13;/h5-10H,1-4H3;1H/q+1;/p-1
InChIKey=CXKWCBBOMKCUKX-UHFFFAOYSA-M
Same InChIKey
Different NEMA Keys
Tautomers
NEMA Key=CU3YSHT7DX8KUTKGRNS5GH3B4UQBFA
NEMA Key=CTDBHWQW8CQJHC3S5AH6X4QJWAVMKD
InChI=1S/C10H13N5O3/c11-8-7-9(14-10(17)13-8)15(4-12-7)6-2-1-5(3-16)18-6/h4-6,16H,1-3H2,(H3,11,13,14,17)/t5-,6+/m0/s1
InChI=1S/C10H13N5O3/c11-8-7-9(14-10(17)13-8)15(4-12-7)6-2-1-5(3-16)18-6/h4-6,16H,1-3H2,(H3,11,13,14,17)/t5-,6+/m0/s1
InChIKey=KITPKMKMNZXFDK-NTSWFWBYSA-N
Different NEMA Keys
Same InChIKey
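One reason both drawings give the same InChI is visible in the hydrogen layer: the group (H3,11,13,14,17) records three mobile hydrogens shared among atoms 11, 13, 14 and 17 rather than fixing them on particular atoms. A small sketch that extracts such groups from an /h layer follows. The regular expression covers only the simple form seen in this example and is not a full InChI parser.

```python
import re

# Matches groups such as (H3,11,13,14,17) or (H,1,2); a missing count means 1.
MOBILE_H = re.compile(r"\(H(\d*),([\d,]+)\)")


def mobile_h_groups(h_layer):
    """Return (hydrogen_count, atom_numbers) for each mobile-H group."""
    groups = []
    for count, atoms in MOBILE_H.findall(h_layer):
        n_hydrogens = int(count) if count else 1
        groups.append((n_hydrogens, [int(a) for a in atoms.split(",")]))
    return groups


# The /h layer of the InChI shown on this slide.
H_LAYER = "4-6,16H,1-3H2,(H3,11,13,14,17)"
groups = mobile_h_groups(H_LAYER)
```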
Unreasonable
Multiple
Overlapping
(structures 5 to 8)
Overlapping
(structures 9 to 11)
Registration
Registration Objectives
• Corporate database
• Stock room database
• Predicting spectra
• Reaction mechanisms
• Ultra-low temperature lab
Registration Options
• Enumerate all tautomers; store all
tautomers
• Calculate canonical tautomer; store
canonical tautomer
• Enumerate all tautomers
– Rank [as major, minor, or conditions
dependent (ACD/Labs)]
– Allow user to choose which form to store
Schrödinger
• Epik
– Enumerate all energetically reasonable
tautomers
– Enumerate all energetically reasonable
ionization states
• Store all tautomers and ionization states
• Canvas
– identifies duplicates by canonical SMILES
Are A and B Tautomers?
• If A and B are identical, accept
• If the total number of hydrogen atoms or charges is
not identical, reject
• Examine the heavy-atom skeletons; reject if not
identical
• Enumerate all tautomers for A; if any is the same as
B, accept
• Enumerate all tautomers for B; if any is the same as
A, accept
• Otherwise reject.
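The steps above can be sketched in plain Python on a toy molecule representation. Everything here is an assumption made for illustration: molecules are hydrogen-suppressed, A and B share one atom numbering (so "identical" is simple equality rather than a graph comparison), and the enumerator applies only 1,3 shifts between C, N, O and S ends without checking Q or energetics.

```python
from collections import namedtuple

# elements and hcounts are tuples indexed by atom; bonds holds (i, j, order), i < j.
Mol = namedtuple("Mol", "elements hcounts charge bonds")


def one_step_shifts(mol):
    """Yield molecules reachable by one 1,3 shift M=Q-ZH -> HM-Q=Z."""
    order = {}
    for i, j, o in mol.bonds:
        order[(i, j)] = order[(j, i)] = o
    for (m, q), bond_order in order.items():
        if bond_order != 2:
            continue
        for z in range(len(mol.elements)):
            if z == m or order.get((q, z)) != 1 or mol.hcounts[z] == 0:
                continue
            ends = {mol.elements[m], mol.elements[z]}
            if ends == {"C"} or not ends <= {"C", "N", "O", "S"}:
                continue
            hcounts = list(mol.hcounts)
            hcounts[z] -= 1          # hydrogen leaves Z ...
            hcounts[m] += 1          # ... and arrives on M
            bonds = set(mol.bonds)
            bonds -= {(min(m, q), max(m, q), 2), (min(q, z), max(q, z), 1)}
            bonds |= {(min(m, q), max(m, q), 1), (min(q, z), max(q, z), 2)}
            yield mol._replace(hcounts=tuple(hcounts), bonds=frozenset(bonds))


def enumerate_tautomers(mol):
    """Return every form reachable from mol by repeated 1,3 shifts."""
    seen, stack = {mol}, [mol]
    while stack:
        for nxt in one_step_shifts(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def skeleton(mol):
    """Heavy-atom skeleton: elements plus bonded pairs, ignoring bond order."""
    return mol.elements, frozenset((i, j) for i, j, _ in mol.bonds)


def are_tautomers(a, b):
    if a == b:
        return True
    if sum(a.hcounts) != sum(b.hcounts) or a.charge != b.charge:
        return False
    if skeleton(a) != skeleton(b):
        return False
    if b in enumerate_tautomers(a):
        return True
    if a in enumerate_tautomers(b):
        return True
    return False


# Atoms: 0 = C, 1 = C, 2 = O, 3 = C.
ELEMENTS = ("C", "C", "O", "C")
ACETONE = Mol(ELEMENTS, (3, 0, 0, 3), 0, frozenset({(0, 1, 1), (1, 2, 2), (1, 3, 1)}))
ENOL = Mol(ELEMENTS, (2, 0, 1, 3), 0, frozenset({(0, 1, 2), (1, 2, 1), (1, 3, 1)}))
PROPAN_2_OL = Mol(ELEMENTS, (3, 1, 1, 3), 0, frozenset({(0, 1, 1), (1, 2, 1), (1, 3, 1)}))
PROPANAL = Mol(ELEMENTS, (3, 2, 0, 1), 0, frozenset({(0, 1, 1), (1, 3, 1), (2, 3, 2)}))
```

The cheap tests (hydrogen count, charge, skeleton) come first so that the expensive enumeration runs only for pairs that could plausibly be tautomers. Enumerating from both A and B guards against an enumerator that is not symmetric.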
Enumeration of tautomers
• Sayle, R. A.; Delany, J. J. Canonicalization and enumeration of
tautomers. Paper presented at EuroMUG99, Cambridge, UK,
28-29 Oct 1999
• Oellien, F.; Cramer, J.; Beyer, C.; Ihlenfeldt, W-D.; Selzer, P. M.
The impact of tautomer forms on pharmacophore-based
virtual screening. J. Chem. Inf. Model. 2006, 46, 2342-2354.
• Greenwood, J. R.; Calkins, D.; Sullivan, A. P.; Shelley, J. C.
Towards the comprehensive, rapid, and accurate prediction of
the favorable tautomeric states of drug-like molecules in
aqueous solution. J. Comput.-Aided Mol. Des. 2010, published
online March 31, 2010
Storage of Tautomers
Concept A
• Generate all tautomers
• Impossible to calculate lowest energy
tautomer
• Use rules for consistent generation [of a
low energy form]
• Store this form [as canonical SMILES]
Concept B
• Generate all tautomers
• Impossible to calculate lowest energy
tautomer
• Store all tautomers
• [Store all protomers]
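Concepts A and B differ only in what is written to the index at registration time. The toy registry below shows both. The tautomer enumerator is a hypothetical callable supplied by the caller, structures are plain strings standing in for canonical SMILES, and the "rule" used for Concept A (take the lexicographically smallest form) is a placeholder for real rules that aim at a low energy form.

```python
class Registry:
    """Toy compound registry keyed by structure strings."""

    def __init__(self, enumerate_tautomers, store_all):
        self.enumerate_tautomers = enumerate_tautomers  # hypothetical callable
        self.store_all = store_all  # False: Concept A, True: Concept B
        self.index = {}             # stored structure string -> compound ID
        self.next_id = 1

    def _keys(self, structure):
        forms = set(self.enumerate_tautomers(structure)) | {structure}
        if self.store_all:
            return forms        # Concept B: store every tautomer
        return {min(forms)}     # Concept A: store one rule-chosen form

    def register(self, structure):
        keys = self._keys(structure)
        for key in keys:
            if key in self.index:  # already known under some tautomer
                return self.index[key]
        compound_id = self.next_id
        self.next_id += 1
        for key in keys:
            self.index[key] = compound_id
        return compound_id

    def find(self, structure):
        """Exact search: return the compound ID or None."""
        if self.store_all:
            return self.index.get(structure)  # any stored form hits directly
        (key,) = self._keys(structure)        # canonicalize the query first
        return self.index.get(key)


# Hand-written tautomer table for 2-hydroxypyridine / 2-pyridone (illustration only).
TABLE = {
    "Oc1ccccn1": ["O=c1cccc[nH]1"],
    "O=c1cccc[nH]1": ["Oc1ccccn1"],
}


def toy_enumerator(structure):
    return TABLE.get(structure, [])


concept_a = Registry(toy_enumerator, store_all=False)
id_a1 = concept_a.register("Oc1ccccn1")
id_a2 = concept_a.register("O=c1cccc[nH]1")

concept_b = Registry(toy_enumerator, store_all=True)
id_b = concept_b.register("Oc1ccccn1")
```

Concept A keeps the index small but needs the enumerator again at query time. Concept B makes exact lookup a plain dictionary hit at the cost of storing every form.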
Structure Search
• Exact matches done by “flexmatch”,
SMILES, hashcodes etc.
• Substructure search
– Hard to perceive all tautomers for a
substructure
Approaches to Substructure Search
• Address problem at registration stage
– store all tautomers
• Address problem at search stage
– enumerate database structures on the fly
– or enumerate query structure
– or user takes care specifying query
• Combine methods
• Ignore the problem
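The "enumerate query structure" option can be sketched as a loop over query tautomers at search time. Both callables are assumptions supplied by the caller: `enumerate_tautomers` is a hypothetical enumerator, and `is_substructure` stands for a real substructure matcher. In the usage below a plain substring test on SMILES-like strings is used as a stand-in, which is not a valid substructure search and serves only to keep the example self-contained.

```python
def tautomer_aware_search(query, database, enumerate_tautomers, is_substructure):
    """Return {compound_id: matching_query_form} for every database hit."""
    queries = sorted(set(enumerate_tautomers(query)) | {query})
    hits = {}
    for compound_id, structure in database.items():
        for form in queries:
            if is_substructure(form, structure):
                hits[compound_id] = form  # remember which tautomer matched
                break
    return hits


# Toy data: a ketone, its enol form, and an alkane.
DATABASE = {1: "CC(=O)CC", 2: "CC(O)=CC", 3: "CCCC"}
QUERY_TAUTOMERS = {"C(=O)C": ["C(O)=C"]}

hits = tautomer_aware_search(
    "C(=O)C",
    DATABASE,
    lambda q: QUERY_TAUTOMERS.get(q, []),
    lambda q, s: q in s,  # stand-in only, NOT real substructure matching
)
```

Recording which query form produced each hit is what makes it possible to display the matched tautomer with the substructure highlighted.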
Depiction of Results
Depicting Results
• Display input, registered structure
• Display matched tautomer
– good approach if substructure is highlighted
• Display standard form
• Let user choose
• Experimental results match displayed
tautomer
ChemAxon Approaches
• Normalize the structure (“generic
tautomers”)
• Allow for tautomers at search time
• Choose a preferred tautomer
• Customize preferences in Standardizer
ChemAxon Tautomerization Plugin
• Generates all, dominant and canonical
tautomers
• Calculates canonical tautomer by
empirical rules
• Tries to make canonical tautomer the
dominant tautomer (includes pKa filter)
• Handles dearomatization and
stereochemistry
Customization
• Choose dominant tautomer
– set operating pH
– set maximum distance (# bonds) of a
single proton migration
– protect structural features
• aromaticity, charge, stereochemistry, stable
functional groups
– exclude unstable antiaromatic compounds
ChemAxon software
• Stores canonical form, or all tautomers
• Enumerates query tautomers (as far as
possible)
• Usually displays structure originally
input
• Optionally displays standard tautomer
Observations
• Computational chemistry companies
– does the ligand match the receptor?
– ligand preparation
– pKa algorithms, rules, energetics
– “rigorous” approaches
• Informatics companies
– does the compound match the patent?
– building registries and inventories
– graph theory
– examples (structures)
– pragmatic approaches
• Hybrid companies
Acknowledgments
• All 28 “vendors”
– including ChemAxon
• Jonathan Brecher
• Geoff Skillman
• Russ Hillard
• Keith Taylor