Clustering of Small Molecules Based on Similarity Scores From Flexible 3D
Alignment
Adrian Kalaszi, Gabor Imre, Miklos J. Szabo, Timea Polgar, Krisztian Niesz
ChemAxon Ltd., 1031 Budapest, Zahony u. 7, Hungary
Abstract
There are several approaches for clustering chemical structures. Among these, structure-based methods and techniques using classical 2D descriptors (e.g. chemical fingerprints or ECFP) are the most widely used. Considering 3D information, such as conformers, 3D pharmacophore maps or molecular shapes, can give researchers more insight into the process and make the corresponding results easier and more natural to interpret. ChemAxon’s 3D alignment tool provides an automatic, shape-based flexible 3D alignment option for small molecules. The shape similarity scores calculated for the best fits can then be used in similarity-based clustering as part of scaffold hopping to find new lead molecules.
Molecular Med TRI-CON 2013, February 11-15, 2013
Introduction
It is generally accepted that molecular shape plays a central role in ligand binding. As the growing number of publications in the field shows, several descriptors and methods have been applied in shape-based similarity screening [1]. These methods compete with other virtual screening techniques, such as ligand- and structure-based methods [2, 3]. It is therefore expected that considering 3D shape alignment-based similarity in clustering can reveal new aspects beyond the information provided by traditional 2D clustering methods.
ChemAxon in 3D
3D structure generation / conformational analysis
Generate3D [4,5] is a molecular coordinate generation / conformational analysis component of ChemAxon’s
discovery tools (released in 2002), which is used by Marvin GUI’s Structure / Clean3D function, Conformers
Calculator Plugins as well as the molconvert command line tool.
3D flexible alignment
The 3D flexible alignment procedure (released in 2009; [6]) overlays two structures by maximizing the intersection of their van der Waals volumes. The volume is partitioned by the underlying atomic properties, such as extended atom types (force field types) or pharmacophoric types. Both molecules can be treated as flexible: their rotatable bonds, flexible rings and ring systems are adjusted in a continuous manner during the alignment. A single 3D conformer of each structure is used as input for the alignment procedure. The method can therefore also provide valid 3D similarity scores for 2D / 0D input structures by automatically calling Generate3D. After the alignment is completed, the size of the volume intersection and the 3D Tanimoto (a dimensionless measure of similarity between 0 and 1) can be obtained for further processing.
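The poster does not spell out how the 3D Tanimoto is derived from the volume intersection. A minimal sketch, assuming the standard definition (overlap volume divided by the volume of the union), is shown below. The function name and the example volumes are hypothetical, and the volumes ChemAxon actually uses are partitioned by atom type, which this sketch ignores.

```python
def shape_tanimoto(vol_a: float, vol_b: float, vol_intersection: float) -> float:
    """3D Tanimoto from two self-volumes and their overlap volume.

    Assumes the standard definition: overlap / (V_a + V_b - overlap).
    """
    union = vol_a + vol_b - vol_intersection
    if union <= 0.0:
        raise ValueError("volumes must describe a non-empty union")
    return vol_intersection / union


# Hypothetical volumes (cubic angstroms), for illustration only.
score = shape_tanimoto(300.0, 250.0, 200.0)  # 200 / 350
```

With this definition, identical and perfectly overlaid shapes score 1, and shapes with no overlap score 0.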
Example alignment workflow: 1) 2D input structures; 2) a 3D conformer is generated and the shape is colored by atomic types; 3) the volume intersection, which is maximized during the alignment, is shown along with the resulting pose.
3D similarity – ligand based virtual screening
Screen3D is a ligand-based 3D similarity calculation tool released in 2010. It calculates the intersection of the colored shapes and the 3D Tanimoto. Apart from these shape-based measures, Screen3D can also return a 3D similarity score calculated from interatomic distance ranges [7]. The distance ranges are calculated for each molecule by adjusting rotatable bonds to maximize or minimize the distance between every pair of the selected atoms. The distance range similarity score is comparable in screening performance to its shape-based counterpart.
Benchmark results: Venkatraman et al. [2] compared the performance of various 2D and 3D similarity methods on the Directory of Useful Decoys [8]. The values shown as bluish columns originate from their work; the Screen3D performance results, shown in orange, were measured in house using the same approach as described in that publication. (SCREEN3D_S8V: shape similarity with volume intersection score; SCREEN3D_S8T: shape similarity with 3D Tanimoto score; SCREEN3D_H: distance range-based similarity.)
Clustering - JKlustor
The JKlustor suite [9, 10] performs similarity- and structure-based clustering of compound libraries and focused sets in both hierarchical and non-hierarchical fashion. In addition, JKlustor can carry out diversity calculations and library comparisons based on molecular fingerprints and other descriptors. It is an essential tool in combinatorial chemistry, virtual library design and other areas where a large number of compounds need to be analyzed. The approach presented here introduces 3D flexible alignment-based similarity calculation to the JKlustor suite, which allows the available similarity-based algorithms to use structural data in the clustering process.
[Figure panels: 2D (0D) input; aligned structure pair; aligned shapes; flexibly aligned results]
ChemAxon Graphisoft Park, Hx Building H-1037 Budapest, Hungary
Phone: +36 1 453 2660 Fax: +36 1 453 2659 http://www.chemaxon.com
[Figure: JKlustor components. Group labels: 2D (0D) structure-based algorithms; Molecular descriptors; Similarity metrics; Similarity-based clustering; Structure-based clustering. Items shown: Structural frameworks, MCS, MCES, 3D flexible alignment, Chemical hashed fp, BCUT-like*, ECFP, Pharmacophore 2D fp*, Calculated property-based*, User defined, FCFP*, Euclidean, Tanimoto, Intersection, Sphere exclusion, K-means, Ward’s minimum variance*, Jarvis-Patrick*]
An overview of algorithms and descriptors available to use from the JKlustor suite.
*Note: some components are available as standalone tools
Proof of concept implementation
The interface to the 3D flexible alignment functionality has been implemented in JKlustor through a transparent pairwise similarity calculation. Furthermore, a visualization tool is also provided in order to compare the results of the alignment-based similarity calculation with other descriptor implementations.
[Figure: architecture diagram. Box labels: Input structures (smiles); Generate 3D; Aromaticity; H atoms; Structure ID; Descriptor cache; Cache file (DB); Descriptor interface; Algorithm interface; Sphere exclusion; Orchestration, execution, hierarchy representation; Visualization, UI (Web-enabled); Output; 3D flex. align. interface; Flexible 3D alignment engine; UI Client; Structures, clusters,...]
Architecture of the JKlustor extension. Interaction points with the user, standard JKlustor elements, and the interaction points with the flexible 3D alignment engine are depicted.
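The pairwise similarity interface and the descriptor cache in the architecture can be illustrated with a small wrapper that memoizes scores per structure pair. This is only a sketch: the poster does not say whether JKlustor caches per-structure descriptors or pairwise scores, so caching pairwise scores in an in-memory dictionary is an assumption here. The class name, `toy_score` and the structure IDs are hypothetical, and the score is assumed to be symmetric.

```python
from typing import Callable, Dict, Tuple


class CachedPairwiseSimilarity:
    """Wraps an expensive pairwise score and caches results by structure ID pair."""

    def __init__(self, score_fn: Callable[[str, str], float]) -> None:
        self._score_fn = score_fn
        self._cache: Dict[Tuple[str, str], float] = {}
        self.calls = 0  # number of times the wrapped score was actually computed

    def __call__(self, id_a: str, id_b: str) -> float:
        # Assumption: the score is symmetric, so (a, b) and (b, a) share one entry.
        key = (id_a, id_b) if id_a <= id_b else (id_b, id_a)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._score_fn(*key)
        return self._cache[key]


def toy_score(id_a: str, id_b: str) -> float:
    """Hypothetical stand-in for a call into a 3D alignment engine."""
    return 1.0 if id_a == id_b else 0.5


similarity = CachedPairwiseSimilarity(toy_score)
similarity("mol_1", "mol_2")
similarity("mol_2", "mol_1")  # served from the cache, no second computation
```

Since each flexible alignment is costly compared with a fingerprint comparison, avoiding repeated alignments of the same pair is the main point of such a cache.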
Clustering results of a small 3D fragment library
Clusters (centroids) resulting from a sphere exclusion clustering (r=0.4) of the heat shock protein 90 (hsp90) ligands contained in the DUD database. The bar lengths are proportional to the cluster size.
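Sphere exclusion clustering with a radius such as r=0.4 can be sketched as follows. This is a generic textbook variant, not JKlustor's implementation: it takes the first unassigned structure as the next centroid, whereas the centroid selection rule used for the figure is not stated in the poster. The dissimilarity is assumed to be something like 1 minus the 3D Tanimoto; the toy 1D coordinates below are hypothetical stand-ins for real pairwise scores.

```python
from typing import Callable, Dict, List, Sequence


def sphere_exclusion(
    items: Sequence[str],
    dissimilarity: Callable[[str, str], float],
    radius: float,
) -> Dict[str, List[str]]:
    """Cluster unique item IDs; returns {centroid: members including centroid}."""
    remaining = list(items)
    clusters: Dict[str, List[str]] = {}
    while remaining:
        # Assumption: the first unassigned item becomes the next centroid.
        centroid, rest = remaining[0], remaining[1:]
        members, outside = [centroid], []
        for item in rest:
            if dissimilarity(centroid, item) <= radius:
                members.append(item)   # inside the sphere: joins this cluster
            else:
                outside.append(item)   # stays available for later centroids
        clusters[centroid] = members
        remaining = outside
    return clusters


# Toy data: hypothetical 1D positions standing in for pairwise dissimilarities.
positions = {"a": 0.0, "b": 0.1, "c": 0.9, "d": 1.0, "e": 0.45}


def toy_dissimilarity(x: str, y: str) -> float:
    return abs(positions[x] - positions[y])


clusters = sphere_exclusion(list(positions), toy_dissimilarity, radius=0.4)
```

A smaller radius yields more, tighter clusters; the cluster sizes (`len` of each member list) correspond to the bar lengths in a plot like the one described above.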
References
[1] Haigh, J. A.; Pickup, B. T.; Grant, J. A.; Nicholls, A.: Small Molecule shape-fingerprints. J. Chem. Inf. Model. 2005, 45, 673−684.
[2] Venkatraman, V.; Perez-Nueno, V. I.; Mavridis, L.; Ritchie, D. W.: Comprehensive comparison of ligand-based virtual screening tools against the DUD data set reveals limitations of current 3D methods. J. Chem. Inf. Model. 2010, 50, 2079−2093.
[3] Hu, G.; Kuang, G.; Xiao, W.; Li, W.; Liu, G.; Tang, Y.: Performance Evaluation of 2D Fingerprint and 3D Shape Similarity Methods in Virtual Screening. J. Chem. Inf. Model. 2012, 52, 1103−1113.
[4] http://www.chemaxon.com/marvin/help/calculations/conformation.html#conformer
[5] http://www.chemaxon.com/conf/Advanced_automatic_generation_of_3D_molecular_structures.pdf
[6] Marvin 5.1.2, 2012, ChemAxon (http://www.chemaxon.com)
[7] Deng, W.; Kalászi, A.: Screen3D: A Ligand-based 3D Similarity Search without Conformational Sampling. International Conference and Exhibition on Computer Aided Drug Design & QSAR, Oct 29th, 2012, Chicago, IL, USA.
[8] Irwin, J. J.: Community benchmarks for virtual screening. J. Comput.-Aided Mol. Des. 2008, 22, 193−199.
[9] http://www.chemaxon.com/products/jklustor/
[10] http://www.chemaxon.com/conf/JKlustor.ppt
Using 3D similarity approaches to identify scaffold hopping cases
In most scaffold hopping cases, the compared molecules look very similar in terms of their 3D properties, but
they look quite different in terms of their 2-dimensional representation. Thus, it is proposed that such scaffold
hopping cases can be captured by the comparison of the calculated 2D and 3D similarities.
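The proposed comparison can be sketched as a simple filter over precomputed pair scores: keep pairs whose 2D similarity is low while their 3D similarity is high. The thresholds, the `PairScore` type and the example values below are hypothetical and chosen only for illustration; the poster does not give cut-off values.

```python
from typing import Iterable, List, NamedTuple


class PairScore(NamedTuple):
    mol_a: str
    mol_b: str
    sim_2d: float  # e.g. ECFP Tanimoto similarity
    sim_3d: float  # e.g. 3D shape Tanimoto similarity


def scaffold_hop_candidates(
    pairs: Iterable[PairScore],
    max_sim_2d: float = 0.3,  # hypothetical threshold
    min_sim_3d: float = 0.7,  # hypothetical threshold
) -> List[PairScore]:
    """Return pairs that look different in 2D but similar in 3D, largest gap first."""
    hits = [p for p in pairs if p.sim_2d <= max_sim_2d and p.sim_3d >= min_sim_3d]
    return sorted(hits, key=lambda p: p.sim_3d - p.sim_2d, reverse=True)


# Hypothetical scores, for illustration only.
example_pairs = [
    PairScore("m1", "m2", sim_2d=0.15, sim_3d=0.82),  # candidate
    PairScore("m1", "m3", sim_2d=0.75, sim_3d=0.85),  # similar in 2D as well
    PairScore("m2", "m3", sim_2d=0.20, sim_3d=0.40),  # dissimilar in both
    PairScore("m3", "m4", sim_2d=0.25, sim_3d=0.75),  # candidate
]
candidates = scaffold_hop_candidates(example_pairs)
```

In a 2D vs. 3D dissimilarity plot, such candidates fall in the region of high 2D dissimilarity and low 3D dissimilarity.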
A) Scaffold hopping cases for antihistamine drugs: 3D shape similarity values / 2D ECFP similarity values together with the corresponding pairwise 3D alignment of molecules; B) the corresponding molecular pairs shown in the 2D ECFP vs. 3D SHAPE dissimilarity space.
[Plot axes in panel B: 2D ECFP dissimilarity; 3D SHAPE dissimilarity]
Do stop by booth 333 to pick up a discussion paper on our discovery
tools or a reprint of this poster.