![Page 1: What’s new in JChem back-end and Markush storage, search and enumeration](https://reader036.vdocument.in/reader036/viewer/2022062500/56815400550346895dc1fb5c/html5/thumbnails/1.jpg)
What’s new in JChem back-end and Markush storage, search and enumeration
Szabolcs Csepregi
Solutions for Cheminformatics
Contents
• ChemAxon chemical database tools
• Main features of JChem Base, Cartridge
• Example interfaces: JSP, ASP, AJAX examples
• Integration with other CXN products
• Markush structure storage, search and enumeration
• Recent developments, plans
Chemical database products
JChem Base
– A library for adding chemical structures into relational database systems. Available in Java, JSP and .NET
– Open-source web application example is available
JChem Cartridge for Oracle
– Extends Oracle SQL with chemical operators and index
– SQL interface for ChemAxon functionality
Instant JChem
– An all-in-one desktop chemical database application
JChem Web Services
– SOAP interface to JChem Base
JC4XL
– Excel integration (coming)
Compatibility and integration
Supported chemical file formats:
• SMILES
• MDL MOL/RXN/SDF/RDF (v2000 and v3000)
• CML, MRV
• IUPAC and traditional names
• InChI, mol2, PDB, etc.
Database engines:
• Oracle, MySQL, MS SQL Server, MS Access, PostgreSQL, IBM DB2, Derby, etc.
All operating systems through:
• Java API (JChem Base)
• .NET API (JChem Base + IKVM) – for Windows
• SQL (Cartridge)
Structure searching: features
• Substructure, Similarity, Full, Full fragment, etc. search types
• Wide range of query atoms
• Query properties
• R-group queries
• Full SMARTS support
• Coordination compounds
• Link nodes
• Pseudo atoms, Lone pairs
• Relative stereo
• Reaction search features
• Polymers
• Position variation
• Hit coloring ...
www.chemaxon.com/conf/Structural_Search.ppt
Structure searching: options
Some selected structure search options:
– Chemical Terms filter constraint
– Tautomer search
– Stereo on/off
– Ignore charge/isotope/radical/valence/polymers
– Vague bond matching modes: "or aromatic"; ignore bond types
– Inverse hit list
– Maximum search time / number of hits
– SQL SELECT statement for pre-filtering
– Ordering of results
– etc.
Structure search: performance
Test environment: JChem Base 5.2.0, Intel Quad Q6600 2.4GHz, 8GB RAM; Oracle 10.2.0.3

Compound registration (elapsed time):

| Number of compounds | Duplicates not checked | Duplicates checked |
|---|---|---|
| 10,000 | 21 s | 26 s |
| 100,000 | 2 min | 2 min 36 s |
| 200,000 | 3 min 45 s | 5 min 5 s |

Substructure search in PubChem (19.5 million compounds):

| Query | Number of hits | Search time |
|---|---|---|
| (structure image) | 2 | 0.81 s |
| (structure image) | 93 | 0.79 s |
| (structure image) | 5,855 | 1.457 s |
| (structure image) | 142,950 | 11.076 s |
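The registration figures on this slide are easier to compare as rates. The short sketch below only redoes the arithmetic on the numbers quoted here; the function names are made up for the illustration and nothing in it calls JChem.

```python
# Times from the compound registration figures on this slide, in seconds.
# Each entry: compounds -> (duplicates not checked, duplicates checked).
REGISTRATION_SECONDS = {
    10_000: (21, 26),
    100_000: (2 * 60, 2 * 60 + 36),
    200_000: (3 * 60 + 45, 5 * 60 + 5),
}


def throughput(compounds, seconds):
    """Average number of compounds registered per second."""
    return compounds / seconds


def duplicate_check_overhead(unchecked, checked):
    """Extra elapsed time with duplicate checking, as a fraction of the unchecked time."""
    return (checked - unchecked) / unchecked


def summarize():
    """Return (compounds, rate unchecked, rate checked, overhead %) rows, rounded."""
    rows = []
    for compounds, (unchecked, checked) in sorted(REGISTRATION_SECONDS.items()):
        rows.append((
            compounds,
            round(throughput(compounds, unchecked)),
            round(throughput(compounds, checked)),
            round(100 * duplicate_check_overhead(unchecked, checked)),
        ))
    return rows


if __name__ == "__main__":
    for row in summarize():
        print("%7d compounds: %d/s unchecked, %d/s checked, +%d%% time" % row)
```

On these figures the unchecked rate rises from about 476 to about 889 compounds per second as the batch grows, and duplicate checking adds roughly 24 to 36 percent to the elapsed time.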
Table types
Control allowed chemical structures and available operations
• Molecule
• Reaction
• Markush
• Query
• Any structure
Example web applications
Open source JSP, ASP examples
– Marvin applets are used for query drawing and structure visualization
AJAX example
– Back-end is JChem Web Services
– No Java is needed for browsing
Demo
Integration
Integration with other ChemAxon tools:
– Custom, uniform chemical representation (Standardizer – see separate presentation today)
– Automatically calculated properties by Chemical Terms calculated columns (Calculator plugins)
– Additional similarity calculations (Screen – JChem Base only)
– Tautomer handling:
• Tautomer search
• Tautomer duplicate filter table/index option
• Custom tautomer transforms or canonical tautomer using Standardizer
– Query drawing and structure visualization (Marvin)
Provides the most consistent interface and back-end.
Integration
Additional Cartridge functionality
– JChem index (for non-JChem tables)
– Communication with Oracle optimizer
– Reaction-based enumeration (Reactor)
– Format conversions, including image generation
– Markush enumeration (Calculator plugins)
– Property predictions through Chemical Terms (Calculator plugins)
Registration system
• A new component for registration systems is under development (API only)
• Main features:
– Customizable business logic
• Multilevel duplication control
• Customizable corporate registration ID
• Handling of salts, batches, lots, samples, and mixtures
– Identification, split and registration of salt and solvent structures
– Storage of input structures in original format
– Mock registration (dry run)
– Pre-registration through a transitory area
– Basic, customizable implementation examples
• Separate examples for chemists and registrars
• Web and Instant JChem interfaces will follow later
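To make "duplication control", "corporate registration ID" and "mock registration (dry run)" concrete, here is a toy in-memory sketch. It is not the component described on this slide, whose API is not shown here: the `Registry` class, the `REG` prefix, the ID format and the use of a plain string as a structure key are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy registry: maps a structure key to a corporate ID (hypothetical design)."""
    prefix: str = "REG"  # hypothetical corporate ID prefix
    _ids: dict = field(default_factory=dict)

    def register(self, structure_key, dry_run=False):
        """Return (corporate_id, is_new).

        structure_key is assumed to be an already standardized, canonical
        identifier of the structure. With dry_run=True the outcome is
        reported but nothing is stored.
        """
        if structure_key in self._ids:
            # Duplicate: hand back the existing ID instead of creating one.
            return self._ids[structure_key], False
        new_id = "%s-%06d" % (self.prefix, len(self._ids) + 1)
        if not dry_run:
            self._ids[structure_key] = new_id
        return new_id, True


if __name__ == "__main__":
    registry = Registry()
    print(registry.register("CCO", dry_run=True))  # reports the ID it would assign
    print(registry.register("CCO"))                # actually stores it
    print(registry.register("CCO"))                # duplicate, same ID returned
```

A dry run lets a chemist see whether a structure would be treated as new before anything is committed, which is the role a mock registration plays in the list above.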
Handling of Markush structures
Markush structures
• Combinatorial Markush structure registration and search
• Features handled in search and enumeration:
– R-groups (nesting to any depth)
– Atom lists, bond lists
– Position variation bond
– Link nodes
– Repeating units
– Homology groups (aryl, alkyl, etc.)
• Built-in
• User-defined
• Compatible Markush enumeration plugin
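How nested R-groups multiply out can be shown with a deliberately simplified sketch. It works on text templates with `{R1}`-style placeholders rather than on chemical graphs, treats every placeholder occurrence as varying independently, and assumes the definitions contain no cycles. The placeholder syntax and the `expand` function are invented for this example and are not ChemAxon's format or API.

```python
import re
from itertools import product

# Placeholders look like {R1}, {R2}, ... (an illustrative convention only).
PLACEHOLDER = re.compile(r"\{(R\d+)\}")


def expand(fragment, rgroups):
    """List every string obtained by substituting R-group members into fragment.

    rgroups maps an R-group name to its member fragments; members may
    themselves contain placeholders, which gives nesting to any depth.
    """
    # With one capturing group, split() alternates literal text and R-group names.
    parts = PLACEHOLDER.split(fragment)
    choices = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            choices.append([part])  # literal text: exactly one choice
        else:
            # R-group: every expansion of every member is a choice.
            choices.append([s for member in rgroups[part]
                            for s in expand(member, rgroups)])
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    rgroups = {"R1": ["F", "Cl", "C{R2}"], "R2": ["O", "N"]}
    for structure in expand("c1ccccc1{R1}", rgroups):
        print(structure)
```

Here `R1` has three members, one of which nests `R2`, so the scaffold expands to four strings. Full enumeration of this kind grows multiplicatively with each variable position.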
Markush Enumeration
• Markush enumeration plugin
– Full enumeration
– Selected parts only
– Random enumeration
– Calculate library size: exact size of huge Markush libraries (arbitrary precision) or magnitude
– Scaffold alignment and coloring
– Markush code
– Optional example homology group enumeration
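The library size can be computed without enumerating anything: under a simplified model where each variable position varies independently, the size is the product over positions of the summed sizes of that position's members. The sketch below uses text templates with `{R1}`-style placeholders as a stand-in for real Markush structures. The placeholder syntax and function names are assumptions for the example, and this is not a description of how the plugin itself counts.

```python
import re

# Placeholders look like {R1}, {R2}, ... (an illustrative convention only).
PLACEHOLDER = re.compile(r"\{(R\d+)\}")


def library_size(fragment, rgroups):
    """Exact number of enumerated strings, computed without enumerating.

    Python integers have arbitrary precision, so the result stays exact
    however large the library is. Assumes acyclic R-group definitions.
    """
    size = 1
    for name in PLACEHOLDER.findall(fragment):
        # Alternatives at one position add up; positions multiply.
        size *= sum(library_size(member, rgroups) for member in rgroups[name])
    return size


def magnitude(size):
    """Order of magnitude of a positive integer, i.e. floor(log10(size))."""
    return len(str(size)) - 1


if __name__ == "__main__":
    small = {"R1": ["F", "Cl", "C{R2}"], "R2": ["O", "N"]}
    print(library_size("c1ccccc1{R1}", small))

    # Ten independent positions with 10,000 alternatives each.
    big = {"R%d" % i: ["X"] * 10_000 for i in range(1, 11)}
    scaffold = "".join("{R%d}" % i for i in range(1, 11))
    total = library_size(scaffold, big)
    print(total, "magnitude:", magnitude(total))
```

Reporting only the magnitude is often enough to decide whether full enumeration is feasible or whether random enumeration is the practical choice.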
Markush storage & search
• Available in JChem Base and Instant JChem
• No enumeration involved – can handle very complex Markush structures (tested up to 10^40, but no explicit limits were built in)
• Substructure and Full structure search
• Basic query features supported
• Substructure hit visualization: "Markush structure reduction"
Markush demo
What’s new
What’s new: JChem Base
5.1
– Position variation in queries
– New fast & reliable tautomer duplicate search
5.2
– .NET API
– Polymer storage and search
– New query options and features including searching of attached data, group matching of undefined R-atoms, repeating units
– Improved substructure search performance
– JChem Web Services
– New metrics for similarity search (Tversky, etc.) (5.2.2)
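For readers unfamiliar with the Tversky metric mentioned above, the sketch below gives its textbook definition on fingerprints represented as sets of "on" bit positions. How JChem parametrizes or names the weights is not stated here, so `alpha` and `beta` follow the usual literature convention, and returning 0.0 for two empty fingerprints is an arbitrary choice made for this example.

```python
def tversky(a, b, alpha, beta):
    """Tversky index of two fingerprints given as sets of set-bit positions.

    T = |A & B| / (|A & B| + alpha * |A - B| + beta * |B - A|)
    """
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + beta * only_b
    if denominator == 0:
        return 0.0  # both empty (or all weight removed): convention chosen here
    return common / denominator


def tanimoto(a, b):
    """Tanimoto coefficient: the Tversky index with alpha = beta = 1."""
    return tversky(a, b, 1.0, 1.0)


def dice(a, b):
    """Dice coefficient: the Tversky index with alpha = beta = 0.5."""
    return tversky(a, b, 0.5, 0.5)


if __name__ == "__main__":
    query = {1, 2, 3, 4}
    target = {3, 4, 5}
    print("Tanimoto:", tanimoto(query, target))
    print("Dice:", dice(query, target))
    # Asymmetric weights: only bits missing from the target are penalized.
    print("alpha=1, beta=0:", tversky(query, target, 1.0, 0.0))
```

Unequal weights make the measure asymmetric, which is useful when the query should count as "contained in" the target rather than merely similar to it.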
What’s new: JChem Base
Polymer support details
• Polymer brackets and properties (type, connectivity, etc.) considered during search and registration
• Attached data search (optional) – attached to atoms/bonds/brackets
• Source- and structure-based representation equivalence is checked (but can be switched off)
– Addition to a double bond. E.g. polystyrene.
– Polymerization through elimination of water or HCl. E.g. polyester, polyamide.
What’s new: JChem Base
Polymer support details (cont.)
• Ladder type polymers
• Phase-shifting (for ht SRU) (can be switched off)
• End group matching:
– * atoms: unspecified end groups
– Search option to switch on/off end group matching
• Copolymer types: co, alt, rnd, blk, grf, xl, mer, mod
• Polymer mixtures
• New search options
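Phase-shifting can be pictured with a toy model. If a head-to-tail repeating unit is written as a linear sequence of backbone groups, then moving the brackets along the chain yields a cyclic rotation of that sequence, and all rotations describe the same polymer. The sketch below checks this on tuples of labels. It is only an illustration of the idea and assumes a simple linear unit; a real search works on molecular graphs and bracket-crossing bonds, and the function names are invented here.

```python
def rotations(unit):
    """All cyclic rotations of a repeating unit given as a tuple of labels."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def phase_shift_equivalent(unit_a, unit_b):
    """True if the two units differ only by where the brackets were placed."""
    if len(unit_a) != len(unit_b):
        return False
    if not unit_a:
        return True  # two empty units are trivially the same
    return unit_b in rotations(unit_a)


def canonical_phase(unit):
    """Pick one representative rotation (the lexicographically smallest)."""
    return min(rotations(unit)) if unit else unit


if __name__ == "__main__":
    # Three ways of bracketing poly(ethylene oxide).
    a = ("O", "CH2", "CH2")
    b = ("CH2", "CH2", "O")
    c = ("CH2", "O", "CH2")
    print(phase_shift_equivalent(a, b), phase_shift_equivalent(a, c))
    print(canonical_phase(a) == canonical_phase(b) == canonical_phase(c))
```

Storing a canonical rotation makes the duplicate check a plain equality test, while comparing against all rotations avoids having to normalize the stored data.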
What’s new: Cartridge-specific
5.1
– Tautomer duplicate filtering index option
– Alter index option
– Improved import speed (5.1.3)
– Improved upgrade: no need to remove/recreate indices (5.1.4)
5.2
– Interactive installer
– Increased substructure search performance (5.2.2)
– Tversky similarity search (5.2.2)
What’s new: Markush
• New Features
– Homology groups
• 19 built-in groups
• Customizable:
– Examples (for built-in groups, enumeration only)
– Full user-defined homology groups defined by R-group definition
• Marvin templates for easier sketching
– Import reagent files as R-groups
– Position variation and repeating units
Plans
Plans: JChem Base & Cartridge
JChem Base
• Further speed improvements (SSS, similarity)
• New vague bond level options
• R-group decomposition integration
• Improved support for Screen molecular descriptors
Cartridge
• Screen molecular descriptors (BCUT, pharmacophore similarity, chemical hashed fingerprint, etc.) and metrics (Euclidean, Dice, etc.) for similarity search
• User-defined descriptor fingerprints
• Markush tables and search
• JChem Server, JChem cluster
Plans: Markush
– .VMN import (format used by Merged Markush Service & Derwent World Patents Index)
– Multiple graphical attachment points of R-groups
– Homology variation queries
– Overlap analysis of Markush structures
– Homology group properties (# of atoms, branching points, # of heteroatoms, etc.)
– Conditions for Markush variables
Summary
• JChem Base and Cartridge are comprehensive and efficient
• Markush structure storage, search and enumeration are now approaching coverage of the features found in patents
• Continuous development, improvements in the pipeline
Find out more
• Product descriptions & links: www.chemaxon.com/products.html
• Forum: www.chemaxon.com/forum
• Presentations and posters: www.chemaxon.com/conf
• Download: www.chemaxon.com/download.html