CSRML – A New Markup Language Definition for Chemical Substructure Representation
Christof H. Schwab
Molecular Networks GmbH
Henkestraße 91, 91052 Erlangen, Germany
www.molecular-networks.com
Outline
- Chemical subgraphs
  - Representation and use cases
- De facto standards
- Requirements of new definition of subgraph representation
- XML-based substructure representation
ACS Fall 2010 Meeting, Boston, MA, August 22-26, 2010
Chemical Subgraphs and Substructures
- Well established concept in chemistry and chemoinformatics
  - Ray and Kirsch, Finding Chemical Records by Digital Computers. Science, 1957, 126, 814-819
  - Fisanick et al. Substructure Searching of Computer-Readable Chemical Abstracts Service Ninth Collective Index Chemical Nomenclature Files. J. Chem. Inf. Comput. Sci. 1975, 15 (2), 73-84
- Employed by almost all software packages that deal with sets of chemical structures and reactions
Chemical Substructures – Use Cases
- Chemical database queries
  - Find structure(s) enclosing the query substructure
  - Retrieval of analogs or similar structures
- MCSS searches
- Fingerprinting
- Analysis of chemical structures
  - Structural alerts
  - TTC analysis
- Highlighting of functional groups
Example: Database Lookup
- ChemIDplus
- Query: Chlorobenzene
- Search mode: Substructure
Example: Database Lookup
25,512 hits
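
A substructure lookup of this kind amounts to a subgraph match: a structure is a hit if its molecular graph contains the query graph. The following is a minimal Python sketch of that idea, assuming a hypothetical molecule encoding (a list of element symbols plus bonds as index pairs) and ignoring hydrogens, bond orders, and aromaticity. It is not how ChemIDplus implements its search.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: (element symbol per atom, bonds as pairs of atom indices)
Mol = Tuple[List[str], List[Tuple[int, int]]]


def adjacency(mol: Mol) -> Dict[int, Set[int]]:
    """Build a neighbor map from the bond list."""
    elements, bonds = mol
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def has_substructure(query: Mol, target: Mol) -> bool:
    """Backtracking search for a mapping of query atoms onto target atoms."""
    q_el, t_el = query[0], target[0]
    q_adj, t_adj = adjacency(query), adjacency(target)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(q_el):
            return True
        q = len(mapping)  # query atoms are mapped in index order
        for t in range(len(t_el)):
            if t in mapping.values() or q_el[q] != t_el[t]:
                continue
            # every already-mapped query neighbor must also be bonded in the target
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]
        return False

    return extend({})


ring = [(i, (i + 1) % 6) for i in range(6)]
chlorobenzene: Mol = (["C"] * 6 + ["Cl"], ring + [(0, 6)])
benzene: Mol = (["C"] * 6, ring)
chlorotoluene: Mol = (["C"] * 6 + ["Cl", "C"], ring + [(0, 6), (3, 7)])

print(has_substructure(chlorobenzene, chlorotoluene))  # True
print(has_substructure(chlorobenzene, benzene))        # False
```

The plain backtracking search is chosen for readability; production systems typically add screening and more efficient matching.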
Example: Query with Properties
Find chlorobenzene derivatives which are easily hydrolyzed at standard conditions
[Figure: reaction scheme, chlorobenzene derivative (Cl, R) + OH → hydroxy product (OH, R)]
Substructure-based query will return both
[Figure: two structures, chlorobenzene (Cl) and a nitro-substituted chlorobenzene (Cl, NO2, NO2)]
Example: Query with Properties
Nucleophilic aromatic substitution
[Figure: reaction scheme, R-Cl + OH → R-OH - Cl, shown for chlorobenzene and for a nitro-substituted chlorobenzene (Cl, NO2, NO2)]
Chlorobenzene does not react at standard conditions
Reaction conditions: 400 °C, 300 bar (chlorobenzene); room temp. (nitro-substituted chlorobenzene)
Resonance stabilization: 0 kJ/mol (chlorobenzene); 43.5 kJ/mol (nitro-substituted chlorobenzene)
Example: Query with Properties
It is not sufficient to have queries solely based on substructures
[Figure: chlorobenzene (Cl) and nitro-substituted chlorobenzene (Cl, NO2, NO2) structures]
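
To illustrate why a property constraint is needed on top of the substructure match, here is a small Python sketch. It assumes the substructure match has already been computed, uses the resonance stabilization values quoted in the slides (0 and 43.5 kJ/mol), and applies a purely hypothetical cutoff of 20 kJ/mol; the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    contains_chlorobenzene: bool            # substructure match, assumed precomputed
    resonance_stabilization_kj_mol: float   # value quoted in the slides


def easily_hydrolyzed(c: Candidate, min_stabilization: float = 20.0) -> bool:
    """Substructure hit AND property constraint; the 20 kJ/mol cutoff is hypothetical."""
    return c.contains_chlorobenzene and c.resonance_stabilization_kj_mol >= min_stabilization


candidates: List[Candidate] = [
    Candidate("chlorobenzene", True, 0.0),
    Candidate("nitro-substituted chlorobenzene", True, 43.5),
]

substructure_hits = [c.name for c in candidates if c.contains_chlorobenzene]
property_hits = [c.name for c in candidates if easily_hydrolyzed(c)]
print(substructure_hits)  # both structures
print(property_hits)      # only the nitro-substituted one
```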
Existing de facto Standards
- SMARTS
  - Substructure specification by text line notation
  - Definition of complex substructure patterns including logical operations, recursion, etc.
- MDL CTab Query
  - CTab file based query definition
  - Potentially extendible using SD properties in a non-standard way
- SYBYL line notation (SLN)
  - Substructure specification by text line notation
  - Support of property annotations, macros, R-groups, etc.
Limitations of Existing Standards
- No provision of built-in extension mechanisms
- No support of standardized property annotation (except SLN)
- No support of "inline" test cases
- Limited set of properties for annotation
- No built-in support for comments, documentation of queries, etc.
Limitations of Existing Standards
- No mechanisms to validate queries prior to execution
  - Errors (both syntax and semantic ones) first seen when executing the query
- Difficult and error-prone to input
- Proprietary formats
  - Very few free/open source libraries and GUI tools
⇒ Need for a new definition or standard?
Requirements for New Substructure Representation Definition
- Well defined representation of (sub)structures
  - Unambiguous interpretation
  - Clear document structure
- Support of (any) property annotation, query logic, etc.
  - E.g., physicochemical properties, toxicity alerts, etc.
- Support of comments, documentation, etc.
Requirements for New Substructure Representation Definition
- Built-in validation
  - Mechanisms to validate the syntax of queries
  - Test cases to validate the semantics of queries
- Conversion of queries into existing standards
- Built-in support for future extensions
- Non-proprietary, open format
⇒ XML (?)
Advantages of XML-based (Sub)Structure Representation
- Structured representation of structures, test cases, etc.
- Native support of
  - Syntax validation
  - Comments and documentation (extensible)
- Easy to
  - Transfer/exchange
  - Integrate into other XML-based languages
  - Extend and modify
- XML is an open standard as well
  - Large number of software tools available to work with
XML-based (Sub)Structure Representation
Chemical Subgraph Representation Markup Language
CSRML
CSRML Object Model
[Figure: object model diagram. A CSRML document contains Subgraph elements; a Subgraph holds a (Sub)Structure with Atoms, Bonds, and E-Systems, each carrying Annotations, plus MustMatch and MustNotMatch test Structures. Legend: mandatory, optional.]
Model of Single Subgraph Representation
- Target (sub)structure, molecule, or a disconnected graph which represents the query
  - Connectivity (atoms, bonds, e⁻-systems)
  - Annotated query features and other properties
  - Logical constructs
- Test structure(s) that MUST match the target
- Test structure(s) that MUST NOT match the target
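
The role of the MUST match and MUST NOT match test structures can be sketched as a small self-check that runs before a query is used. The Python below is only an illustration: the `Subgraph` class, the `failed_test_cases` function, and the string-based toy matcher are all hypothetical and stand in for a real substructure matcher.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Subgraph:
    target: str                                   # the query, kept as an opaque string here
    must_match: List[str] = field(default_factory=list)
    must_not_match: List[str] = field(default_factory=list)


def failed_test_cases(sg: Subgraph, matches: Callable[[str, str], bool]) -> List[str]:
    """Return a message for every inline test case that contradicts the query."""
    failures = []
    for s in sg.must_match:
        if not matches(sg.target, s):
            failures.append(f"MUST match but did not: {s}")
    for s in sg.must_not_match:
        if matches(sg.target, s):
            failures.append(f"MUST NOT match but did: {s}")
    return failures


# Toy matcher (assumption): a structure "matches" if the query text occurs in it.
def toy_matcher(query: str, structure: str) -> bool:
    return query in structure


sg = Subgraph(target="Cl", must_match=["c1ccccc1Cl"], must_not_match=["c1ccccc1", "CCl"])
print(failed_test_cases(sg, toy_matcher))  # ['MUST NOT match but did: CCl']
```

The reported failure shows the intent: the query "Cl" is too general for its own test cases, which is a semantic error that syntax checking alone would not reveal.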
XML Grammar Definition for CSRML
- Enables easy validation of query definitions
  - XML documents have to be well-formed
  - XML documents can be validated against data model (DTD or XSD)
- Additional checks to validate the query prior to processing
  - Referential integrity checks
  - Unique or distinct constraints
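
The checks listed above can be sketched in Python with the standard library parser. This is a minimal illustration under assumptions: the `bond` element and its `atomRefs` attribute are hypothetical names, not taken from the CSRML schema, and validation against a DTD or XSD is not shown because `xml.etree.ElementTree` only checks well-formedness.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_query(xml_text: str) -> List[str]:
    """Well-formedness, unique-id, and referential-integrity checks."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    atom_ids = [a.get("id", "") for a in root.iter("atom")]
    # unique constraint: no atom id may occur twice
    for dup in sorted({i for i in atom_ids if atom_ids.count(i) > 1}):
        problems.append(f"duplicate atom id: {dup}")
    # referential integrity: every bond must point at declared atoms
    known = set(atom_ids)
    for bond in root.iter("bond"):
        for ref in bond.get("atomRefs", "").split():
            if ref not in known:
                problems.append(f"bond {bond.get('id')} references unknown atom {ref}")
    return problems


good = '<mol id="M1"><atom id="A1"/><atom id="A2"/><bond id="B1" atomRefs="A1 A2"/></mol>'
bad = '<mol id="M1"><atom id="A1"/><atom id="A1"/><bond id="B1" atomRefs="A1 A3"/></mol>'
print(check_query(good))  # []
print(check_query(bad))   # ['duplicate atom id: A1', 'bond B1 references unknown atom A3']
```

In an XSD-based setup, constraints like these could instead be declared in the schema and enforced by a schema-aware parser.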
Query with Properties – Example

    <mol id="M1">
      <atomArray>
        <atom id="A1" element="N/A" x="0" y="0">
          <query feature="atomList">
            <value>N</value>
            <value>O</value>
          </query>
          <query feature="piCharge" logic="AND">
            <range>
              <min>-0.6</min>
              <max>-0.1</max>
            </range>
    …
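
To show how such an annotated atom could be evaluated, here is a minimal Python sketch. It makes several assumptions: the closing tags that the slide elides are filled in only for this illustration, the range bounds are treated as inclusive, and only the AND combination is handled. The function name `atom_matches` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's fragment with its open elements closed; the closing tags are an assumption.
FRAGMENT = """
<mol id="M1">
  <atomArray>
    <atom id="A1" element="N/A" x="0" y="0">
      <query feature="atomList">
        <value>N</value>
        <value>O</value>
      </query>
      <query feature="piCharge" logic="AND">
        <range><min>-0.6</min><max>-0.1</max></range>
      </query>
    </atom>
  </atomArray>
</mol>
"""


def atom_matches(atom_el: ET.Element, element: str, pi_charge: float) -> bool:
    """Evaluate the atom's query features, combining them with AND."""
    for query in atom_el.findall("query"):
        feature = query.get("feature")
        if feature == "atomList":
            allowed = [v.text for v in query.findall("value")]
            if element not in allowed:
                return False
        elif feature == "piCharge":
            lo = float(query.findtext("range/min"))
            hi = float(query.findtext("range/max"))
            if not lo <= pi_charge <= hi:  # inclusive bounds are an assumption
                return False
    return True


atom = ET.fromstring(FRAGMENT).find("atomArray/atom")
print(atom_matches(atom, "O", -0.3))  # True
print(atom_matches(atom, "C", -0.3))  # False: carbon is not in the atom list
print(atom_matches(atom, "N", 0.1))   # False: pi charge outside the range
```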
Query with Properties
- Easily accessible annotated query features
- Easy nesting and logical combination of query features
- Automatic validation of query syntax by XML parser
  - Based on XML schema (grammar)
  - Partial validation of query semantics
- No chemical validation at this step!
⇒ The more checking is done by the XML parser, the less checking has to be done by the implementing library!
Annotation Model
Example of definition for a CSRML annotation

    <annotation domain="bond" featureKey="ringBond"
                dataType="xsd:boolean"
                implementation="M_RING_BOND_IMPL"
                priority="2" severity="skip">
      <label>Ring bond</label>
      <description>
        Bond in ring system; bond order disregarded.
      </description>
      …
    </annotation>
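
As a rough illustration of how a library might read such an annotation definition, the Python sketch below parses the example into a record and stores it in a lookup table keyed by domain and feature key. The `AnnotationDef` class and the registry are hypothetical, the content elided by "…" on the slide is simply left out, and no meaning is assigned to `priority` or `severity` beyond reading their values.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

ANNOTATION = """
<annotation domain="bond" featureKey="ringBond" dataType="xsd:boolean"
            implementation="M_RING_BOND_IMPL" priority="2" severity="skip">
  <label>Ring bond</label>
  <description>Bond in ring system; bond order disregarded.</description>
</annotation>
"""


@dataclass
class AnnotationDef:
    domain: str
    feature_key: str
    data_type: str
    implementation: str
    priority: int
    severity: str
    label: str
    description: str


def parse_annotation(xml_text: str) -> AnnotationDef:
    """Read the attributes and child elements of one annotation definition."""
    el = ET.fromstring(xml_text)
    return AnnotationDef(
        domain=el.get("domain", ""),
        feature_key=el.get("featureKey", ""),
        data_type=el.get("dataType", ""),
        implementation=el.get("implementation", ""),
        priority=int(el.get("priority", "0")),
        severity=el.get("severity", ""),
        label=(el.findtext("label") or "").strip(),
        description=(el.findtext("description") or "").strip(),
    )


registry = {}
ann = parse_annotation(ANNOTATION)
registry[(ann.domain, ann.feature_key)] = ann
print(registry[("bond", "ringBond")].implementation)  # M_RING_BOND_IMPL
```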
Storage of CSRML Queries
- Storage as XML documents
  - Single or multiple queries in a single document
- Storage in (XML) databases
  - Substructure searches in chemistry-aware XML databases
- Integration into other XML-based formats, e.g., ToxML
Exchange of CSRML Queries
- Transfer between different applications as XML documents
  - Regular files
  - Internet (SOAP, HTTP, …)
- Conversion into existing formats
  - Omission or separate export of not-supported features
  - Transformation into query depictions (SVG)
- Conversion from existing formats
  - E.g., from SMARTS to CSRML
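
Conversion into an existing format with separate export of unsupported features might look like the following Python sketch. It is an assumption-laden illustration, not the reference implementation: the function name `atom_to_smarts` and the small element table are hypothetical, only the `atomList` feature is translated (into a SMARTS atomic-number list), and every other feature is returned separately instead of being silently dropped.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Small illustrative table; a real converter would cover the whole periodic table.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9,
                  "P": 15, "S": 16, "Cl": 17, "Br": 35, "I": 53}


def atom_to_smarts(atom_el: ET.Element) -> Tuple[str, List[str]]:
    """Return a SMARTS atom expression plus the features that could not be exported."""
    numbers: List[str] = []
    skipped: List[str] = []
    for query in atom_el.findall("query"):
        feature = query.get("feature", "")
        if feature == "atomList":
            numbers = [f"#{ATOMIC_NUMBERS[v.text]}" for v in query.findall("value")]
        else:
            skipped.append(feature)  # reported separately, not silently omitted
    smarts = "[" + ",".join(numbers) + "]" if numbers else "[*]"
    return smarts, skipped


atom = ET.fromstring(
    '<atom id="A1"><query feature="atomList"><value>N</value><value>O</value></query>'
    '<query feature="piCharge" logic="AND"><range><min>-0.6</min><max>-0.1</max></range>'
    '</query></atom>'
)
print(atom_to_smarts(atom))  # ('[#7,#8]', ['piCharge'])
```

Atomic numbers are used instead of element symbols because uppercase symbols in SMARTS denote aliphatic atoms only, which would narrow the query.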
Current Status
- First draft of CSRML definition
  - Data model & schema design
  - Design of annotation model
  - Default set of query features
- Beta version of reference implementation
  - LGPL or similarly licensed library to support I/O and class structures for query documents, query objects and features
Next Steps
- Development of graphical input tool
  - Chemical Subgraph Editor, CSE
- Publishing everything on the Web
  - Announced via Newsletter / RSS feed
CSRML – Summary
- Universal and extensible platform for specifying advanced substructure queries
  - Connectivity (atoms, bonds, e⁻-systems)
  - Annotated query features and other properties
  - Logical constructs
- Open standard for easy exchange of substructure queries between different applications and databases
- Encourage developers to use and distribute CSRML
Acknowledgements
- Molecular Networks (co-authors)
  - Bruno Bienfait, Johann Gasteiger, Thomas Kleinöder, Joerg Marucszyk, Oliver Sacher, Aleksey Tarkhov, Lothar Terfloth
- Chihae Yang (co-author)
  - Discussions about the chemical subgraph definition
- US FDA CFSAN
  - Kirk Arvidson
  - (Contract for development of the Chemical Subgraph Editor)
Thank You!
www.molecular-networks.com