Ontology, RDF, SW for Chemical Structures


Page 1: Ontology, RDF, SW for Chemical Structures

Ontology, RDF, SW for Chemical Structures

T N Bhat & J. Barkley

NIST

Publications, Query tool, Use Case

[email protected]

Page 2: Ontology, RDF, SW for Chemical Structures

Major Features, Goal – to Reduce User Frustration

We have established a use case at the HCLS Website - Chemical taxonomies
Combining of Rule-based terms with Vocabulary-based terms to define elements of RDF
Organization of the elements of RDF into predictable ontology using concepts from use cases
Developing tools and techniques to present the information using familiar database environments
– Allows easier portability and implementation of the information by the community
Illustrating the concept using high profile data such as for AIDS inhibitors and Protein Data Bank contents

Page 3: Ontology, RDF, SW for Chemical Structures

Combining of Rule-based with Vocabulary-based elements to define RDF

Chemical structures are definable by atomic connectivity – thus structures are suitable for identification using graph theory – InChI
– Suitable for machine reasoning
Graphs are hard to digest for humans – therefore proposal is to combine InChI with familiar vocabularies such as Ala, Phenyl, Adenine
– Also include synonyms in the vocabulary for greater coverage among diverse users
– Vocabularies make it easier for humans to recognize the information
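
One way to picture the combination of a rule-based identifier with vocabulary terms is a handful of RDF triples in N-Triples form. This is only a minimal sketch: the `http://example.org/chem/` namespace, the `hasInChI` predicate, the choice of `rdfs:label` and `skos:altLabel` for names and synonyms, and the synonym list are assumptions made for illustration, not the vocabulary used by the authors. The InChI string is the one for alanine with stereo layers omitted.

```python
from urllib.parse import quote

# Hypothetical namespace and predicate, chosen only for illustration.
BASE = "http://example.org/chem/"
HAS_INCHI = BASE + "hasInChI"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def literal(text):
    """Format a plain string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe(inchi, name, synonyms=()):
    """Return N-Triples lines tying an InChI-derived URI to human-readable names."""
    # The rule-based part: the subject URI is derived from the InChI itself.
    subject = f"<{BASE}{quote(inchi, safe='')}>"
    triples = [
        f"{subject} <{HAS_INCHI}> {literal(inchi)} .",
        # The vocabulary-based part: a familiar name plus synonyms.
        f"{subject} <{RDFS_LABEL}> {literal(name)} .",
    ]
    triples += [f"{subject} <{SKOS_ALT_LABEL}> {literal(s)} ." for s in synonyms]
    return triples


ALANINE_INCHI = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
triples = describe(ALANINE_INCHI, "Ala", ["Alanine", "2-aminopropanoic acid"])
for line in triples:
    print(line)
```

The subject URI serves machine reasoning, while the label triples carry the names a human would search for.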

Page 4: Ontology, RDF, SW for Chemical Structures

InChI – a Scalable URI

InChI is generated by software that decodes the chemical connectivity information into layers such as chirality, ring structure and atom type, and then re-codes them to form a text string
InChI is a naming standard for chemicals recommended by IUPAC
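
The layered structure of the text string can be seen by splitting an InChI on its `/` separators. The sketch below is a simplification under stated assumptions: it names only a few common layer prefixes, it does not handle the fixed-hydrogen or reconnected sections that can repeat layer letters, and the function name `split_layers` is hypothetical. The example string is the InChI for alanine with stereo layers omitted.

```python
# Prefix letters for a few common InChI layers (not an exhaustive list).
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "i": "isotopes",
}


def split_layers(inchi):
    """Split an InChI string into a dict of named layers."""
    if inchi.startswith("InChI="):
        inchi = inchi[len("InChI="):]
    version, formula, *rest = inchi.split("/")
    layers = {"version": version, "formula": formula}
    for part in rest:
        if not part:
            continue
        # Unknown prefixes are kept under their single-letter key.
        name = LAYER_NAMES.get(part[0], part[0])
        layers[name] = part[1:]
    return layers


layers = split_layers("InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)")
for name, value in layers.items():
    print(f"{name}: {value}")
```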

Page 5: Ontology, RDF, SW for Chemical Structures

InChI – a rule-based URI

InChI
– _1_2FC10H11NO2_2Fc11-10_2812_2913-9-5-7-3-1-2-4-8_287_296-9_2Fh1-4_2C9H_2C5-6H2_2C_28H2_2C11_2C12_29
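
The token on this slide looks like an InChI whose reserved characters have been replaced by an underscore followed by two hexadecimal digits (`_2F` for `/`, `_28` and `_29` for parentheses, `_2C` for a comma). That reading is inferred from the string alone, not from documentation of the authors' tool, so the sketch below is an assumption; the function names `decode` and `encode` are hypothetical. The leading `_1` is left untouched; it may be a prefix that keeps the token from starting with a digit.

```python
import re

ENCODED = (
    "_1_2FC10H11NO2_2Fc11-10_2812_2913-9-5-7-3-1-2-4-"
    "8_287_296-9_2Fh1-4_2C9H_2C5-6H2_2C_28H2_2C11_2C12_29"
)

# Assumed escape scheme: "_" followed by exactly two hex digits.
_ESCAPE = re.compile(r"_([0-9A-Fa-f]{2})")


def decode(token):
    """Turn each assumed _XX escape back into the character it stands for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), token)


def encode(text):
    """Escape every character other than letters, digits and '-' as _XX."""
    return "".join(
        ch if ch.isalnum() or ch == "-" else f"_{ord(ch):02X}" for ch in text
    )


decoded = decode(ENCODED)
print(decoded)
```

Under this reading the token carries the formula, connectivity and hydrogen layers of an InChI in a form that can sit inside a URI without further quoting.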

Page 6: Ontology, RDF, SW for Chemical Structures

Vocabulary-based Definitions

For decades scientists have been developing names to identify structures and their images
– Simple names
His
Ala
DNA
ATP
– Semi-rule-based IUPAC names
2-amino-3-methylpentanamide
4-amino-3-hydroxy-6-methylheptanoic_acid
1-[(Benzenesulfonyl-methyl-amino)-phenyl-butyl]-piperidin-4-yl}-propyl-carbamic acid, naphthalen-1-ylmethyl ester
Names facilitate text-based queries of desired components
Names when used together with InChI provide a smoother integration of machine and human needs
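
A text-based query over names can be sketched as a lookup table from normalized names to InChI strings. Everything here is illustrative: the two-entry vocabulary, the synonym lists, and the normalization rule (lowercase, underscores treated as spaces) are assumptions, and glycine is included only as a second example entry.

```python
# Small illustrative vocabulary: InChI -> names a user might type.
VOCABULARY = {
    "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)": [
        "Ala", "Alanine", "2-aminopropanoic acid",
    ],
    "InChI=1S/C2H5NO2/c3-1-2(4)5/h1,3H2,(H,4,5)": [
        "Gly", "Glycine", "2-aminoacetic acid",
    ],
}


def normalize(name):
    """Lowercase a name and treat underscores and repeated spaces alike."""
    return " ".join(name.lower().replace("_", " ").split())


def build_index(vocabulary):
    """Map every normalized name or synonym to its InChI."""
    index = {}
    for inchi, names in vocabulary.items():
        for name in names:
            index[normalize(name)] = inchi
    return index


def find(index, query):
    """Return the InChI for a name, or None when the name is unknown."""
    return index.get(normalize(query))


index = build_index(VOCABULARY)
print(find(index, "ALANINE"))
print(find(index, "2-aminopropanoic_acid"))
```

The human types a familiar name; the machine continues with the InChI it resolves to.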

Page 7: Ontology, RDF, SW for Chemical Structures

Use-Case for SW; Treatment for AIDS is a work in progress

Treatments for AIDS are of two types
– Prevention – the most effective
– Containment
Drugs to contain, and reduce the viral load
– Majority of the drugs (~17) target either HIV protease or RT
– Complete suppression of either of these viral enzymes could cure AIDS
– But drug resistance leads only to partial suppression of the enzymes
All the drug design efforts for AIDS are based on structures
Data needed for drug design is scattered over many Web resources and users often wade through the data manually
Therefore AIDS drug design is an ideal target for Semantic Web and novel database-related technologies
SW connection between NIST and NIAID AIDS database

Choose the problem that matters

Website

Page 8: Ontology, RDF, SW for Chemical Structures

Annotation Technique/Developing Structural Ontology

Define compounds using chemical features of interest to use cases
– Fragment, subgroup, class

1A8K000503 000505 030798

Page 9: Ontology, RDF, SW for Chemical Structures

Modeling with Protégé – Suitable for Text-based Ontology

Page 10: Ontology, RDF, SW for Chemical Structures

Web tools

Structures are different from text based info
– Structures are not amenable to text-based query/rendering techniques
– Majority of the structural users never heard (nor want to hear!) about SPARQL – query language for RDF
– Commonly preferred/expected way to query is by ‘click’
Semantic Web for Structures needs new Web tools that allow navigation by clicking on structural features
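
A click-driven tool can still use SPARQL behind the scenes: the click selects a structural feature, and the tool builds the query so the user never sees it. The sketch below shows that idea only. The `chem:` namespace, the `hasFragment` and `hasInChI` predicates, and the function `query_for_click` are hypothetical and do not describe how Chem-BLAST is implemented.

```python
# Hypothetical namespace; the real data model is not given in the slides.
PREFIX = "http://example.org/chem/"


def query_for_click(feature, limit=20):
    """Build a SPARQL query listing compounds that carry the clicked feature."""
    # Accept only simple local names so a click cannot inject query text.
    if not feature.replace("_", "").replace("-", "").isalnum():
        raise ValueError(f"unexpected feature name: {feature!r}")
    return (
        f"PREFIX chem: <{PREFIX}>\n"
        "SELECT ?compound ?inchi WHERE {\n"
        f"  ?compound chem:hasFragment chem:{feature} ;\n"
        "            chem:hasInChI ?inchi .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


# What the tool might send when a user clicks the "Phenyl" fragment.
print(query_for_click("Phenyl"))
```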

Page 11: Ontology, RDF, SW for Chemical Structures

Chem-BLAST for Structural Semantic Web

http://bioinfo.nist.gov/SemanticWeb_pr3d/chemblast.do
Prasanna et al. PROTEINS 60, 1-4 (2005).
Prasanna et al. PROTEINS 63(4), 907-917 (2006).
Download publications

Page 12: Ontology, RDF, SW for Chemical Structures

Future Plans

Extend the work to chemical structures from Protein Data Bank
If interest exists, hold a workshop at NIST. Proposed dates: last two weeks of March 2008
– Workshop will be in conjunction with the NIST-wide Ontology week
Possible collaboration with IUPAC (International Union of Pure and Applied Chemistry) and ChEBI
– Contact: Colin Batchelor [email protected]@rsc.org
RSC Publishing, Royal Society of Chemistry
Community participation is essential for further development
Contact [email protected]@nist.gov 301 975 5448 (US)