ChemSpider as a Chemical Term Resolver

Post on 11-May-2015

DESCRIPTION

In recent years, in parallel with the broad trend of information proliferation, many tens of public chemical databases have been created and made available using internet technologies. In many cases data has been exchanged freely between these databases as they source information from one another. While this has the advantage of linking together multiple data sources, it also propagates errors across the various databases. The lack of a public authority to resolve such errors significantly affects the quality of freely accessible chemical information. While ChemSpider has previously allowed a crowdsourcing approach to curation, efforts have now migrated to addressing this problem using a "federated resolver" approach. This presentation will report on our work in this area.

TRANSCRIPT

ChemSpider as a Chemical Term Resolver

Antony Williams and Valery Tkachenko

ACS San Diego, March 2012

The Web of Chemistry – VERY BIG!

Online Databases are “Linking”

It is so difficult to navigate…

What’s the structure?

Are they in our file?

What’s similar?

What’s the target?

Pharmacology data?

Known Pathways?

Working On Now?

Connections to disease?

Expressed in right cell type?

Competitors?

IP?

Open PHACTS Project

Develop a set of robust standards…

Implement the standards in a semantic integration hub

Deliver services to support drug discovery programs in pharma and public domain

22 partners, 8 pharmaceutical companies, 3 biotechs

36 months project

Guiding principle is open access, open usage, open source: key to standards adoption

What is the Structure of Vitamin K?

MeSH

A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).

What is the Structure of Vitamin K1?

Create an Online “Resolver” as a path to chemistry

Search all forms of structure IDs:

Systematic name(s), Trivial Name(s), SMILES, InChI Strings, InChIKeys, Database IDs, Registry Number
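A resolver that accepts "all forms of structure IDs" first has to guess which kind of identifier it was given. The following is a minimal sketch of that step, not ChemSpider's actual logic: the function names, the returned labels, and the database ID prefixes chosen are assumptions made for illustration, and the SMILES test is a rough heuristic that only knows the organic-subset atoms.

```python
import re

INCHI_RE = re.compile(r"^InChI=1S?/")
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
# Illustrative prefixes only (ChEBI, ChEMBL, DrugBank style); extend as needed.
DATABASE_ID_RE = re.compile(r"^(?:CHEBI:|CHEMBL|DB)\d+$")
# Heuristic: organic-subset atoms, bracket atoms, bonds, branches, ring digits.
SMILES_RE = re.compile(
    r"^(?:Cl|Br|[BCNOPSFI]|[bcnops]|\[[^\[\]]+\]|[-=#$:/\\.()%0-9])+$"
)


def is_valid_cas(text: str) -> bool:
    """Check the CAS Registry Number format and its check digit."""
    match = CAS_RE.match(text)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight each digit by its position counted from the right, starting at 1.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify_identifier(text: str) -> str:
    """Return a best-guess label for the kind of identifier supplied."""
    text = text.strip()
    if INCHI_RE.match(text):
        return "inchi"
    if INCHIKEY_RE.match(text):
        return "inchikey"
    if is_valid_cas(text):
        return "registry_number"
    if DATABASE_ID_RE.match(text):
        return "database_id"
    if SMILES_RE.match(text) and re.search(r"[A-Za-z]", text):
        return "smiles"
    # Anything else is treated as a systematic or trivial name.
    return "name"
```

Short strings can be ambiguous (for example, "CO" is a valid SMILES and could also be typed as a name), so a real resolver would likely try more than one interpretation rather than trust a single guess.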

ChemSpider

Available Information…

Linked to vendors, safety data, toxicity, metabolism

Vitamin K1 on ChemSpider CORRECT

Resolving Names for QUALITY

Searching chemical identifiers should resolve to the correct chemical as much as possible

Validated Name-Structure Dictionaries

Chemical name dictionaries are used for:

Text-mining (publications, patents): used to index PubMed and link to Google Patents

Linking to other databases – think Biology! When structures are not available, drug names link

Searching the web: names link to structures link to InChIs

I want to know about “Vincristine”

Vincristine: Identifiers

Vincristine: Patents Linked by Name

Many Names, One Structure

Top 200 Drugs on Wikipedia

http://en.wikipedia.org/wiki/List_of_bestselling_drugs

The Project Challenge PART ONE

Agree on the set of chemical names to work with

Independently create an SDF file in each “lab”

Compare differences and agree on final structures

Issue “Gold Standard” SDF file to team

RSC Process

Relative accuracy of groups against final master list
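The comparison of each lab's file against the agreed master list can be sketched as follows. This assumes each SDF file has already been reduced to a mapping from drug name to a structure key such as an InChIKey; the function names and that reduction step are assumptions for illustration, not the RSC process itself.

```python
from typing import Dict, Set


def relative_accuracy(submission: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of gold-standard names for which the lab's structure key matches."""
    if not gold:
        raise ValueError("gold standard must not be empty")
    correct = sum(1 for name, key in gold.items() if submission.get(name) == key)
    return correct / len(gold)


def disputed_names(submissions: Dict[str, Dict[str, str]]) -> Set[str]:
    """Names for which the labs did not all submit the same structure key."""
    all_names: Set[str] = set()
    for mapping in submissions.values():
        all_names.update(mapping)
    disputed = set()
    for name in all_names:
        # A missing entry counts as a disagreement (None differs from any key).
        keys = {mapping.get(name) for mapping in submissions.values()}
        if len(keys) > 1:
            disputed.add(name)
    return disputed
```

`disputed_names` gives the list to discuss when agreeing on final structures, and `relative_accuracy` can then be run once per lab against the resulting gold standard.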

The Project Challenge PART TWO

Use Gold Standard SDF File to investigate data quality on these compounds in Internet Databases

Two checks:

Search chemical name – does it return the correct compound? If not correct, how is it different?

Search “structure” – SMILES, Molfile, InChIString or InChIKey
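One way to answer "how is it different?" is to compare the InChIKey a database returns with the gold-standard InChIKey block by block. The sketch below relies on the general layout of a standard InChIKey (a first block derived mainly from connectivity, a second block covering the remaining layers such as stereochemistry and isotopes, and a final protonation character). The function name and the wording of the returned labels are assumptions for illustration.

```python
import re

INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def compare_inchikeys(found: str, expected: str) -> str:
    """Describe how a returned InChIKey differs from the expected one."""
    found = found.strip().upper()
    expected = expected.strip().upper()
    for key in (found, expected):
        if not INCHIKEY_RE.match(key):
            raise ValueError(f"not an InChIKey: {key!r}")
    if found == expected:
        return "identical"
    f_skeleton, f_layers, f_proton = found.split("-")
    e_skeleton, e_layers, e_proton = expected.split("-")
    if f_skeleton != e_skeleton:
        return "different skeleton"
    if f_layers != e_layers:
        return "same skeleton, different stereo or isotope layers"
    return "same skeleton, different protonation"
```

A hit on the first block alone often means the database holds the right connectivity with missing or wrong stereochemistry, which is worth reporting separately from a completely wrong compound.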

“The First 10”

Performance on 150 Drug Names

NPC Browser Set

Standardize

Use the SRS as a guidance document for standardization

Adjust as necessary to our needs

Nitro groups

Salt and Ionic Bonds

One dictionary look up is never enough…

ChemSpider does not contain all chemistry

We are not the only ones curating data

New chemistry expands daily and goes online

Federation is key….

Check ChemSpider first; if not found then:

Check PubChem

Check NCI resolver

Check ChEBI

Check … the “network” of open interfaces

Each resolver will have its own “quantitative confidence”.
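The "check one source, then fall back to the next" idea can be sketched as a simple chain. Everything here is illustrative: the `Resolver` and `Resolution` classes, the in-memory dictionary lookups standing in for real web services, and the confidence values are assumptions, not the federated resolver described in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Resolver:
    name: str
    lookup: Callable[[str], Optional[str]]  # returns a structure string or None
    confidence: float  # hypothetical per-source score in [0, 1]


@dataclass
class Resolution:
    query: str
    structure: str
    source: str
    confidence: float


def resolve(query: str, resolvers: List[Resolver]) -> Optional[Resolution]:
    """Try each resolver in order and return the first hit, tagged with its source."""
    for resolver in resolvers:
        try:
            hit = resolver.lookup(query)
        except Exception:
            # A failing service should not stop the chain; move to the next one.
            continue
        if hit is not None:
            return Resolution(query, hit, resolver.name, resolver.confidence)
    return None


def dictionary_resolver(name: str, table: Dict[str, str], confidence: float) -> Resolver:
    """Build a stand-in resolver backed by a case-insensitive name dictionary."""
    lowered = {key.lower(): value for key, value in table.items()}
    return Resolver(name, lambda q: lowered.get(q.strip().lower()), confidence)


# Toy chain: the dictionaries and scores are placeholders, not real service data.
chain = [
    dictionary_resolver("ChemSpider", {"water": "O"}, confidence=0.9),
    dictionary_resolver("PubChem", {"ethanol": "CCO"}, confidence=0.8),
]
result = resolve("Ethanol", chain)
```

Keeping the source name and confidence on the result lets a caller decide whether a hit from a lower-ranked resolver is good enough or should be flagged for curation.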

Chemical Identifier Resolver (CIR)

http://cactus.nci.nih.gov/chemical/structure

Converts a given structure identifier into another representation or structure identifier.

Resolve names, identifiers etc
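CIR is addressed by appending the identifier and the desired output representation to the base URL above. The sketch below builds such a URL and optionally fetches it. The helper names are assumptions, the identifier is percent-encoded in full (how the service treats encoded slashes, for example inside an InChI string, has not been verified here), and treating HTTP 404 as "not found" is likewise an assumption.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str = "smiles") -> str:
    """Build a CIR request URL for an identifier and an output representation."""
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


def cir_resolve(identifier: str, representation: str = "smiles",
                timeout: float = 10.0) -> Optional[str]:
    """Fetch the resolved representation as text, or None if nothing was found."""
    try:
        with urlopen(cir_url(identifier, representation), timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except HTTPError as err:
        if err.code == 404:  # assumed to mean the identifier could not be resolved
            return None
        raise
```

For example, `cir_url("aspirin", "stdinchikey")` yields a URL asking for the standard InChIKey of aspirin; `cir_resolve` needs network access and is not called here.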

What can become a resolver?

We are building….

A central federated resolver utilizing available services

Dictionary lookups, systematic name conversions (multiple tools – ACD/Labs, Lexichem, OPSIN)

“Consensus” decisions and guidance

BUT Chemicals have timelines!!!

ORIGINAL FINAL
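A "consensus" decision across several name-to-structure tools can be sketched as a simple vote over the structure keys they return. The tool labels, the `min_agreement` threshold, and the rule that ties yield no answer are assumptions for illustration. The talk does not specify how consensus is reached.

```python
from collections import Counter
from typing import Dict, Optional


def consensus(candidates: Dict[str, Optional[str]],
              min_agreement: int = 2) -> Optional[str]:
    """Return the structure key most tools agree on, or None if there is no clear winner.

    `candidates` maps a tool label to the key it produced (None if it failed).
    """
    votes = Counter(key for key in candidates.values() if key)
    if not votes:
        return None
    ranked = votes.most_common()
    top_key, top_count = ranked[0]
    if top_count < min_agreement:
        return None
    # A tie for first place is treated as "no consensus" and left for curation.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_key
```

Returning `None` rather than picking arbitrarily keeps disputed names visible, so they can be routed to manual review instead of silently entering the dictionary.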

Thank you

Email: williamsa@rsc.org

Twitter: ChemConnector

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams
