ChemSpider as a Chemical Term Resolver Antony Williams and Valery Tkachenko, ACS San Diego March 2012

Upload: royal-society-of-chemistry

Post on 11-May-2015


DESCRIPTION

In recent years, in parallel with the general trend of information proliferation, many tens of public chemical databases have been created and made available using internet technologies. In many cases data have been exchanged freely between these databases as they source information from one another. While this has the advantage of linking together multiple data sources, it has also propagated errors across the various databases. The lack of a public authority to resolve such errors significantly affects the quality of freely accessible chemical information. While ChemSpider has previously allowed a crowdsourcing approach to curation, efforts have now migrated to addressing this problem using a "federated resolver" approach. This presentation will report on our work in this area.

TRANSCRIPT

Page 1: ChemSpider as a chemical term resolver

ChemSpider as a Chemical Term Resolver

Antony Williams and Valery Tkachenko,

ACS San Diego March 2012

Page 2: ChemSpider as a chemical term resolver

The Web of Chemistry – VERY BIG!

Page 3: ChemSpider as a chemical term resolver

Online Databases are “Linking”

Page 4: ChemSpider as a chemical term resolver

It is so difficult to navigate…

What’s the structure?

Are they in our file?

What’s similar?

What’s the target?

Pharmacology data?

Known pathways?

Working on now?

Connections to disease?

Expressed in right cell type?

Competitors?

IP?

Page 5: ChemSpider as a chemical term resolver

Open PHACTS Project

Develop a set of robust standards…
Implement the standards in a semantic integration hub
Deliver services to support drug discovery programs in pharma and public domain
22 partners, 8 pharmaceutical companies, 3 biotechs
36 months project

Guiding principle is open access, open usage, open source: key to standards adoption

Page 6: ChemSpider as a chemical term resolver
Page 7: ChemSpider as a chemical term resolver

What is the Structure of Vitamin K?

Page 8: ChemSpider as a chemical term resolver

MeSH

A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).

Page 9: ChemSpider as a chemical term resolver

What is the Structure of Vitamin K1?

Page 10: ChemSpider as a chemical term resolver
Page 11: ChemSpider as a chemical term resolver
Page 12: ChemSpider as a chemical term resolver

Create an Online “Resolver” as a path to chemistry

Search all forms of structure IDs:
  Systematic name(s)
  Trivial name(s)
  SMILES
  InChI strings
  InChIKeys
  Database IDs
  Registry Number

Page 13: ChemSpider as a chemical term resolver

ChemSpider

Page 14: ChemSpider as a chemical term resolver

Available Information…

Linked to vendors, safety data, toxicity, metabolism

Page 15: ChemSpider as a chemical term resolver

Available Information….

Page 17: ChemSpider as a chemical term resolver

Vitamin K1 on ChemSpider CORRECT

Page 18: ChemSpider as a chemical term resolver

Resolving Names for QUALITY

Searching chemical identifiers should resolve to the correct chemical as much as possible

Page 19: ChemSpider as a chemical term resolver

Validated Name-Structure Dictionaries

Chemical name dictionaries are used for:

Text-mining (publications, patents)
  Used to index PubMed and link to Google Patents
Linking to other databases – think Biology!
  When structures are not available drug names link
Searching the web
  Names link to structures link to InChIs

Page 20: ChemSpider as a chemical term resolver

I want to know about “Vincristine”

Page 21: ChemSpider as a chemical term resolver

Vincristine: Identifiers

Page 22: ChemSpider as a chemical term resolver

Vincristine: Patents Linked by Name

Page 23: ChemSpider as a chemical term resolver

Many Names, One Structure

Page 24: ChemSpider as a chemical term resolver

Top 200 Drugs on Wikipedia

http://en.wikipedia.org/wiki/List_of_bestselling_drugs

Page 25: ChemSpider as a chemical term resolver

The Project Challenge PART ONE

Agree on the set of chemical names to work with

Independently create an SDF file in each “lab”

Compare differences and agree on final structures

Issue “Gold Standard” SDF file to team
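The comparison step in Part One can be sketched in a few lines. This is only an illustration, not the team's actual tooling: it assumes each lab's SDF file has already been reduced to a mapping from drug name to one canonical structure key (for example an InChIKey), and the lab labels, drug names and keys below are hypothetical placeholders.

```python
from collections import defaultdict


def find_disagreements(lab_assignments):
    """Return the names for which the labs did not all assign the same structure.

    lab_assignments maps a lab label to a {name: structure_key} dict, where
    structure_key is any canonical identifier (assumed here to be comparable
    as a plain string).
    """
    by_name = defaultdict(dict)
    for lab, assignments in lab_assignments.items():
        for name, key in assignments.items():
            by_name[name.lower()][lab] = key

    all_labs = set(lab_assignments)
    disagreements = {}
    for name, per_lab in by_name.items():
        missing = all_labs - set(per_lab)
        # Flag a name if any lab skipped it or if the keys are not identical.
        if missing or len(set(per_lab.values())) > 1:
            disagreements[name] = {"assigned": per_lab, "missing": sorted(missing)}
    return disagreements


# Hypothetical placeholder data, not real drugs or real identifiers.
labs = {
    "lab_a": {"drug one": "KEY-1", "drug two": "KEY-2"},
    "lab_b": {"drug one": "KEY-1", "drug two": "KEY-2X"},
    "lab_c": {"drug one": "KEY-1"},
}
to_review = find_disagreements(labs)
```

Names that come back in `to_review` are the ones that would need discussion before a final structure is agreed.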

Page 26: ChemSpider as a chemical term resolver

RSC Process

Page 27: ChemSpider as a chemical term resolver

Relative accuracy of groups against final master list

Page 28: ChemSpider as a chemical term resolver

The Project Challenge PART TWO

Use Gold Standard SDF File to investigate data quality on these compounds in Internet Databases

Two checks:

Search chemical name – does it return the correct compound? If not correct, how is it different?

Search “structure” – SMILES, Molfile, InChI string or InChIKey
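One way to answer "how is it different?" is to compare InChIKeys block by block. The sketch below assumes standard InChIKeys, where the first 14-letter block is derived from the connectivity part of the InChI and the remaining characters cover the other layers (stereochemistry and isotopes among them). The function name, the result labels and the keys are illustrative placeholders, not output from any real database.

```python
def compare_inchikeys(gold, found):
    """Classify a database hit against the gold-standard InChIKey."""
    if found is None:
        return "not found"
    gold, found = gold.strip().upper(), found.strip().upper()
    if gold == found:
        return "exact match"
    # Same first block: same connectivity, so the difference lies in the
    # remaining layers (for example missing or wrong stereochemistry).
    if gold.split("-")[0] == found.split("-")[0]:
        return "same skeleton, differs in other layers"
    return "different compound"


# Placeholder keys with the right shape (14-10-1 characters), not real compounds.
GOLD = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
NO_STEREO = "AAAAAAAAAAAAAA-UHFFFAOYSA-N"
OTHER = "CCCCCCCCCCCCCC-BBBBBBBBSA-N"

results = [compare_inchikeys(GOLD, k) for k in (GOLD, NO_STEREO, OTHER, None)]
```

Separating "same skeleton" from "different compound" distinguishes a record that has lost its stereochemistry from one that points at the wrong molecule entirely.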

Page 29: ChemSpider as a chemical term resolver

“The First 10”

Page 30: ChemSpider as a chemical term resolver

Performance on 150 Drug Names

Page 32: ChemSpider as a chemical term resolver

NPC Browser Set

Page 33: ChemSpider as a chemical term resolver

Standardize

Use the SRS as a guidance document for standardization

Adjust as necessary to our needs

Page 34: ChemSpider as a chemical term resolver

Nitro groups

Page 35: ChemSpider as a chemical term resolver

Salt and Ionic Bonds

Page 36: ChemSpider as a chemical term resolver

One dictionary look up is never enough…

ChemSpider does not contain all chemistry

We are not the only ones curating data

New chemistry expands daily and goes online

Page 37: ChemSpider as a chemical term resolver

Federation is key….

Check ChemSpider first, if not found then:
  Check PubChem
  Check NCI resolver
  Check ChEBI
  Check … the “network” of open interfaces

Each resolver will have its own “quantitative confidence”.

One dictionary look up is never enough…
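The "check one source, then fall back to the next" idea can be sketched as a chain of lookups, each carrying its own confidence. This is a minimal illustration under stated assumptions: the resolvers are in-memory dictionaries standing in for real services, and `Resolver`, `Resolution`, `resolve`, `dictionary_resolver` and the confidence numbers are all hypothetical, not part of any ChemSpider interface.

```python
from typing import Callable, NamedTuple, Optional


class Resolver(NamedTuple):
    name: str
    lookup: Callable[[str], Optional[str]]  # term -> structure, or None
    confidence: float                       # made-up per-resolver weight


class Resolution(NamedTuple):
    structure: str
    source: str
    confidence: float


def resolve(term, resolvers):
    """Try each resolver in order and return the first hit, or None."""
    for resolver in resolvers:
        structure = resolver.lookup(term)
        if structure is not None:
            return Resolution(structure, resolver.name, resolver.confidence)
    return None


def dictionary_resolver(name, table, confidence):
    """Wrap a {term: SMILES} dict as a case-insensitive resolver."""
    lowered = {key.lower(): value for key, value in table.items()}
    return Resolver(name, lambda term: lowered.get(term.strip().lower()), confidence)


# Stand-in dictionaries; real resolvers would call the services named above.
chain = [
    dictionary_resolver("ChemSpider", {"benzene": "c1ccccc1"}, 0.9),
    dictionary_resolver("PubChem", {"benzene": "C1=CC=CC=C1", "ethanol": "CCO"}, 0.8),
]
hit = resolve("Ethanol", chain)
miss = resolve("unobtainium", chain)
```

Returning the source and its confidence alongside the structure lets a caller decide how far to trust a hit that only came from a later resolver in the chain.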

Page 38: ChemSpider as a chemical term resolver

Chemical Identifier Resolver (CIR)

http://cactus.nci.nih.gov/chemical/structure

Converts a given structure identifier into another representation or structure identifier.

Resolve names, identifiers etc
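As an illustration, CIR requests are plain URLs of the form base/identifier/representation, so a lookup can be prepared with the standard library alone. The helper name `cir_url` is hypothetical, and the sketch only builds the URL without contacting the service; `stdinchikey` and `smiles` are representation names the service is understood to accept, but check its documentation for the current list.

```python
from urllib.parse import quote

# Base URL as given on the slide.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation):
    """Build a CIR request URL: <base>/<identifier>/<representation>."""
    # Percent-encode the identifier so spaces and characters such as '#'
    # (common in SMILES) survive inside the URL path.
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


name_to_key = cir_url("vitamin K1", "stdinchikey")
smiles_to_smiles = cir_url("C#N", "smiles")
```

The resulting string can be fetched with any HTTP client, for example `urllib.request.urlopen`, and the response body read as text.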

Page 39: ChemSpider as a chemical term resolver

What can become a resolver?

Page 40: ChemSpider as a chemical term resolver

We are building….

A central federated resolver utilizing available services

Dictionary lookups, systematic name conversions (multiple tools – ACD/Labs, Lexichem, OPSIN)

“Consensus” decisions and guidance

BUT chemicals have timelines!!!
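A "consensus" decision across several name-to-structure tools could look like a simple majority vote. This is a sketch under assumptions, not the method used in the project: it assumes each tool's output has already been converted to one canonical identifier (an InChIKey would be more robust than the raw SMILES used here for readability), and the tool labels, function name and agreement threshold are hypothetical.

```python
from collections import Counter


def consensus(candidates, min_agreement=2):
    """Return (structure, supporting_tools) or (None, []) if there is no consensus.

    candidates maps a tool label to the structure it produced, or None when
    the tool could not convert the name.
    """
    votes = Counter(value for value in candidates.values() if value is not None)
    if not votes:
        return None, []
    ranked = votes.most_common(2)
    winner, count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == count
    # Require enough agreeing tools and a clear winner.
    if count < min_agreement or tied:
        return None, []
    supporters = sorted(tool for tool, value in candidates.items() if value == winner)
    return winner, supporters


agreed = consensus({"tool_a": "CCO", "tool_b": "CCO", "tool_c": "CCN"})
split = consensus({"tool_a": "CCO", "tool_b": "CCN", "tool_c": None})
```

Names that end up without a consensus are natural candidates for manual curation rather than automatic acceptance.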

Page 41: ChemSpider as a chemical term resolver

ORIGINAL FINAL

Page 42: ChemSpider as a chemical term resolver

Thank you

Email: [email protected]
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams