RSC ChemSpider – Building an Internet Based Community for Chemists

Upload: orcid-0000-0002-2668-4821

Posted on 10-May-2015

DESCRIPTION

This is a general presentation about our efforts to build an internet-based community for chemists using ChemSpider. It gives an overview of data quality online, crowdsourced deposition and curation, and our progress in delivering a solution to the community for sourcing data.

TRANSCRIPT

Page 1: RSC ChemSpider – Building An Internet Based Community For Chemists

RSC ChemSpider – Building an Internet Based Community for Chemists

Page 2: RSC ChemSpider – Building An Internet Based Community For Chemists

Where is chemistry online?

Encyclopedic articles (Wikipedia)

Chemical vendor databases

Metabolic pathway databases

Property databases

Patents with chemical structures

Drug Discovery data

Scientific publications

Compound aggregators

Blogs/Wikis and Open Notebook Science

Page 3: RSC ChemSpider – Building An Internet Based Community For Chemists

Chemistry on the Internet TODAY

Chemistry searches are generally limited to text-based searches across the internet

Poor quality and little curation/validation work

Too many searches required to resource data

Page 4: RSC ChemSpider – Building An Internet Based Community For Chemists

What do humans want?

As few interfaces as possible

media.obsessable.com

Page 5: RSC ChemSpider – Building An Internet Based Community For Chemists

Chemistry on the Internet FUTURE

Search by chemical structure and substructure

Chemistry articles indexed and searchable

Reduced number of searches to find data

Data are integrated – compounds, vendors, syntheses, data, publications and patents

Page 6: RSC ChemSpider – Building An Internet Based Community For Chemists

For Synthesis…TotallySynthetic.com

Page 7: RSC ChemSpider – Building An Internet Based Community For Chemists

Org Prep Daily (Blog)

Page 8: RSC ChemSpider – Building An Internet Based Community For Chemists

Lots of “Public Compound” Databases

PubChem

Drugbank

ChEBI/ChEMBL

KEGG

LipidMAPs

ChemIDPlus

eMolecules

ZINC

Lots of chemical vendors

ChemSpider

Page 9: RSC ChemSpider – Building An Internet Based Community For Chemists

Where Would You Look? What Do You Trust?

Page 10: RSC ChemSpider – Building An Internet Based Community For Chemists

Linked Data on the Web

Taken from: Rafael Sidis’ Blog

Page 11: RSC ChemSpider – Building An Internet Based Community For Chemists

What is a compound?

Page 12: RSC ChemSpider – Building An Internet Based Community For Chemists

What is ChemSpider?

ChemSpider is:

Building a Structure Centric Community for Chemists

>23 million compounds, >300 data sources

A deposition and curation platform

A publishing platform for the community

Grows daily – more depositions, more links, more data sources

Page 13: RSC ChemSpider – Building An Internet Based Community For Chemists

How Was ChemSpider Built?

ChemSpider was a “hobby project”

Housed in a basement and running off three servers – one bought, two built

Sensitive to weather and power stability

Went live at ACS Spring 2007 in Chicago

Page 14: RSC ChemSpider – Building An Internet Based Community For Chemists

Search Cholesterol

Page 15: RSC ChemSpider – Building An Internet Based Community For Chemists

Search Cholesterol

Page 16: RSC ChemSpider – Building An Internet Based Community For Chemists

Search Cholesterol

Page 17: RSC ChemSpider – Building An Internet Based Community For Chemists

Search Cholesterol

Page 18: RSC ChemSpider – Building An Internet Based Community For Chemists

Search Cholesterol

Page 19: RSC ChemSpider – Building An Internet Based Community For Chemists

Linked across the internet

Page 20: RSC ChemSpider – Building An Internet Based Community For Chemists

Kyoto Encyclopedia of Genes and Genomes

Page 21: RSC ChemSpider – Building An Internet Based Community For Chemists

Link off a structure in ChemSpider

Chemical suppliers

Other publications

Analytical Data

Related Reactions

Wikipedia

Patents

“Everything”

Page 22: RSC ChemSpider – Building An Internet Based Community For Chemists

Links to Patents based on structure

Page 23: RSC ChemSpider – Building An Internet Based Community For Chemists

Clickthrough to Patents

Page 24: RSC ChemSpider – Building An Internet Based Community For Chemists

Articles Linked

Page 25: RSC ChemSpider – Building An Internet Based Community For Chemists

Answering Questions for Chemists

Questions a chemist might ask…

What is the melting point of n-butanol?

What is the chemical structure of Xanax?

Chemically, what is phenolphthalein?

What are the stereocenters of cholesterol?

Where can I find publications about xylene?

What are the different trade names for Ketoconazole?

What is the NMR spectrum of Aspirin?

What are the safety handling issues for Thymol Blue?

Page 26: RSC ChemSpider – Building An Internet Based Community For Chemists

Complex Data and Information

Page 27: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemSpider is a structure-centric hub

ChemSpider aggregates and links out across the internet

Data aggregate based on “structures and links”

What defines a chemical compound?

Page 28: RSC ChemSpider – Building An Internet Based Community For Chemists

What is a compound?

Page 29: RSC ChemSpider – Building An Internet Based Community For Chemists

Question Everything online: www.dhmo.org

Page 30: RSC ChemSpider – Building An Internet Based Community For Chemists

Di-Hydrogen Monoxide

2H

Page 31: RSC ChemSpider – Building An Internet Based Community For Chemists

Di-Hydrogen Monoxide

2H + 1O

Page 32: RSC ChemSpider – Building An Internet Based Community For Chemists

Di-Hydrogen Monoxide

H2O

Page 33: RSC ChemSpider – Building An Internet Based Community For Chemists

Di-Hydrogen Monoxide

H2O

Water

Page 34: RSC ChemSpider – Building An Internet Based Community For Chemists

It’s all on Wikipedia…

Page 35: RSC ChemSpider – Building An Internet Based Community For Chemists

It’s all on Wikipedia…

Page 36: RSC ChemSpider – Building An Internet Based Community For Chemists

Chemistry on The Internet Is Messy

Page 37: RSC ChemSpider – Building An Internet Based Community For Chemists

It’s Methane…

Page 38: RSC ChemSpider – Building An Internet Based Community For Chemists

What’s Methane?

Page 39: RSC ChemSpider – Building An Internet Based Community For Chemists

What’s Methane?

Page 40: RSC ChemSpider – Building An Internet Based Community For Chemists

What ELSE is Methane???

Page 41: RSC ChemSpider – Building An Internet Based Community For Chemists

PubChem

Page 42: RSC ChemSpider – Building An Internet Based Community For Chemists

Truly “I Love You”

Page 43: RSC ChemSpider – Building An Internet Based Community For Chemists

Chemistry is REALLY Messy

Page 44: RSC ChemSpider – Building An Internet Based Community For Chemists

Vancomycin

Who will curate?

How would you clean such a large dataset?

Assertions!!!

Page 45: RSC ChemSpider – Building An Internet Based Community For Chemists

Vancomycin

Who will curate?

How would you clean such a large dataset?

Page 46: RSC ChemSpider – Building An Internet Based Community For Chemists

Vancomycin on ChemSpider

1 compound – 3 days

Page 47: RSC ChemSpider – Building An Internet Based Community For Chemists

The EXPERTS must get it right?!

Page 48: RSC ChemSpider – Building An Internet Based Community For Chemists

Wikipedia, C&E News, PubChem

C&E News (from ACS)

Page 49: RSC ChemSpider – Building An Internet Based Community For Chemists

What About Digitonin?

Page 50: RSC ChemSpider – Building An Internet Based Community For Chemists

CAS as an authority

Page 51: RSC ChemSpider – Building An Internet Based Community For Chemists

The Blogging Community Participate

Page 52: RSC ChemSpider – Building An Internet Based Community For Chemists

The FDA’s DailyMed

Page 53: RSC ChemSpider – Building An Internet Based Community For Chemists

Structures on DailyMed

Page 54: RSC ChemSpider – Building An Internet Based Community For Chemists

Lack of Stereochemistry

Page 55: RSC ChemSpider – Building An Internet Based Community For Chemists

Incorrect Structures

Page 56: RSC ChemSpider – Building An Internet Based Community For Chemists

Wow!

Page 57: RSC ChemSpider – Building An Internet Based Community For Chemists

The InChI Identifier

Page 58: RSC ChemSpider – Building An Internet Based Community For Chemists

Multiple Layers

Page 59: RSC ChemSpider – Building An Internet Based Community For Chemists

InChI Strings Hash to InChIKeys
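The idea on this slide, a long layered identifier string being hashed down to a short fixed-length key, can be sketched as follows. This is a toy illustration only: `toy_key` is a hypothetical helper and is not the real InChIKey algorithm. The two InChI strings (water and methane) are standard ones; everything else is an assumption made for the example.

```python
import hashlib
import string


def toy_key(identifier: str, length: int = 14) -> str:
    """Hash an arbitrary-length identifier into a fixed-length A-Z key.

    Illustration only: this is NOT the real InChIKey algorithm.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    letters = string.ascii_uppercase
    # Map each of the first `length` digest bytes onto one letter A-Z.
    return "".join(letters[b % 26] for b in digest[:length])


water = "InChI=1S/H2O/h1H2"
methane = "InChI=1S/CH4/h1H4"

for inchi in (water, methane):
    print(f"{inchi} -> {toy_key(inchi)}")
```

The point of the sketch is the property that makes hashed keys useful for web search: the key has a fixed length, the same string always gives the same key, and different strings give different keys in practice.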

Page 60: RSC ChemSpider – Building An Internet Based Community For Chemists

InChIs for Taxol

Page 61: RSC ChemSpider – Building An Internet Based Community For Chemists

Back to Taxol

DrugBank: RCINICONZNJXQF-CLDWUXIMDD

ChEBI: RCINICONZNJXQF-GXKQXQCDDN

Wikipedia: RCINICONZNJXQF-MZXODVADBJ

Which one is correct???
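One way to start answering the question is to compare the keys block by block. The sketch below assumes the usual InChIKey layout, where the first hyphen-separated block is derived from the connectivity (the skeleton) and the remainder from stereochemistry and other layers. The key strings are copied from this slide as transcribed; `split_key` is a hypothetical helper.

```python
keys = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def split_key(key: str) -> tuple[str, str]:
    """Split a key into (first block, everything after the first hyphen)."""
    first, rest = key.split("-", 1)
    return first, rest


first_blocks = {source: split_key(key)[0] for source, key in keys.items()}
second_blocks = {source: split_key(key)[1] for source, key in keys.items()}

# All sources agree on the skeleton if there is exactly one distinct first block.
same_skeleton = len(set(first_blocks.values())) == 1

print("Same skeleton in all sources:", same_skeleton)
for source, block in second_blocks.items():
    print(f"{source}: second block {block}")
```

With the keys as transcribed here, all three sources share the first block, so the records agree on the skeleton and any disagreement is confined to the later layers.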

Page 62: RSC ChemSpider – Building An Internet Based Community For Chemists

InChIKeys for Taxol

DrugBank: RCINICONZNJXQF-CLDWUXIMDD

ChEBI: RCINICONZNJXQF-GXKQXQCDDN

Wikipedia: RCINICONZNJXQF-MZXODVADBJ

ChEBI and Wikipedia are the SAME structure

Drugbank is a DIFFERENT structure – ONE stereocenter

Page 63: RSC ChemSpider – Building An Internet Based Community For Chemists

Does one stereocenter matter?

Page 64: RSC ChemSpider – Building An Internet Based Community For Chemists

Does one stereocenter matter?

Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon

Page 65: RSC ChemSpider – Building An Internet Based Community For Chemists

Does one stereocenter matter?

Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon

Page 66: RSC ChemSpider – Building An Internet Based Community For Chemists

Building a Structure Centric Community for Chemists

Page 67: RSC ChemSpider – Building An Internet Based Community For Chemists

Assertion and Chemical Entities

Who says what Taxol is?

What is the “timeline” for a molecule?

How do we clean up the Public data?

The Quality source is Chemical Abstracts Service…

Page 68: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemSpider Searches

Page 69: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemSpider Searches

Page 70: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemSpider Complex Searches

Page 71: RSC ChemSpider – Building An Internet Based Community For Chemists

Vancomycin – Search the Internet

Page 72: RSC ChemSpider – Building An Internet Based Community For Chemists

Full Molecule Search: 4 Hits

Page 73: RSC ChemSpider – Building An Internet Based Community For Chemists

Full Skeleton Search: 104 Hits

Page 74: RSC ChemSpider – Building An Internet Based Community For Chemists

The InChI “Resolver”

Page 75: RSC ChemSpider – Building An Internet Based Community For Chemists

Citizen Scientists

Page 76: RSC ChemSpider – Building An Internet Based Community For Chemists

Crowd-sourcing Chemistry Curation

Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate

Page 77: RSC ChemSpider – Building An Internet Based Community For Chemists

Building a Structure Centric Community for Chemists

Multi-level Curation and Approval
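A minimal sketch of what multi-level curation could look like as a data model: a user flags a name, and a curator then confirms or rejects the flag. The class names, status values, and method names are all hypothetical and are not ChemSpider's actual schema or workflow.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    UNVERIFIED = "unverified"
    USER_FLAGGED = "flagged by user"
    CURATOR_APPROVED = "approved by curator"
    DEPRECATED = "deprecated"


@dataclass
class Synonym:
    name: str
    status: Status = Status.UNVERIFIED
    history: list = field(default_factory=list)

    def flag(self, user: str, reason: str) -> None:
        """First level: any registered user can tag a suspected error."""
        self.status = Status.USER_FLAGGED
        self.history.append((user, "flag", reason))

    def review(self, curator: str, accept_flag: bool) -> None:
        """Second level: a curator confirms the flag (deprecate) or rejects it (approve)."""
        self.status = Status.DEPRECATED if accept_flag else Status.CURATOR_APPROVED
        self.history.append((curator, "review", self.status.value))


good = Synonym("Water")
bad = Synonym("not-a-real-name")  # placeholder for an incorrect synonym

bad.flag(user="some_user", reason="name does not match structure")
bad.review(curator="some_curator", accept_flag=True)
good.flag(user="some_user", reason="flagged by mistake")
good.review(curator="some_curator", accept_flag=False)

print(good.name, "->", good.status.value)
print(bad.name, "->", bad.status.value)
```

Keeping a history list on each name records who asserted what, which is the kind of audit trail a crowd-sourced curation process needs.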

Page 78: RSC ChemSpider – Building An Internet Based Community For Chemists

Citizens as Data Sources

Page 79: RSC ChemSpider – Building An Internet Based Community For Chemists
Page 80: RSC ChemSpider – Building An Internet Based Community For Chemists

Entity-Extraction, Mark-up, Annotate

Page 81: RSC ChemSpider – Building An Internet Based Community For Chemists

Success Depends on Dictionaries
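Why the dictionary matters can be shown with a small dictionary-based name lookup. This is a sketch under stated assumptions, not the method used by any RSC tool: `DICTIONARY` and `annotate` are hypothetical, and the dictionary holds only four names mapped to their standard InChI strings.

```python
import re

# Hypothetical name-to-structure dictionary (names stored in lowercase).
DICTIONARY = {
    "water": "InChI=1S/H2O/h1H2",
    "dihydrogen monoxide": "InChI=1S/H2O/h1H2",
    "methane": "InChI=1S/CH4/h1H4",
    "ethanol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
}

# Try longer names first so multi-word names win over shorter ones.
_names = sorted(DICTIONARY, key=len, reverse=True)
_pattern = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in _names) + r")\b",
    re.IGNORECASE,
)


def annotate(text: str) -> list[tuple[int, str, str]]:
    """Return (offset, matched name, InChI) for every dictionary name found."""
    return [
        (m.start(), m.group(0), DICTIONARY[m.group(0).lower()])
        for m in _pattern.finditer(text)
    ]


text = "Methane was bubbled through ethanol; propanol was not used."
hits = annotate(text)
for offset, name, inchi in hits:
    print(offset, name, inchi)
```

In this example "propanol" is not marked up simply because it is missing from the dictionary, which is the limitation the slide title points at.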

Page 82: RSC ChemSpider – Building An Internet Based Community For Chemists

Project Prospect

Page 83: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemMantis and CJOC

Page 84: RSC ChemSpider – Building An Internet Based Community For Chemists

Name-Structure Pairs

Page 85: RSC ChemSpider – Building An Internet Based Community For Chemists

Species – linked to Wikipedia

Page 86: RSC ChemSpider – Building An Internet Based Community For Chemists

Semantic Linking of Structures

What would you want to link off a structure?

Chemical suppliers

Other publications

Analytical Data

Related Reactions

Wikipedia

Patents

“Everything”

Page 87: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemSpider Everywhere

Linked from Wikipedia

Linked from Open Notebook Science sites using EMBED

Linked from Blogs using Structure/Spectra EMBED

Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets

Integrated into software offerings from Thermo, Waters, Agilent, Bruker

Page 88: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemSpider Everywhere: Embed

Page 89: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemSpider Everywhere: What do computers want?

Web services

flickr.com/photos/microcosmos

Page 90: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemSpider Everywhere: Spectral Game

Page 91: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemSpider Everywhere: Crowdsourced Curation of Spectra

Page 92: RSC ChemSpider – Building An Internet Based Community For Chemists

ChemSpider Everywhere: ChemMobi

Page 93: RSC ChemSpider – Building An Internet Based Community For Chemists

There are always gaps...

What ChemSpider doesn’t deal with yet...

Markush structures and other “non-defineds”

Materials

Minerals

Polymers

Biological macromolecules

Page 94: RSC ChemSpider – Building An Internet Based Community For Chemists

What’s next?

Continue the curation effort and keep cleaning

Finish depositions – millions left to deposit

Layer on RDF to allow the semantic web to benefit from our efforts

Integrate RSC content – a massive archive!

Integrate RSC publishing workflows and databases
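To make the RDF item in the list above concrete, here is a rough sketch of what a few triples about one compound might look like when written out as N-Triples. The `example.org` URIs and the `inchi` predicate are hypothetical placeholders, not ChemSpider's actual vocabulary; only the two `rdf-schema` predicates are standard.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
EX_INCHI = "http://example.org/vocab/inchi"  # hypothetical predicate


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Build one N-Triples line; `obj` must already be formatted."""
    return f"{iri(subject)} {iri(predicate)} {obj} ."


compound = "http://example.org/compound/water"  # hypothetical compound URI
triples = [
    triple(compound, RDFS_LABEL, literal("Water")),
    triple(compound, EX_INCHI, literal("InChI=1S/H2O/h1H2")),
    triple(compound, RDFS_SEE_ALSO, iri("https://en.wikipedia.org/wiki/Water")),
]
print("\n".join(triples))
```

Each line is a subject, predicate, and object, so the same "structures and links" that a person follows by clicking become statements that other software can read and merge.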

Page 95: RSC ChemSpider – Building An Internet Based Community For Chemists

Thank you

[email protected]: ChemSpiderman

www.chemspider.com/blog

SLIDES: www.slideshare.net/AntonyWilliams