ChemSpider and Traveling the Internet via Chemical Structures Antony Williams Drexel University, November 2012

Upload: orcid-0000-0002-2668-4821

Post on 10-May-2015


DESCRIPTION

This is a short presentation given to chemistry students at Drexel University as a remote presentation. This was for the class of Jean-Claude Bradley.

TRANSCRIPT

Page 1: ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics Presentation

ChemSpider and Traveling the Internet via Chemical Structures

Antony Williams

Drexel University, November 2012

Page 2:

Compounds and Identifiers

Page 3:

Chemistry on the Internet

Where do you source chemistry information?

What can you trust online?

How can you recognize potential issues?

Cross-referencing and curating data

Page 4:

Molfiles (http://en.wikipedia.org/wiki/Chemical_table_file)

Page 5:

Molfiles

 10  9  0  0  1  0  0  0  0  0  1 V2000
   31.2937   -9.0366    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   26.6526   -9.0366    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   31.2937   -7.7066    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   30.1161   -9.6877    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   25.5096   -9.6877    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   28.9731   -9.0366    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   27.8163   -9.7016    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   26.6664   -7.7066    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   32.4367   -9.6877    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   30.1161  -11.0177    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  3  1  2  0  0  0  0
  4  1  1  0  0  0  0
  9  1  1  0  0  0  0
  7  2  1  0  0  0  0
  5  2  2  0  0  0  0
  8  2  1  0  0  0  0
  6  4  1  0  0  0  0
  4 10  1  6  0  0  0
  7  6  1  0  0  0  0
M  END
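The connection table on this slide can be read programmatically. The following is a minimal sketch, not a full molfile reader: it assumes whitespace-separated fields (real V2000 files are fixed-width, so fields can run together in large molecules), it starts at the counts line because the three molfile header lines are not in the transcript, and the function name `parse_ctab` is made up for illustration.

```python
from collections import Counter

# Counts line, atom block and bond block as transcribed from the slide.
# The three header lines of a complete molfile are absent and not invented here.
CTAB = """\
10 9 0 0 1 0 0 0 0 0 1 V2000
31.2937 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
26.6526 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
31.2937 -7.7066 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
30.1161 -9.6877 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
25.5096 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
28.9731 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
27.8163 -9.7016 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
26.6664 -7.7066 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
32.4367 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
30.1161 -11.0177 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
3 1 2 0 0 0 0
4 1 1 0 0 0 0
9 1 1 0 0 0 0
7 2 1 0 0 0 0
5 2 2 0 0 0 0
8 2 1 0 0 0 0
6 4 1 0 0 0 0
4 10 1 6 0 0 0
7 6 1 0 0 0 0
M END
"""


def parse_ctab(text):
    """Return (atoms, bonds) from a V2000 counts line plus atom/bond blocks."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    counts = lines[0].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])

    atoms = []
    for ln in lines[1:1 + n_atoms]:
        x, y, z, symbol = ln.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []
    for ln in lines[1 + n_atoms:1 + n_atoms + n_bonds]:
        first, second, order, stereo = (int(v) for v in ln.split()[:4])
        bonds.append((first, second, order, stereo))
    return atoms, bonds


atoms, bonds = parse_ctab(CTAB)
heavy_atoms = Counter(symbol for symbol, *_ in atoms)
# A non-zero fourth bond field marks a wedge or hash bond (stereo drawn in 2D).
wedged = [bond for bond in bonds if bond[3] != 0]
# All z coordinates equal to zero means the file carries a 2D layout only.
is_2d = all(z == 0.0 for _, _, _, z in atoms)
print(dict(heavy_atoms), wedged, is_2d)
```

The single bond with a non-zero stereo field (between atoms 4 and 10) is where the drawing encodes the stereocentre; a record with all stereo fields at zero would carry no stereo information at all.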

Page 6:

Molfiles

Molfiles are the primary exchange format between structure drawing packages

Can be different between different drawing packages

Most commonly carry X,Y coordinates for layout

Can support polymers, organometallics, etc.

Can carry 3D coordinates

Page 7:

SMILES (http://en.wikipedia.org/wiki/SMILES)

SMILES is a common format

Can support polymers, organometallics, etc.

Does NOT carry X,Y or Z coordinates for layout so requires layout algorithms – can be problematic!

Generally different between drawing packages

Page 8:

Stereo

Page 9:

Tautomers

Page 10:

SMILES

ACD/Labs: CC(C)CCC[C@@H](C)CCC[C@@H](C)CCCC(\C)=C\CC2=C(C)C(=O)c1ccccc1C2=O

OpenEye: CC1=C(C(=O)c2ccccc2C1=O)C/C=C(\C)/CCC[C@H](C)CCC[C@H](C)CCCC(C)C

ChEMBL: CC(C)CCC[C@@H](C)CCC[C@@H](C)CCC\C(=C\CC1=C(C)C(=O)c2ccccc2C1=O)\C
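The three strings on this slide look nothing alike, yet they can be sanity-checked against each other without a toolkit. The sketch below counts heavy atoms per element with a small regular expression. Equal counts are a necessary condition for the strings to describe the same compound, not a sufficient one; a real comparison needs canonicalization in a cheminformatics toolkit. The tokenizer covers only the organic subset and simple bracket atoms, and `heavy_atom_counts` is an illustrative name.

```python
import re
from collections import Counter

# SMILES strings from the slide, with the wrapped lines rejoined.
SMILES = {
    "ACD/Labs": r"CC(C)CCC[C@@H](C)CCC[C@@H](C)CCCC(\C)=C\CC2=C(C)C(=O)c1ccccc1C2=O",
    "OpenEye": r"CC1=C(C(=O)c2ccccc2C1=O)C/C=C(\C)/CCC[C@H](C)CCC[C@H](C)CCCC(C)C",
    "ChEMBL": r"CC(C)CCC[C@@H](C)CCC[C@@H](C)CCC\C(=C\CC1=C(C)C(=O)c2ccccc2C1=O)\C",
}

# Bracket atoms first, then two-letter halogens, then one-letter atoms.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
ELEMENT_IN_BRACKET = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms per element in a SMILES string."""
    counts = Counter()
    for token in ATOM.findall(smiles):
        if token.startswith("["):
            symbol = ELEMENT_IN_BRACKET.match(token).group(1)
        else:
            symbol = token
        if symbol != "H":
            # Aromatic atoms are lower case in SMILES; fold them to the element.
            counts[symbol.capitalize()] += 1
    return counts


compositions = {name: heavy_atom_counts(s) for name, s in SMILES.items()}
stereo_marks = {name: len(re.findall(r"@+", s)) for name, s in SMILES.items()}
for name in SMILES:
    print(name, dict(compositions[name]), "stereocentres marked:", stereo_marks[name])
```

All three strings give the same heavy-atom composition and the same number of marked stereocentres, which is consistent with one compound written three ways.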

Page 11:

The InChI Identifier

Page 12:

InChI

SINGLE code base managed by IUPAC – integrated into drawing packages. Unlike SMILES, no variability between packages

InChI Strings can be reversed to structures – same problem as with SMILES – no layout

Well adopted by the community (databases, publishers, blogs, Wikipedia) – good for searching the internet
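An InChI string is a sequence of layers separated by slashes, which is what makes it usable as a structured identifier. The sketch below splits a string into its layers, using ethanol as the example (not from the slides). It is a simplification: it assumes each layer letter occurs once, which holds for simple standard InChIs but not for strings with isotopic or fixed-H sections, where layers can repeat. `split_inchi` is an illustrative name.

```python
def split_inchi(inchi):
    """Split an InChI string into a dict of layers keyed by layer letter."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        # The first character names the layer: c = connectivity, h = hydrogens,
        # t/m/s = tetrahedral stereo, b = double-bond stereo, q/p = charge.
        layers[layer[0]] = layer[1:]
    return layers


ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
layers = split_inchi(ETHANOL)
# A version ending in "S" marks a standard InChI (fixed option settings).
is_standard = layers["version"].endswith("S")
print(layers, is_standard)
```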

Page 13:

The InChI Standard

Page 14:

Tautomers – “Mobile H Perception”

Page 15:

Double Bond Orientation

Page 16:

Stereo

Page 17:

Checking for Stereochemistry

Page 18:

Checking for Stereochemistry

Use your drawing package!

Page 19:

Checking for Stereochemistry

Page 20:

Checking for Stereochemistry

Page 21:

Checking for Stereochemistry

Page 22:

InChIKeys

Search the Web by Structure
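An InChIKey is a fixed-length, search-engine-friendly hash of an InChI, so a structure search on the web reduces to a quoted text search. The sketch below checks the shape of a key and builds a URL query string for it. The example key is ethanol's (not from the slides); `parse_inchikey` and `web_query` are illustrative names, and the `q` parameter is an assumption about how a given search engine takes its query.

```python
import re
from urllib.parse import urlencode

# 14 letters (connectivity hash), 8 letters (remaining layers), a standard /
# non-standard flag, a version letter, and a final protonation letter.
INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    match = INCHIKEY.match(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, detail, flag, version, protonation = match.groups()
    return {
        "skeleton": skeleton,        # hash of formula and connectivity
        "detail": detail,            # hash of stereo, isotopes, etc.
        "standard": flag == "S",     # "S" = standard InChI, "N" = non-standard
        "version": version,
        "protonation": protonation,  # "N" = neutral
    }


def web_query(key):
    """Return a URL query string that searches for the exact key."""
    parse_inchikey(key)  # validate before building the query
    return urlencode({"q": f'"{key}"'})


ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
blocks = parse_inchikey(ETHANOL_KEY)
query = web_query(ETHANOL_KEY)
print(blocks, query)
```

Quoting the key matters: without the quotes, a search engine may split it at the hyphens and match unrelated pages.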

Page 23:

InChIs

Page 24:

Databases and Standardization

Page 25:

Databases and Standardization

Page 26:

InChI

No support for polymers, organometallics

Many option settings can lead to variability and make integration across databases difficult – FixedH option especially problematic

“Slight” chance of collisions of InChIKeys

VERY USEFUL FOR INTEGRATING THE WEB
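The "slight" chance of InChIKey collisions can be put into numbers with the standard birthday approximation. This is a back-of-envelope sketch under two assumptions that are not on the slide: that the first block of the key carries about 65 bits of hash and the second about 37 bits (figures as I understand them from the InChI technical documentation), and that the hash behaves like a uniform random function.

```python
import math


def collision_probability(n_structures, hash_bits):
    """Birthday estimate of at least one collision among n random hashes."""
    space = 2.0 ** hash_bits
    exponent = n_structures * (n_structures - 1) / (2.0 * space)
    # 1 - exp(-x), computed with expm1 to stay accurate for tiny x.
    return -math.expm1(-exponent)


FIRST_BLOCK_BITS = 65   # assumed size of the connectivity hash
SECOND_BLOCK_BITS = 37  # assumed size of the hash of the remaining layers

p_skeleton = collision_probability(10**9, FIRST_BLOCK_BITS)
p_full_key = collision_probability(10**9, FIRST_BLOCK_BITS + SECOND_BLOCK_BITS)
print(f"first block only, 1e9 structures: {p_skeleton:.4f}")
print(f"full key, 1e9 structures:         {p_full_key:.2e}")
```

Under these assumptions a billion-structure collection has roughly a one-in-a-hundred chance of a first-block collision, and a negligible chance of a full-key collision, which fits the slide's wording.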

Page 27:

Vancomycin

Page 28:

Vancomycin

Search Molecular SKELETON

Search Full Molecule

Page 29:

Full Skeleton Search: 104 Hits

Page 30:

Full Molecule Search: 4 Hits
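The difference between the two hit counts follows from how an InChIKey is built: the first block depends only on formula and connectivity, so matching on it alone finds every stereo variant of a skeleton, while matching the whole key finds one specific form. The sketch below illustrates this on a tiny in-memory index. It assumes that a skeleton search is a first-block match, which the slides do not state; the index uses alanine and ethanol rather than vancomycin, and the function names are illustrative.

```python
# (name, InChIKey) pairs standing in for a database index.
INDEX = [
    ("L-alanine", "QNAYBMKLOCPYGJ-REOHGBXOSA-N"),
    ("alanine, stereo not specified", "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"),
    ("ethanol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
]


def skeleton_search(index, query_key):
    """Match on the first block only: same atoms and bonds, any stereo."""
    skeleton = query_key.split("-")[0]
    return [name for name, key in index if key.split("-")[0] == skeleton]


def full_search(index, query_key):
    """Match on the whole key: same skeleton and same stereo, charge, etc."""
    return [name for name, key in index if key == query_key]


QUERY = "QNAYBMKLOCPYGJ-REOHGBXOSA-N"  # L-alanine
skeleton_hits = skeleton_search(INDEX, QUERY)
full_hits = full_search(INDEX, QUERY)
print(len(skeleton_hits), "skeleton hits;", len(full_hits), "full-molecule hits")
```

The skeleton search is the useful one for curation: every record it returns beyond the full-molecule hits is a candidate for missing or incorrect stereochemistry.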

Page 31:

Where is chemistry online?

Encyclopedic articles (Wikipedia)

Chemical vendor databases

Metabolic pathway databases

Property databases

Patents with chemical structures

Drug Discovery data

Scientific publications

Compound aggregators

Blogs/Wikis and Open Notebook Science

Page 32:

www.chemspider.com

Page 33:

How do we build it?

We deal in Molfiles or SDF files – with coordinates

Valence checking, charge imbalance

We have our own “business logic” to standardize

InChI to “aggregate tautomers” to one record

We link out to external sites using their IDs
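The last two steps above, aggregating on InChI and linking out by external ID, amount to grouping deposited structures by their standard InChI. The sketch below shows that grouping only. It assumes the standard InChI has already been computed for each deposit (that computation is where tautomers collapse to one string and is not shown), and the source names and external IDs are hypothetical; this is not ChemSpider's actual code or schema.

```python
from collections import defaultdict

ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"

# Hypothetical deposits: the sources and external IDs are invented for
# illustration; std_inchi is assumed to have been computed upstream.
DEPOSITS = [
    {"source": "VendorA", "external_id": "A-001", "std_inchi": ETHANOL},
    {"source": "WikiB", "external_id": "B-17", "std_inchi": ETHANOL},
    {"source": "VendorA", "external_id": "A-002", "std_inchi": METHANOL},
]


def aggregate(deposits):
    """Group deposits into one record per standard InChI, keeping link-outs."""
    records = defaultdict(list)
    for deposit in deposits:
        link = (deposit["source"], deposit["external_id"])
        records[deposit["std_inchi"]].append(link)
    return dict(records)


records = aggregate(DEPOSITS)
for inchi, links in records.items():
    print(inchi, "->", links)
```

Three deposits become two records, and each record keeps the external identifiers needed to link back to the depositing sites.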

Page 34:

Searches: The INTERNET

All ChemSpider and Internet searches are “simply algorithms” but synonym searching is based on an assertion

Page 35:

Validated Names for Searching…

Page 36:

Validating structures

Check for “full stereo”, and use stereo descriptors in particular when checking!

Check for quality of associated data sources

Check against reference literature when available – but it can be wrong

Question EVERYTHING!
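The "full stereo" check can be partly automated by looking at the stereo layers of an InChI. The sketch below reads the tetrahedral layer (`/t`) and counts defined and undefined centres. It assumes that undefined centres are written with `?` (or `u`) in that layer, and it cannot tell an achiral molecule from one drawn without stereo, since neither has a `/t` layer. The examples are L-alanine and alanine without stereo (not from the slides), and `stereo_report` is an illustrative name.

```python
import re

# One entry in a /t layer: an atom number followed by its parity mark.
CENTER = re.compile(r"(\d+)([+\-?u])")


def stereo_report(inchi):
    """Summarize the stereo layers of an InChI string."""
    layers = inchi.split("/")[2:]  # skip the version and formula parts
    tetra = [layer[1:] for layer in layers if layer.startswith("t")]
    marks = [mark for layer in tetra for _, mark in CENTER.findall(layer)]
    return {
        "has_tetrahedral_layer": bool(tetra),
        "defined": sum(mark in "+-" for mark in marks),
        "undefined": sum(mark in "?u" for mark in marks),
        "has_double_bond_layer": any(layer.startswith("b") for layer in layers),
    }


L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
FLAT_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"

report_l = stereo_report(L_ALANINE)
report_flat = stereo_report(FLAT_ALANINE)
print(report_l)
print(report_flat)
```

A record whose report shows no tetrahedral layer, or any undefined centres, for a compound known to be chiral is a candidate for manual checking against the drawing package and the literature.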

Page 37:

Contributing to The Quality of Data

What is the Structure of Vitamin K?

Page 38:

Contributing to The Quality of Data

What is the Structure of Vitamin K?

A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).

Page 39:

What is the Structure of Vitamin K1?

Page 40:

CAS’s Common Chemistry

Page 41:

Wikipedia

Page 42:

Wolfram Alpha

Page 43:

DailyMed

Page 44:
Page 45:

ALL Different, ALL “Domoic Acids”

Page 46:

Thank you

Email: [email protected]

Twitter: ChemConnector

Blog: www.chemspider.com/blog

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams