chemspider reactions – delivering a free community resource of chemical syntheses

ChemSpider Reactions: Delivering a free community resource of chemical syntheses Valery Tkachenko, Colin Batchelor, Daniel Lowe, Ken Karapetyan, David Sharpe and Antony Williams ACS New Orleans April 2013

Upload: antony-williams-chemconnector-orcid-0000-0002-2668-4821

Post on 11-Jun-2015


DESCRIPTION

Presentation delivered by Colin Batchelor from the RSC eScience team at the ACS New Orleans Spring Meeting, April 2013. There are dozens of public compound databases now available online, some of which provide access to tens of millions of chemical compounds. However, very little effort has been put into delivering databases of chemical reactions, and most of the large resources are commercial. In our five years of delivering chemistry data resources to the community, one of the most frequent requests from chemists has been for information on how to synthesize the chemicals they are researching. This presentation will provide an overview of our concerted efforts to enhance access to freely available chemistry data and will discuss ChemSpider Reactions as an integrating hub of content, including data extracted from US patents, from RSC journals and databases, and from our micro-publishing platform ChemSpider SyntheticPages (CSSP).

TRANSCRIPT

Page 1: ChemSpider reactions – delivering a free community resource of chemical syntheses

ChemSpider Reactions: Delivering a free community resource of chemical syntheses

Valery Tkachenko, Colin Batchelor, Daniel Lowe, Ken Karapetyan, David Sharpe and Antony Williams

ACS New Orleans April 2013

Page 2: ChemSpider reactions – delivering a free community resource of chemical syntheses

Overview

• Motivation
• The RSC and chemical reaction data
• New sources of chemical reaction data
• ChemSpider Reactions: bringing it all together
• Experiments with reaction classification
• The National Chemical Database Service

Page 3: ChemSpider reactions – delivering a free community resource of chemical syntheses

Who needs another reaction database?

• Those who cannot afford to license access…
• Those who would like to access data that is not abstracted
• Those who might like to contribute data to a database
• Anybody wanting to integrate their systems in and to pull data out.

Page 4: ChemSpider reactions – delivering a free community resource of chemical syntheses

RSC and chemical reaction data 1

Graphical abstracting journals:
Methods in Organic Synthesis (monthly, 1990 to present)
Catalysts and Catalysed Reactions (monthly, 2005 to present)

These constitute a backfile of over 50,000 novel reactions

Page 5: ChemSpider reactions – delivering a free community resource of chemical syntheses

RSC and chemical reaction data 2

Page 6: ChemSpider reactions – delivering a free community resource of chemical syntheses

RSC and chemical reaction data 3

Page 7: ChemSpider reactions – delivering a free community resource of chemical syntheses

New sources of reaction data

Daniel Lowe’s PhD thesis (Cantab, 2012) was on extracting reactions from US patent data. We can apply this technology to the RSC Journal archive.

Page 8: ChemSpider reactions – delivering a free community resource of chemical syntheses

ChemSpider Reactions: bringing it all together

http://csr.dev.rsc-us.org/

WORK IN PROGRESS

Page 9: ChemSpider reactions – delivering a free community resource of chemical syntheses

Reaction classification 1

Project Prospect has text-mined RSC journal articles for named reactions and molecular processes, annotated according to Creative Commons-licensed ontologies:

See http://rxno.googlecode.com/

Page 10: ChemSpider reactions – delivering a free community resource of chemical syntheses

Reaction classification 2

Classification of Daniel’s US Patent data

Page 11: ChemSpider reactions – delivering a free community resource of chemical syntheses

Reaction InChI

To do for reactions what InChI has done for structures
• Think online searching
• Deduplication and linking

http://www-rinchi.ch.cam.ac.uk/help.html

Page 12: ChemSpider reactions – delivering a free community resource of chemical syntheses

Reaction InChI

Early work – RInChIs layered on to a few hundred thousand reactions
• Not generated for a few tens of thousands of reactions
• Reaction deduplication results differ based on algorithm – GGA software versus RInChI
• Under investigation
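To illustrate why a canonical reaction identifier helps with deduplication and linking, here is a minimal Python sketch. It is not the RInChI algorithm and not the GGA software: the order-independent key built from sorted structure InChIs is only a stand-in for a real reaction identifier, and the record labels (`patent-example-1` and so on) are hypothetical.

```python
from collections import defaultdict

# Standard InChI strings used as stand-in structure identifiers.
BUTADIENE = "InChI=1S/C4H6/c1-3-4-2/h3-4H,1-2H2"
ETHENE = "InChI=1S/C2H4/c1-2/h1-2H2"
CYCLOHEXENE = "InChI=1S/C6H10/c1-2-4-6-5-3-1/h1-2H,3-6H2"


def reaction_key(reactants, products):
    """Build an order-independent key (assumption: a stand-in for RInChI)."""
    return (tuple(sorted(set(reactants))), tuple(sorted(set(products))))


def deduplicate(records):
    """Group record labels whose reactions share the same key."""
    groups = defaultdict(list)
    for label, reactants, products in records:
        groups[reaction_key(reactants, products)].append(label)
    return dict(groups)


# Hypothetical records: the first two list the same reaction with the
# reactants in a different order; the third is the reverse reaction.
records = [
    ("patent-example-1", [BUTADIENE, ETHENE], [CYCLOHEXENE]),
    ("journal-example-7", [ETHENE, BUTADIENE], [CYCLOHEXENE]),
    ("journal-example-9", [CYCLOHEXENE], [BUTADIENE, ETHENE]),
]

groups = deduplicate(records)
for key, labels in groups.items():
    print(len(labels), labels)
```

The first two records collapse into one group, while the reverse reaction stays separate because the key keeps reactants and products apart. Two algorithms that normalise structures or roles differently will produce different groupings, which is one way the discrepancy noted above can arise.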

Page 13: ChemSpider reactions – delivering a free community resource of chemical syntheses

Other sources

ChemSpider SyntheticPages

• Electronic Lab Notebooks
• University repositories

Please send theses

Page 14: ChemSpider reactions – delivering a free community resource of chemical syntheses

What will ChemSpider Reactions serve?

• Chemical Database Service
• Linking back to original publications/supplementary data
• Underpinning other tools e.g. retrosynthetic analysis (depends on data quality and mapping)

Page 15: ChemSpider reactions – delivering a free community resource of chemical syntheses

Chemical Database Service

• National Chemical Database Service for UK academics
• Integrates commercial databases and services
• Chemicals, analytical data, prediction algorithms
• Development of data repository

Page 16: ChemSpider reactions – delivering a free community resource of chemical syntheses

ARChem from SimBioSys 1

Synthesis planning tool which performs rule- and precedent-based retrosynthetic analysis back to commercially available starting materials.
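The idea of working backwards from a target to purchasable starting materials can be sketched as a small recursive search. This toy Python example is an assumption-laden illustration and not ARChem's method: the `precedents` table, the compound labels and the depth limit are all hypothetical, and a real tool would use reaction rules and scoring rather than a lookup table.

```python
def plan(target, precedents, purchasable, depth=5):
    """Return a list of (product, precursors) steps, or None if no route is found."""
    if target in purchasable:
        return []  # nothing to make: it can be bought
    if depth == 0:
        return None  # give up on overly long routes
    for precursors in precedents.get(target, []):
        steps = []
        for precursor in precursors:
            sub_route = plan(precursor, precedents, purchasable, depth - 1)
            if sub_route is None:
                break  # this disconnection fails, try the next one
            steps.extend(sub_route)
        else:
            return steps + [(target, precursors)]
    return None


# Hypothetical precedent table: product -> known precursor sets.
precedents = {
    "T": [("X", "sm-1"), ("A", "B")],
    "A": [("sm-1", "sm-2")],
    "B": [("sm-3",)],
}
purchasable = {"sm-1", "sm-2", "sm-3"}

route = plan("T", precedents, purchasable)
print(route)
```

The first disconnection of `T` fails because `X` is neither purchasable nor covered by a precedent, so the search falls back to the second one. The returned steps are ordered so that each intermediate is made before it is used.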

Page 17: ChemSpider reactions – delivering a free community resource of chemical syntheses

ARChem from SimBioSys 2

Page 18: ChemSpider reactions – delivering a free community resource of chemical syntheses

ARChem from SimBioSys 3

Page 19: ChemSpider reactions – delivering a free community resource of chemical syntheses

But what about data quality?

• Data validation and curation required
• Encouraging participation with Rewards and RECOGNITION

Page 20: ChemSpider reactions – delivering a free community resource of chemical syntheses

Manual curation

• Integrated commenting, curating and validation platform across ALL eScience and Publishing platforms

• All integrated to a central RSC profile and feeding the alt-metrics tools

Page 21: ChemSpider reactions – delivering a free community resource of chemical syntheses

The other kind of RDF (made-up example)

Chemical reactions are unusually well-suited to representation. (Donald Davidson’s event semantics)

_:r1 a obo:RXNO_0000004 ;  # Diels–Alder
    obo:has_participant_ceasing_to_exist _:m1 ;  # a diene
    obo:has_participant_ceasing_to_exist _:m2 ;  # an olefin
    obo:has_participant_starting_to_exist _:m3 .  # a substituted cyclohexene
_:m1 a <http://rdf.chemspider.com/233000> .
_:m2 a <http://rdf.chemspider.com/233001> .
_:m3 a <http://rdf.chemspider.com/233002> .
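To show how the event-style representation can be queried, the following Python sketch holds the same statements as plain (subject, predicate, object) tuples and reads off what the reaction consumes and forms. The identifiers are copied from the made-up example on the slide; the helper names `objects` and `describe` are hypothetical, and a real system would more likely use an RDF library and SPARQL.

```python
CONSUMED = "obo:has_participant_ceasing_to_exist"
FORMED = "obo:has_participant_starting_to_exist"

# The slide's made-up example as (subject, predicate, object) tuples.
triples = [
    ("_:r1", "a", "obo:RXNO_0000004"),
    ("_:r1", CONSUMED, "_:m1"),
    ("_:r1", CONSUMED, "_:m2"),
    ("_:r1", FORMED, "_:m3"),
    ("_:m1", "a", "<http://rdf.chemspider.com/233000>"),
    ("_:m2", "a", "<http://rdf.chemspider.com/233001>"),
    ("_:m3", "a", "<http://rdf.chemspider.com/233002>"),
]


def objects(triples, subject, predicate):
    """Return every object matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(triples, reaction):
    """Summarise a reaction event: its class and the types of its participants."""
    def types_of(nodes):
        return [t for node in nodes for t in objects(triples, node, "a")]

    return {
        "class": objects(triples, reaction, "a"),
        "consumed": types_of(objects(triples, reaction, CONSUMED)),
        "formed": types_of(objects(triples, reaction, FORMED)),
    }


summary = describe(triples, "_:r1")
print(summary)
```

Because the reaction is a node in its own right, adding further statements about it (a source publication, conditions, a yield) would not change how participants are looked up.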

Page 22: ChemSpider reactions – delivering a free community resource of chemical syntheses

Questions?

E-mail: [email protected], [email protected]