
Posted on 10 May 2015


The RSC & e-Science: Reflecting the Change in the World We Live In

Valery Tkachenko

RSC-OSDD Consultative Workshop on Cheminformatics

Delhi, September 28th 2013

Royal Society of Chemistry and Global Chemistry Network

The World we live in

Internet World
• 20+ years into the Internet Revolution
• Web 2.0 -> Web 3.0

Connected World
• Social Networks
• Real-time Communications

Big Data World
• Semantic content
• New Interfaces

Pillars of the World

Data
• Data (knowledge) is king
• Dataflow

Navigation
• Domain-specific search and navigation
• Navigate inside and link out (federation)

Interfaces
• HCI (human-computer interface)
• M2M (machine to machine)

Science map

Chemical sciences map

Chemistry on the Internet

What’s wrong?!?!

Complexity


Knowledgebases and delivery systems

Big Data challenge

Crowdsourcing and altmetrics

New interfaces

Knowledgebases and delivery systems

50,000 ft view of an STM publisher

Knowledge

Our User Interfaces (Desktop, Web, Mobile, etc.)

Customers

Delivery Magic

3rd-party integrations (our web services)

ChemSpider Suite

Data Layer

ChemSpider Assays

ChemSpider Compounds

ChemSpider Reactions

ChemSpider Spectra

ChemSpider Materials

ChemSpider Algorithms

Business Objects Layer

CSAs BO, CSC BO, CSR BO, CSS BO, CSM BO, CSA BO

APIs Layer

DS API, Export API, Search API, Processing API

CSAs API, CSC API, CSR API, CSS API, CSM API, CSA API

Components Layer

JS Components, Google Apps Components

Python widgets

SharePoint Components

PHP snippets

ASP.NET Components

UIs

ChemSpider website

ChemSpider Reactions

mobile web app

ChemSpider desktop app

Depositions client

Java Beans
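The layered stack above (data layer, business objects, APIs, components, UIs) can be illustrated with a minimal Python sketch. All class and method names here (`CompoundRecord`, `CompoundsDataLayer`, `CompoundBO`, `SearchAPI`) are hypothetical and do not reflect ChemSpider's actual code; the sketch only shows each layer talking to the one beneath it, so that UIs and third-party components never touch storage directly.

```python
from __future__ import annotations

from dataclasses import dataclass


# Data layer (hypothetical): an in-memory stand-in for a compound store.
@dataclass
class CompoundRecord:
    record_id: int
    name: str
    formula: str


class CompoundsDataLayer:
    def __init__(self) -> None:
        self._rows: dict[int, CompoundRecord] = {}

    def insert(self, record: CompoundRecord) -> None:
        self._rows[record.record_id] = record

    def all(self) -> list[CompoundRecord]:
        return list(self._rows.values())


# Business objects layer (hypothetical): domain logic over raw rows.
class CompoundBO:
    def __init__(self, data: CompoundsDataLayer) -> None:
        self._data = data

    def find_by_name(self, text: str) -> list[CompoundRecord]:
        needle = text.lower()
        return [r for r in self._data.all() if needle in r.name.lower()]


# API layer (hypothetical): returns plain dicts that any component
# (JS, Python, PHP, SharePoint, ...) could serialise and display.
class SearchAPI:
    def __init__(self, bo: CompoundBO) -> None:
        self._bo = bo

    def search(self, query: str) -> list[dict]:
        return [
            {"id": r.record_id, "name": r.name, "formula": r.formula}
            for r in self._bo.find_by_name(query)
        ]
```

Because the top layer returns plain dictionaries, the same call could back a website, a mobile app or a web service, which is the kind of reuse the diagram suggests.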

• 29 million chemicals and growing

• Data sourced from >500 different sources

• Crowdsourced curation and annotation

• Ongoing deposition of data from our journals and our collaborators

• A structure-centric hub for web searching

ChemSpider and Atovaquone


Micropublishing


ChemSpider Reactions


Knowledge in our own archives

DERA and Text Mining

The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser .

The reaction mixture was heated at reflux with stirring, for a period of about one-half hour .

After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue

Text Mining

The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer , thermometer and reflux condenser .

The reaction mixture was heated at reflux with stirring , for a period of about one-half hour .

After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue
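As an illustration of the kind of extraction text mining performs on such procedures, the sketch below pulls reagent/quantity pairs such as "thionyl chloride ( 5 ml )" out of the tokenised sentence. This is not DERA: the two-entry lexicon, the unit list and the function name `extract_quantities` are assumptions made for the example, and real chemical named-entity recognition is far more involved.

```python
from __future__ import annotations

import re

# Hypothetical mini-lexicon; a real text miner would use a large dictionary
# and/or a trained chemical named-entity recogniser.
REAGENTS = ["thionyl chloride", "benzene"]

# Quantity in parentheses, e.g. "( 5 ml )"; the unit list is illustrative.
QUANTITY = r"\(\s*(\d+(?:\.\d+)?)\s*(ml|g|mg|mmol)\s*\)"


def extract_quantities(sentence: str, names: list[str] = REAGENTS) -> list[tuple[str, float, str]]:
    """Return (reagent, amount, unit) triples in the order they occur."""
    hits = []
    for name in names:
        # Reagent name immediately followed by a parenthesised quantity.
        pattern = r"\b" + re.escape(name) + r"\s*" + QUANTITY
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            hits.append((m.start(), name, float(m.group(1)), m.group(2)))
    hits.sort()  # restore sentence order across different lexicon entries
    return [(name, amount, unit) for _, name, amount, unit in hits]
```

A mention without a quantity, such as "the benzene and unreacted thionyl chloride were stripped", yields no pair, which is why quantity-bearing mentions are the useful anchors for reaction extraction.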

It is so difficult to navigate…

• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known Pathways?
• Working On Now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?

Digitally Enabling RSC Archive

Text, PDF, XML

Structures

Reactions

Spectra

Materials

Chemistry Validation and Standardization Platform (CVSP)

DERA (Text Mining)

Biological Activities

Data quality issues and CVSP

Robochemistry

Proliferation of errors in public and private databases

Automated quality control system

ChemSpider issues

DrugBank dataset (6516 records)

~60 records that can’t be dearomatized unambiguously

DB04283, DB04462

~30 records with bonds that do not make sense

DB04283

DB04009

2 records where SMILES, InChI, and name did not match the structure

DB00611, DB01547

~40 records where InChIs did not match the structure

DrugBank ID: DB00755

InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)/b9-6+,12-11+,15-8+,16-14+
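A simple first-pass check for records whose InChI does not match the structure is to compare the InChI formula layer (the text between `InChI=1S/` and the next `/`) with element counts computed from the structure. The sketch below assumes the structure-side counts come from some upstream step (not shown, hypothetical) and handles only single-component InChIs; a formula match does not prove that the connectivity or stereo layers (such as the `/b` layer above) agree.

```python
from __future__ import annotations

import re
from collections import Counter


def inchi_formula(inchi: str) -> Counter:
    """Element counts from the formula layer of a single-component standard InChI."""
    prefix = "InChI=1S/"
    if not inchi.startswith(prefix):
        raise ValueError("expected a standard InChI starting with 'InChI=1S/'")
    formula = inchi[len(prefix):].split("/", 1)[0]
    if "." in formula or formula[:1].isdigit():
        # Multi-component formulas ("...Na", "2ClH...") need extra handling.
        raise ValueError("multi-component InChI not handled in this sketch")
    counts: Counter = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def formula_matches(inchi: str, structure_counts: dict[str, int]) -> bool:
    """Compare the InChI formula with element counts derived from the structure."""
    return inchi_formula(inchi) == Counter(structure_counts)
```

For the DB00755 InChI above, `inchi_formula` gives 20 C, 28 H and 2 O; any structure whose counts differ would be flagged, while subtler mismatches need a full layer-by-layer comparison.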

DrugBank ID: DB00614

DB08128

J. Brecher, IUPAC: Graphical Representation of Stereochemical Configuration (IUPAC Recommendations 2006), Section ST-1.1.10

DB06287

7 records with 2 stereo bonds at chiral atoms
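Records with two stereo bonds at one chiral atom can be found by reading the bond block of a V2000 molfile, where stereo code 1 marks a wedge, 6 a hash, and the first atom of the bond is the narrow end. The sketch below counts such bonds per atom and flags atoms with two or more for review. It is an assumption about how a check like this might work, not CVSP's implementation, and it does not decide whether a given pair is actually ambiguous under the IUPAC recommendations.

```python
from __future__ import annotations

from collections import Counter

# V2000 bond-stereo codes for single bonds: 1 = wedge ("up"), 6 = hash ("down").
# The bond's first atom is the narrow end of the wedge/hash.
WEDGE, HASH = 1, 6


def stereo_bonds_per_atom(molfile: str) -> Counter:
    """Count wedge/hash bonds by the atom at their narrow end (1-based index)."""
    lines = molfile.splitlines()
    # Counts line (4th line): atoms in columns 1-3, bonds in columns 4-6.
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    first_bond = 4 + n_atoms  # 3 header lines + counts line + atom block
    per_atom: Counter = Counter()
    for line in lines[first_bond:first_bond + n_bonds]:
        atom1 = int(line[0:3])
        bond_type = int(line[6:9])
        stereo_field = line[9:12].strip()
        stereo = int(stereo_field) if stereo_field else 0
        if bond_type == 1 and stereo in (WEDGE, HASH):
            per_atom[atom1] += 1
    return per_atom


def ambiguous_stereocentres(molfile: str) -> list[int]:
    """Atoms carrying two or more stereo bonds, flagged for review rather than rejected."""
    return sorted(a for a, n in stereo_bonds_per_atom(molfile).items() if n >= 2)
```

Flagging rather than rejecting leaves room for drawings that legitimately use two stereo bonds at one centre; the count only tells a curator where to look.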

CVSP validation of ChEMBL 16 (~1.3 million records)

• Overall 0.7% of records had validation issues

• Stereo problems (~82%)
  • Directions of bonds do not make sense (~63%)
  • Ambiguous stereo: 2 stereo bonds at chiral center (~19%)
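To put these percentages in absolute terms, the arithmetic below converts them into approximate record counts. It assumes the stereo shares are fractions of the flagged 0.7% rather than of all records, and it inherits the rounding in the "about 1.3 million" total; the two sub-categories (63% + 19%) add up to the 82% stereo share.

```python
# Figures from the slide; TOTAL_RECORDS is approximate (~1.3 million).
TOTAL_RECORDS = 1_300_000
ISSUE_RATE = 0.007  # 0.7% of records had validation issues

# Assumed to be shares of the flagged records, not of all records.
SHARES = {
    "stereo problems": 0.82,
    "bond directions do not make sense": 0.63,
    "2 stereo bonds at chiral center": 0.19,
}

flagged = round(TOTAL_RECORDS * ISSUE_RATE)
estimates = {label: round(flagged * share) for label, share in SHARES.items()}

# The two stereo sub-categories account for the whole stereo share.
assert abs(SHARES["bond directions do not make sense"]
           + SHARES["2 stereo bonds at chiral center"]
           - SHARES["stereo problems"]) < 1e-9

for label, count in estimates.items():
    print(f"{label}: ~{count:,} records")
```

With these inputs the estimate is roughly 9,100 flagged records, about 7,460 of them with stereo problems.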

“Direction of bond makes no sense” – 63%

“Stereo types of the opposite bonds mismatch” – 15%

http://www.iupac.org/publications/pac/2006/pdf/7810x1897.pdf

“Stereo types of non-opposite bonds match” – 2%

“Atom not recognized” – 3% (isotopes)

In molfile:
• Should be atom from periodic table
• No mass difference in atom line
• No “M ISO” in connection table
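In a V2000 molfile the atom symbol (columns 32-34 of an atom line) should be a real element, with any isotope given either by the mass-difference field (columns 35-36) or by an `M  ISO` line in the properties block. The sketch below reports each of those three things; the function names are hypothetical, and treating deuterium written as "D" as the typical offender is an assumption for the example.

```python
from __future__ import annotations

# All 118 element symbols; anything else in the atom block (e.g. "D", "T",
# or query atoms such as "A"/"Q") is reported by unrecognized_atoms().
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La
Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po
At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg
Cn Nh Fl Mc Lv Ts Og
""".split())


def atom_lines(molfile: str) -> list[str]:
    lines = molfile.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line, columns 1-3
    return lines[4:4 + n_atoms]


def unrecognized_atoms(molfile: str) -> list[tuple[int, str]]:
    """(1-based index, symbol) for atom-block symbols that are not elements."""
    problems = []
    for index, line in enumerate(atom_lines(molfile), start=1):
        symbol = line[31:34].strip()  # columns 32-34
        if symbol not in ELEMENTS:
            problems.append((index, symbol))
    return problems


def mass_differences(molfile: str) -> dict[int, int]:
    """Non-zero mass-difference fields (columns 35-36) by atom index."""
    result = {}
    for index, line in enumerate(atom_lines(molfile), start=1):
        value = line[34:36].strip()
        if value and int(value) != 0:
            result[index] = int(value)
    return result


def has_iso_property(molfile: str) -> bool:
    """True if the properties block contains an 'M  ISO' line."""
    return any(line.startswith("M  ISO") for line in molfile.splitlines())
```

A record that fails `unrecognized_atoms` and has neither a mass difference nor an `M  ISO` line matches the pattern listed above.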


Big Data challenge

Started with 2 servers in a basement

Presently – two farms of ~40 servers each

Future – in the cloud

Compute-intensive calculations

Delivery systems

Crowdsourcing and altmetrics

AltMetrics

Curation in ChemSpider

New interfaces

Visualization

Navigation

ChemSpider APIs

We are a part of a larger world

National Chemistry Database

National Data Repository

University 1

Data Hub

Workstations

University 2

Data Hub

Workstations

Company 3

Data Hub

Workstations

Data Repository (indexed storage)

Data Repository (provided data storage)

Chemically intelligent services

Indexes

Data

External clients: Publishers, Scientists, Funding bodies
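The federated picture above, with local data hubs behind a national repository that holds indexes rather than all of the data, can be sketched as an index that routes each query to the hubs holding a record. Names such as `DataHub`, `NationalRepository` and the `KEY-0001` identifiers are hypothetical placeholders (a real system might key on InChIKeys), and network access, permissions and synchronisation are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataHub:
    """A local (university or company) hub that keeps its own records."""
    name: str
    records: dict[str, dict] = field(default_factory=dict)


@dataclass
class NationalRepository:
    """Holds only an index (structure key -> hub names) and routes lookups."""
    hubs: dict[str, DataHub] = field(default_factory=dict)
    index: dict[str, set[str]] = field(default_factory=dict)

    def register(self, hub: DataHub) -> None:
        # Index every key the hub holds; the records themselves stay at the hub.
        self.hubs[hub.name] = hub
        for key in hub.records:
            self.index.setdefault(key, set()).add(hub.name)

    def lookup(self, key: str) -> dict[str, dict]:
        """Fetch the record for `key` from every hub the index says holds it."""
        return {name: self.hubs[name].records[key]
                for name in sorted(self.index.get(key, set()))}
```

Keeping only indexes centrally lets each hub retain ownership of its data while clients still get a single point of search.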

http://www.openphacts.org

Open PHACTS is an Innovative Medicines Initiative (IMI) project aiming to reduce the barriers to drug discovery in industry, academia and small businesses.

The Semantic Web is one of its cornerstones

What does e-Science do in Open PHACTS?

ChemSpider provides many of the physicochemical properties within the Open PHACTS Discovery Platform

e-Science develops tools to check and standardise chemical structures

e-Science is creating the Open PHACTS chemical registration system

RDF Export

Data: ChEMBL, HMDB, DrugBank

Chemistry Validation and Standardization Platform (CVSP) at cvsp.chemspider.com
• Validation
• Standardization
• Parent generation
• Runs on a Hadoop-based farm
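Parent generation can be read as deriving a canonical "parent" from a deposited structure, for example by discarding counter-ions. The sketch below shows only that fragment-parent idea on SMILES, choosing the disconnected fragment with the most explicitly written atoms; the crude regex atom counter and the function names are assumptions for illustration, not CVSP's rules, and a production system would rely on a cheminformatics toolkit.

```python
import re

# Tokens that stand for one atom in SMILES: bracket atoms ("[O-]", "[NH3+]"),
# the two-letter organic-subset halogens, then one-letter organic-subset atoms
# (aromatic forms in lower case). Br/Cl are listed before B/C so they match first.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_count(fragment: str) -> int:
    """Count explicitly written atoms (implicit hydrogens are not counted)."""
    return len(ATOM_TOKEN.findall(fragment))


def fragment_parent(smiles: str) -> str:
    """Keep the disconnected fragment with the most atoms; ties go to the first."""
    fragments = [f for f in smiles.split(".") if f]
    if not fragments:
        raise ValueError("empty SMILES")
    return max(fragments, key=atom_count)
```

For sodium benzoate, `C1=CC=CC=C1C(=O)[O-].[Na+]`, this keeps the benzoate fragment; charge neutralisation and other parent types would be further steps.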

We know about Natural Products

MarinLit

OSDD

The Global Chemistry Network
