Big Data Supporting Drug Discovery: Cautionary Tales from the World of Chemistry for Translational Informatics. Valery Tkachenko, RSC-CSIR/OSDD meeting, Pune, India, February 3rd 2014

Page 1: Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics

Big Data Supporting Drug Discovery

Cautionary Tales from the World of Chemistry for Translational Informatics

Valery Tkachenko

RSC-CSIR/OSDD meeting

Pune, India

February 3rd 2014

Page 2

• Big Data
• Chemical Space
• Drug Discovery pipeline
• Machine learning
• Training sets
• RSC/ChemSpider platforms
• RSC/Archive
• Research data management
• Data quality, crowdsourcing and AltMetrics
• Building Global Chemistry Network


Page 6

Chemical space - 10^60
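To put the size of chemical space (~10^60 drug-like molecules) in perspective, here is a back-of-the-envelope calculation. It assumes a hypothetical screening rate of one billion structures per second; that rate is an illustrative figure, not one from the talk.

```python
# Back-of-the-envelope: how long would brute-force enumeration of chemical space take?
# Both inputs are illustrative assumptions, not figures from the talk.
CHEMICAL_SPACE_SIZE = 10**60              # commonly cited estimate of drug-like chemical space
RATE_PER_SECOND = 10**9                   # hypothetical: one billion structures per second
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year in seconds


def years_to_enumerate(size: int, rate: int) -> float:
    """Years needed to visit `size` structures at `rate` structures per second."""
    return size / rate / SECONDS_PER_YEAR


years = years_to_enumerate(CHEMICAL_SPACE_SIZE, RATE_PER_SECOND)
print(f"Exhaustive enumeration would take about {years:.1e} years")
```

Even at that optimistic rate the answer is roughly 3 × 10^43 years, which illustrates why chemical space has to be navigated selectively rather than enumerated exhaustively.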

Page 7

Navigation in chemical space

Page 8

Navigation in chemical space


Page 10

Structure-based Drug Design

Page 11

Structure-based Drug Design

Page 12

Ligand-based Drug Design

Page 13

Ligand-based Drug Design
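Ligand-based design commonly ranks candidate compounds by their similarity to a known active. The minimal sketch below uses the Tanimoto coefficient, a widely used similarity measure, and assumes each molecule has already been reduced to a set of "on" fingerprint bit positions. The compound names and bit sets are made-up placeholders, and this is not necessarily the method discussed in the talk.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # set of "on" bit positions in a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient: shared bits / bits set in either fingerprint."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    return len(a & b) / union


def rank_by_similarity(
    query: Fingerprint, library: Dict[str, Fingerprint], threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Score every library compound against the query; keep hits at or above threshold."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up fingerprints for illustration only.
known_active = frozenset({1, 4, 9, 16, 25})
library = {
    "cmpd_A": frozenset({1, 4, 9, 16, 36}),
    "cmpd_B": frozenset({2, 3, 5, 7}),
    "cmpd_C": frozenset({1, 4, 9, 16, 25, 49}),
}

for name, score in rank_by_similarity(known_active, library, threshold=0.5):
    print(f"{name}: {score:.2f}")
```

With these placeholder sets, cmpd_C (0.83) ranks above cmpd_A (0.67), and cmpd_B shares no bits with the query, so it falls below the 0.5 cut-off.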


Page 15

Machine learning

Page 16

Applied machine learning


Page 18

• ~30 million chemicals and growing

• Data sourced from >500 different sources

• Crowdsourced curation and annotation

• Ongoing deposition of data from our journals and our collaborators

• A structure-centric hub for web searching

Page 19

ChemSpider

Page 20

ChemSpider

Page 21

Properties - experimental

Page 22

Properties - ACD/Labs

Page 23

Properties - EPI Suite

Page 24

Properties - ChemAxon

Page 25

Literature references

Page 26

Patent references

Page 27

Books

Page 28

Classification

Page 29

Chemical vendors and data sources

Page 30

Multimedia


Page 32

ChemSpider Reactions

Page 33

ChemSpider Reactions

Page 34

ChemSpider Reactions

Page 35

ChemSpider Reactions

Page 36

ChemSpider Spectra

Page 37

ChemSpider Spectra

Page 38

ChemSpider Databases

ChemSpider Compounds

ChemSpider Reactions

ChemSpider Spectra

ChemSpider Crystals

ChemSpider Materials

ChemSpider Assays

ChemSpider Algorithms

Page 39

Research data inflow

Inputs:
• Web UI for unified depositions
• DropBox, Google Drive, SkyDrive, etc.
• LabTrove and other templated data
• Documents
• API, FTP, etc.

Deposition Gateway

Staging databases: Compounds, Reactions, Spectra, Materials, Articles / CSSP

Modules: Compounds Module, Spectra Module, Reactions Module, Materials Module, Text-mining Module

Data stages: Raw data, Staging databases, Validated data

All databases are sliced by data sources/data collections and have a simple security model in which each data slice/source is private, public or embargoed.
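The three-state security model for data slices (private, public or embargoed) can be sketched as a single access check. The slide only names the three states; the owner rule, the embargo release date and every name below (DataSlice, can_read, the example slice) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass
class DataSlice:
    source: str                           # data source / collection this slice holds
    owner: str                            # hypothetical: the depositing user or group
    visibility: Visibility
    embargo_until: Optional[date] = None  # only used when visibility is EMBARGOED


def can_read(data_slice: DataSlice, user: str, today: date) -> bool:
    """Assumed rules: owners always read; public is open; embargoed opens on its release date."""
    if user == data_slice.owner:
        return True
    if data_slice.visibility is Visibility.PUBLIC:
        return True
    if data_slice.visibility is Visibility.EMBARGOED:
        return data_slice.embargo_until is not None and today >= data_slice.embargo_until
    return False  # PRIVATE: nobody except the owner


spectra = DataSlice("group-nmr-2014", "alice", Visibility.EMBARGOED, date(2014, 6, 1))
print(can_read(spectra, "bob", date(2014, 3, 1)))  # False: still under embargo
print(can_read(spectra, "bob", date(2014, 7, 1)))  # True: embargo has lifted
```

Keeping visibility on the slice rather than on individual records means one flag controls a whole deposition, which matches the "sliced by data sources/data collections" design described above.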

Page 40

Research data outflow

• Data tier: Compounds, Reactions, Spectra, Materials, Documents
• Data access tier: CompoundsAPI, ReactionsAPI, SpectraAPI, MaterialsAPI, DocumentsAPI
• User interface components tier: CompoundsWidgets, ReactionsWidgets, SpectraWidgets, MaterialsWidgets, DocumentsWidgets
• User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, paid 3rd-party integrations (various platforms: SharePoint, Google, etc.)
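The four tiers on this slide can be sketched as layers that each talk only to the layer directly below them: applications use widgets, widgets use APIs, and only APIs read the data. The slide does not describe the real interfaces, so the in-memory store, class names and method names below are all hypothetical.

```python
from typing import Dict, List

# Data tier: a hypothetical in-memory stand-in for the Compounds database.
COMPOUNDS_DB: Dict[int, Dict[str, str]] = {
    1: {"name": "aspirin", "formula": "C9H8O4"},
    2: {"name": "caffeine", "formula": "C8H10N4O2"},
}


class CompoundsAPI:
    """Data access tier: the only layer that touches the data tier directly."""

    def __init__(self, store: Dict[int, Dict[str, str]]) -> None:
        self._store = store

    def get(self, compound_id: int) -> Dict[str, str]:
        return dict(self._store[compound_id])  # return a copy so callers cannot edit the store

    def search(self, text: str) -> List[int]:
        needle = text.lower()
        return [cid for cid, rec in self._store.items() if needle in rec["name"].lower()]


class CompoundsWidget:
    """UI components tier: renders records that it fetches through the API."""

    def __init__(self, api: CompoundsAPI) -> None:
        self._api = api

    def render(self, compound_id: int) -> str:
        rec = self._api.get(compound_id)
        return f"{rec['name']} ({rec['formula']})"

    def search_results(self, text: str) -> List[str]:
        return [self.render(cid) for cid in self._api.search(text)]


class InventoryApp:
    """UI tier: an application (e.g. chemical inventory) built only from widgets."""

    def __init__(self, widget: CompoundsWidget) -> None:
        self._widget = widget

    def search_page(self, text: str) -> List[str]:
        return self._widget.search_results(text)


api = CompoundsAPI(COMPOUNDS_DB)
app = InventoryApp(CompoundsWidget(api))
print(app.search_page("caf"))  # ['caffeine (C8H10N4O2)']
```

Because each layer depends only on the one beneath it, the same widgets could be reused by a notebook, an inventory system or a third-party integration without any of them touching the databases directly.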


Page 42

RSC Archive – since 1841

Page 43

DERA - Digitally Enabling RSC Archive

Page 44

Semantic mark-up of articles

Page 45

It is so difficult to navigate…

• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Working on now?
• Connections to disease?
• Expressed in the right cell type?
• Competitors?
• IP?

Page 46

Data quality issues and CVSP

– Robochemistry

– Proliferation of errors in public and private databases

– Automated quality control system
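The slide does not describe how CVSP works internally. As one hedged example of the kind of rule an automated quality control system can apply, the sketch below flags records whose stated molecular weight disagrees with the weight implied by their formula, or whose formula cannot be parsed at all. Errors like these spread easily when records are copied between databases. The record fields, the 0.05 tolerance and the abbreviated atomic-weight table are assumptions.

```python
import re
from typing import Any, Dict, List

# Rounded standard atomic weights for a few elements; a real checker needs the full table.
ATOMIC_WEIGHTS: Dict[str, float] = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06, "Cl": 35.45,
}
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")  # element symbol followed by an optional count


def formula_weight(formula: str) -> float:
    """Molecular weight of a simple formula such as 'C9H8O4' (no brackets or charges)."""
    total, pos = 0.0, 0
    for m in TOKEN.finditer(formula):
        if m.start() != pos:  # characters were skipped between tokens
            raise ValueError(f"unparseable text at position {pos} in {formula!r}")
        element, count = m.group(1), m.group(2)
        if element not in ATOMIC_WEIGHTS:
            raise ValueError(f"unknown element {element!r}")
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
        pos = m.end()
    if pos == 0 or pos != len(formula):  # empty formula or trailing junk
        raise ValueError(f"unparseable formula {formula!r}")
    return total


def check_record(record: Dict[str, Any], tolerance: float = 0.05) -> List[str]:
    """Return the problems found; an empty list means the record passed."""
    try:
        computed = formula_weight(str(record["formula"]))
    except ValueError as err:
        return [f"formula: {err}"]
    stated = float(record["mw"])
    if abs(computed - stated) > tolerance:
        return [f"mw: stated {stated}, formula implies {computed:.2f}"]
    return []


records = [
    {"id": "rec-1", "formula": "C9H8O4", "mw": 180.16},   # consistent
    {"id": "rec-2", "formula": "C9H8O4", "mw": 194.19},   # weight copied from another compound
    {"id": "rec-3", "formula": "C9H8O4?", "mw": 180.16},  # corrupted formula string
]
for rec in records:
    print(rec["id"], check_record(rec) or "OK")
```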

Page 47

DrugBank dataset (6516 records)

J. Brecher, IUPAC, Graphical Representation of Stereochemical Configuration, Section ST-1.1.10

DB06287


Page 49

Research data management

• University 1: Data Hub, Workstations
• University 2: Data Hub, Workstations
• Company 3: Data Hub, Workstations
• Data Repository: indexed storage
• Data Repository: provided data storage
• Chemically intelligent services
• Indexes
• Data
• External clients: Publishers, Scientists, Funding bodies


Page 51

Crowdsourcing

Page 52

AltMetrics

Page 53

RSC/Rewards and Recognition

Congratulations! Your 1st CSSP article has been published. The philosopher Lao Tzu said, “A journey of a thousand miles begins with a single step.” In the same way, we hope that this will be the first of many submissions that you make to CSSP.

The First Step badge is awarded when a user submits (and has published) their first CSSP article.
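The First Step badge rule is simple enough to express as an event handler. In this sketch the check runs only when an article is published, so a submission that is never published earns nothing. The Contributor type and the handler name are hypothetical illustrations, not the RSC implementation.

```python
from dataclasses import dataclass, field
from typing import Set

FIRST_STEP = "First Step"


@dataclass
class Contributor:
    name: str
    published_articles: int = 0
    badges: Set[str] = field(default_factory=set)


def on_article_published(user: Contributor) -> bool:
    """Record a published CSSP article; return True if it earned the First Step badge."""
    user.published_articles += 1
    if user.published_articles == 1 and FIRST_STEP not in user.badges:
        user.badges.add(FIRST_STEP)
        return True
    return False


alice = Contributor("alice")
print(on_article_published(alice))  # True: first published article
print(on_article_published(alice))  # False: badge already awarded
```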

Page 54

• Big Data
• Chemical Space
• Drug Discovery pipeline
• Machine learning
• Training sets
• RSC/ChemSpider platforms
• RSC/Archive
• Research data management
• Visualization and navigation
• Building Global Chemistry Network

Page 55

Visualization

Page 56

Visualization and navigation


Page 59

We are part of a larger world

Page 60

ChemSpider APIs

Page 61

National Chemistry Database

Page 62

http://www.openphacts.org

Open PHACTS is an Innovative Medicines Initiative (IMI) project aiming to reduce the barriers to drug discovery in industry, academia and small businesses.

The Semantic Web is one of its cornerstones.
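As a toy illustration of why linked data helps: once facts from different sources share identifiers, questions like those on the "It is so difficult to navigate…" slide become graph queries. This in-memory sketch is not Open PHACTS's actual stack, which is built on Semantic Web standards such as RDF and SPARQL; the identifiers, predicates and facts are simplified for illustration.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Illustrative facts that could come from different sources, joined via shared identifiers.
TRIPLES: Set[Triple] = {
    ("compound:aspirin", "inhibits", "target:PTGS1"),
    ("compound:aspirin", "inhibits", "target:PTGS2"),
    ("target:PTGS1", "participatesIn", "pathway:prostaglandin_synthesis"),
    ("target:PTGS2", "participatesIn", "pathway:prostaglandin_synthesis"),
    ("compound:caffeine", "antagonizes", "target:ADORA2A"),
}


def match(pattern: Pattern) -> List[Triple]:
    """Triples matching (subject, predicate, object); None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES if all(p is None or p == v for p, v in zip(pattern, t))
    )


def pathways_for(compound: str) -> List[str]:
    """Two-hop query: compound -inhibits-> target -participatesIn-> pathway."""
    found = set()
    for _, _, target in match((compound, "inhibits", None)):
        for _, _, pathway in match((target, "participatesIn", None)):
            found.add(pathway)
    return sorted(found)


print(pathways_for("compound:aspirin"))  # ['pathway:prostaglandin_synthesis']
```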

Page 64

OSDD

Page 65

Thank you


Slides: http://www.slideshare.net/valerytkachenko16