Opportunities in Chemical Structure Standardization

Opportunities in Chemical Structure Standardization. Valery Tkachenko, Science Data Software, Rockville, USA. Expanding IUPAC Standards for Chemical Information, EMBL-EBI Workshop, March 20-21, 2017

Upload: valery-tkachenko

Post on 13-Apr-2017


TRANSCRIPT

Page 1: Opportunities in chemical structure standardization

Opportunities in Chemical Structure Standardization

Valery Tkachenko

Science Data Software, Rockville, USA

Expanding IUPAC Standards for Chemical Information

EMBL-EBI Workshop, March 20-21, 2017

Page 2: Opportunities in chemical structure standardization

DIKW workflow

Page 3: Opportunities in chemical structure standardization

[Figure: workflow diagram. Recoverable labels: Literature data; Electronic databases; Experimental data; Text mining; Processing; Analysis; Data collection, curation, integration, and structuring (ontology); Structured nanomaterials data repository; Data analysis and modeling; Predictive data models & tools; Experimental design; Experimental validation; Feedback, new data; Decision support; Disease; Effect.]

Karmann Mills and Anthony Hickey, RTI International, RTP, NC 27709, and Alex Tropsha, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, NC 27599

Page 4: Opportunities in chemical structure standardization

Standards and authorities

Page 5: Opportunities in chemical structure standardization

We live in a hyperconnected world

Page 6: Opportunities in chemical structure standardization

Data repositories

Page 7: Opportunities in chemical structure standardization

Fourches, Muratov, Tropsha. Nat Chem Biol. 2015,11(8):535.

How the problem is being solved now

Page 8: Opportunities in chemical structure standardization

[Very incomplete] list of common problems

• Violation of chemical and common sense
• Violations of valence bond theory
• Unsupported format and chemical model features
• Information loss during conversion
• Tautomers
• Stereochemical issues
• Mixtures
• Other classes of chemicals (materials, formulations, biologicals, structurally diverse, etc.)
• Equivalence/mapping issues
• Identifiers/names issues
• Etc, etc, etc…
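As an illustration of the "violations of valence bond theory" item, here is a minimal sketch of an over-valence check on a toy molecular graph. It is not taken from the slides or from any toolkit: the `MAX_VALENCE` table, the function name `find_valence_violations`, and the atom/bond representation are all assumptions made for this example, and charges, implicit hydrogens, and hypervalent elements are deliberately ignored.

```python
# Hypothetical, deliberately tiny table: maximum total bond order for a few
# neutral atoms. A real validator would also handle charges and radicals.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def find_valence_violations(atoms, bonds):
    """Return (atom_index, element, total_bond_order) for over-valent atoms.

    atoms: list of element symbols, e.g. ["C", "O"]
    bonds: list of (i, j, order) tuples using 0-based atom indices
    Only explicit bonds are summed, so under-valence is not detected.
    """
    totals = [0] * len(atoms)
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order

    violations = []
    for index, (element, total) in enumerate(zip(atoms, totals)):
        limit = MAX_VALENCE.get(element)  # unknown elements are skipped
        if limit is not None and total > limit:
            violations.append((index, element, total))
    return violations


# Methane: carbon with four explicit single bonds (valid).
ok_atoms = ["C", "H", "H", "H", "H"]
ok_bonds = [(0, k, 1) for k in range(1, 5)]

# A drawing error: carbon with five explicit single bonds (over-valent).
bad_atoms = ["C", "H", "H", "H", "H", "H"]
bad_bonds = [(0, k, 1) for k in range(1, 6)]

print(find_valence_violations(ok_atoms, ok_bonds))
print(find_valence_violations(bad_atoms, bad_bonds))
```

Real toolkits apply far richer valence models, and they do not always agree with one another on edge cases, which is part of the problem this slide describes.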

Page 9: Opportunities in chemical structure standardization

…problems (continued)

• Multiple [historical, proprietary, shortcoming] formats
  • ChemDraw, ChemSketch, AccelrysDraw
  • MOL, SDF
  • SMILES
  • Identifiers
  • Names and Synonyms
• Multiple toolkits/models
  • Open Source (alphabetical)
    • CDK
    • RDKit
    • Indigo
    • OpenBabel
    • Etc…
  • Commercial (alphabetical)
    • CACTVS
    • ChemAxon
    • OpenEye
    • Etc…
• Historical Hysterical software
• No [machine-readable] standards
• No authorities

No coordinated efforts!!!

Page 10: Opportunities in chemical structure standardization

Solution

• Agreed and machine-readable (digital) standards
• Open-source (transparent) solution
• Organizations AND community support and involvement
• Accessible solution
• Data triaging at data repositories level
• Real-time validation/standardization (API, library, “docker”, etc.)
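To make "machine-readable standards" and "data triaging" concrete, the sketch below expresses a rule set as JSON data and applies it to incoming SMILES strings. Everything here is hypothetical: the rule IDs, severities, check names, and the `triage` function are invented for illustration and do not describe CVSP, the OpenPHACTS CRS, or any IUPAC standard. The checks are purely textual; the only chemistry assumed is that "." separates disconnected components in SMILES.

```python
import json

# Hypothetical rule set. In practice this would be a shared, versioned
# document agreed on by the community, not a string embedded in code.
RULES_JSON = """
[
  {"id": "R001", "severity": "error",   "check": "structure_present"},
  {"id": "R002", "severity": "error",   "check": "balanced_parentheses"},
  {"id": "R003", "severity": "warning", "check": "single_component"}
]
"""


def structure_present(smiles):
    """Pass if the record carries a non-blank structure string."""
    return bool(smiles.strip())


def balanced_parentheses(smiles):
    """Pass if every branch opened with '(' is closed with ')'."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def single_component(smiles):
    """Pass if there is no '.' (disconnected components, e.g. salts or mixtures)."""
    return "." not in smiles


# Maps the check names used in the rule document to implementations.
CHECKS = {
    "structure_present": structure_present,
    "balanced_parentheses": balanced_parentheses,
    "single_component": single_component,
}


def triage(smiles, rules):
    """Return (rule_id, severity) for every rule the input fails."""
    return [
        (rule["id"], rule["severity"])
        for rule in rules
        if not CHECKS[rule["check"]](smiles)
    ]


rules = json.loads(RULES_JSON)
for candidate in ["CC(=O)O", "CC(=O", "[Na+].[Cl-]", ""]:
    print(repr(candidate), triage(candidate, rules))
```

Keeping the rules as data rather than code is the point of the sketch: the same rule document could in principle be consumed by a library, a REST service, or a container image without rewriting the rules for each.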

Page 11: Opportunities in chemical structure standardization

@gray_alasdair, Big Data Integration

OpenPHACTS

Page 12: Opportunities in chemical structure standardization

OpenPHACTS Chemistry Registry System (CRS)

Page 13: Opportunities in chemical structure standardization
Page 14: Opportunities in chemical structure standardization

OpenPHACTS CRS shortcomings…

• Platform-dependent
• Toolkit-dependent (potential licensing issues)
• No deployable library
• No [convenient] API

Page 15: Opportunities in chemical structure standardization

…OpenPHACTS CRS¹: ongoing work

• Independent of the Microsoft platform
  • .NET Core, Python
  • Linux
  • NoSQL
• Toolkit independent
  • Indigo
  • RDKit (in progress)
  • CDK (planned)
• Docker image
• RESTful API

¹ Was open-sourced and is now supported by the OpenPHACTS Foundation

Page 16: Opportunities in chemical structure standardization

CVSP on Jupyter

Page 17: Opportunities in chemical structure standardization
Page 18: Opportunities in chemical structure standardization

Meet the Team

Alexandru Korotcov, Data Science

Rick Zakharov, Technology

Valery Tkachenko, Support

Boris Sattarov, Cheminformatics

Slides: https://www.slideshare.net/valerytkachenko16