

IBEX - access and exploit SAR data from patents and journals

Péter Várkonyi, Christian Hoppe, Sorel Muresan

AZ Global Compound Sciences, Computational Chemistry

Better Compounds. Faster.

Overview

• GVKBIO database (context and content)
• IBEX application
• Examples of data mining
• Patent visualisation

Explicit Compound-to-Sequence Links

• Increasing commercial and public availability of annotated relationships of the form:

…document (or database entry) “W” includes assay data “X” that defines compound “Y” as an activity modulator of protein “Z”…

• These relationships provide crucial value in medicinal chemistry informatics

• Examples of commercial and public databases:

~2 million cpds, ~3,200 sequences, ~83,000 patents and papers

~ 130,000 cpds, ~1,300 sequences, ~7,000 papers

~ 4,000 cpds, 502 sequences

83 protein targets with bioassay data, and ~6,000 cpds in PDB structures

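Venn-type Overlaps Highlight Unique Content

[Figure: three-set Venn diagram of compound overlaps between PubChem, GVKBIO and WOMBAT]

Database totals: PubChem 7.27 mill, GVKBIO 1.49 mill, WOMBAT 128 K

Region                     Compounds
PubChem only               6,825,265
GVKBIO only                1,013,848
WOMBAT only                3,162
PubChem and GVKBIO only    353,623
PubChem and WOMBAT only    4,150
GVKBIO and WOMBAT only     34,674
All three databases        86,143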

Southan, C.; Varkonyi, P.; Muresan, S. Complementarity Between Public and Commercial Databases: New Opportunities in Medicinal Chemistry Informatics. Curr. Top. Med. Chem. 2007, 7, 1502-1508.
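
As a rough illustration of how such Venn-type overlap counts can be derived, the sketch below splits three sets of structure identifiers into the seven regions of a three-set diagram. The identifiers and the function name `venn_regions` are hypothetical; it assumes each database has already been reduced to a set of comparable canonical keys, which is not described on the slide.

```python
def venn_regions(a, b, c):
    """Count the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Toy identifiers standing in for canonical structure keys (hypothetical).
pubchem = {"K1", "K2", "K3", "K4"}
gvkbio = {"K3", "K4", "K5"}
wombat = {"K4", "K5", "K6"}

regions = venn_regions(pubchem, gvkbio, wombat)
# The region counts always add up to the size of the union.
union_size = len(pubchem | gvkbio | wombat)
```

The same bookkeeping explains why a database total equals its unique region plus every overlap region it takes part in.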

GVKBIO database - What is it?

• A comprehensive database that captures explicit relationships between the three entities of publications, compounds and sequences.

• It includes over 3 million records corresponding to 2 million unique structures linked to 3,200 sequences (GPCRs, Kinases, Proteases, NHRs, Ion-channels, Transporters and Phosphatases) extracted from ~37,000 patents and over 49,000 articles from 125 journals.

GVKBIO database - General Information

• GVKBIO uses expert curators to populate databases with these explicit relationships extracted from journals and patents on a massive scale (unstructured -> structured data)

• They capture a substantial proportion of published compounds active against targets relevant to the pharmaceutical industry

• Data capture includes secondary assays, in vivo results and DMPK data

• Human, mammalian, microbial and viral targets are included

GVKBIO database - General Information

• MedChem database (900,000 entries)
– data from med chem journals
– reference centric database

• 7 target class databases (2.6 million entries)
– GPCRs, Proteases, Kinases, Ion-Channels, NHRs, Phosphatases, Transporters
– data from journals and patents
– reference centric database

• Drug database (3,000 entries)
– All FDA approved compounds
– Compound centric database

• Mechanism Based Toxicity Database (MBT)
– Over 13,000 drug & drug like compounds with details of toxicity, mechanism, adverse effects, metabolism, toxicity data, toxic, derivatives and other information
– Compound centric database

GVKBIO database - Data

[Figure: bar chart "IBEX target class databases" showing the number of entries (axis from 0 to 1,200,000) from Papers and Patents for each database: GPCR, KINASE, PROTEASE, IC, NHR, TRA, PHO, OTHER, MCD]

GVKBIO database - Data

• Databases – Current Status (2008 March update)

Database  GVK_IDs  Structures  References  Curated Patents  All Patents  Papers  Official Symbols  Activity
ALL       3202730  2063828     84163       34491            102017       49672   3233              9877083
GPCR      1155237  729433      22992       16117            49066        6875    745               2805495
IC        228423   148326      5992        3300             10408        2692    612               531826
KINASE    529190   318853      7590        5476             16288        2114    871               2146905
NHR       152411   103003      4275        2077             6834         2198    373               482061
PHO       29424    19301       722         322              807          400     159               61193
PROTEASE  471954   321747      9681        5615             14429        4066    504               1313144
TRA       94905    65585       2909        1726             5317         1183    545               250533
OTHER     9374     8224        113         113              182          0       15                13539
MCD       905406   662613      49670       0                0            49669   2636              3536511
Sum       3576324  2377085     103944      34746            103331       69197   6460              11141207
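
The Sum row is larger than the ALL row. A quick check, using only the GVK_IDs column from the table, reproduces the Sum value and shows the size of the gap. Reading the gap as entries that appear in more than one database is an assumption here; the slide does not state it.

```python
# GVK_IDs per database, copied from the table (2008 March update).
gvk_ids = {
    "GPCR": 1155237,
    "IC": 228423,
    "KINASE": 529190,
    "NHR": 152411,
    "PHO": 29424,
    "PROTEASE": 471954,
    "TRA": 94905,
    "OTHER": 9374,
    "MCD": 905406,
}
all_gvk_ids = 3202730  # the "ALL" row

summed = sum(gvk_ids.values())   # should match the "Sum" row
excess = summed - all_gvk_ids    # assumed: entries counted in several databases
share = excess / summed          # fraction of the summed count that is excess
```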

IBEX - Overview

• Web application to access, search and export GVKBIO data
• Global and simple access to the information
• Centralised maintenance
• Searching individual databases or all data

IBEX – Data content

• GVKBIO
– MedChem database and 7 target databases
– Monthly updates (~40,000 records per month)
– Links with other in-house systems
– Included in-house derived data
  • Descriptors
  • SMILES using internal chemistry business rules

IBEX - Data scheme

[Diagram: relational data scheme. Tables: Mechanism, JChem Structure, Activity, Reference, Mapping, Gvk_Db and Ref_Db. Keys: GVK_ID, STR_ID, REF_ID and DB_ID. Mapping links GVK_ID to REF_ID, Gvk_Db links GVK_ID to DB_ID, and Ref_Db links REF_ID to DB_ID.]
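
A minimal sketch of how a scheme like this can be queried, using an in-memory SQLite database in place of the Oracle instance. Only the key names GVK_ID, STR_ID and REF_ID come from the slide; the table layouts, the other columns (`smiles`, `target`, `value_nm`, `title`) and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jchem_structure (gvk_id INTEGER PRIMARY KEY, str_id INTEGER, smiles TEXT);
    CREATE TABLE activity (gvk_id INTEGER, target TEXT, value_nm REAL);
    CREATE TABLE ref (ref_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE mapping (gvk_id INTEGER, ref_id INTEGER);
    """
)
# Hypothetical rows: two compounds reported in one reference.
conn.executemany("INSERT INTO jchem_structure VALUES (?, ?, ?)",
                 [(1, 10, "c1ccccc1"), (2, 11, "CCO")])
conn.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                 [(1, "HRH3", 12.0), (2, "HRH3", 850.0)])
conn.execute("INSERT INTO ref VALUES (?, ?)", (100, "Example patent"))
conn.executemany("INSERT INTO mapping VALUES (?, ?)", [(1, 100), (2, 100)])

# Walk from a reference through the mapping table to structures and activities.
rows = conn.execute(
    """
    SELECT s.gvk_id, s.smiles, a.target, a.value_nm
    FROM ref r
    JOIN mapping m ON m.ref_id = r.ref_id
    JOIN jchem_structure s ON s.gvk_id = m.gvk_id
    JOIN activity a ON a.gvk_id = s.gvk_id
    WHERE r.ref_id = ?
    ORDER BY a.value_nm
    """,
    (100,),
).fetchall()
```

The mapping table is what allows one reference to list many compounds and one compound to appear in many references.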

IBEX - Application Technology

• Server hardware
– dedicated server with 2 Intel dual-core Xeon processors
– Web server (WebLogic 9.2)

• Database
– Oracle 9.2 (AZ standard)
– advanced users can have read-only access to the tables

• Web interface
– Java Servlet/JavaServer Pages

• Chemistry engine (structure storage and search)
– ChemAxon's JChem 5.0.1

IBEX – Search interface

[Screenshots of the search interface]

IBEX – Data presentation

[Screenshot with the fields GVK ID, STR ID, Company Address, Compound Name, Title, Authors and Claim/Example]

IBEX – Presentation of results

[Screenshots of result pages, annotated with:]
• Copy-and-paste the link to this compound in IBEX
• Link to PubChem
• Link to PubMed
• Link to the paper
• Link to ENTREZ
• Link to GeneNames

IBEX – Exporting results

[Screenshots]

IBEX - Application

• Links to external applications
– MicroPatent - patent name
– PubChem - PubChem CID
– GeneNames - official symbol
– ENTREZ - locus ID

• Direct links to IBEX with
– Structure ID
– GVK ID
– Patent name
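
The linking idea can be sketched as simple URL templates keyed on the identifiers listed above. The PubChem and NCBI Gene patterns below are present-day public URL forms and may differ from what IBEX used. `IBEX_BASE` and the `ibex_link` parameters are purely hypothetical placeholders, since the slides do not show the real address.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; the real IBEX address is not given in the slides.
IBEX_BASE = "https://ibex.example.invalid/search"


def pubchem_link(cid: int) -> str:
    """Link to a PubChem compound page by CID."""
    return f"https://pubchem.ncbi.nlm.nih.gov/compound/{cid}"


def entrez_gene_link(locus_id: int) -> str:
    """Link to an NCBI (Entrez) Gene record by its numeric ID."""
    return f"https://www.ncbi.nlm.nih.gov/gene/{locus_id}"


def ibex_link(**keys) -> str:
    """Build a direct IBEX link from keys such as gvk_id or str_id (assumed names)."""
    return f"{IBEX_BASE}?{urlencode(keys)}"


example = ibex_link(gvk_id=12345)
```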

IBEX – Application output of results

• On screen
– SDF file (only structure and GVK_ID, zipped)
– CSV file (user selected fields, zipped)
– XML file (user selected fields, zipped)
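
A zipped CSV export restricted to user-selected fields could look roughly like the sketch below. IBEX itself is a Java web application, so this is only an illustration of the idea. The function name, the archive member name `export.csv` and the record fields are assumptions.

```python
import csv
import io
import zipfile


def export_csv_zip(records, fields):
    """Write only the selected fields to CSV and return a zip archive as bytes."""
    text = io.StringIO()
    # extrasaction="ignore" drops any record keys the user did not select.
    writer = csv.DictWriter(text, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("export.csv", text.getvalue())
    return buffer.getvalue()


# Hypothetical search results; only two of the three fields are exported.
records = [
    {"GVK_ID": 1, "Target": "HRH3", "Company": "Example Co"},
    {"GVK_ID": 2, "Target": "HRH1", "Company": "Example Co"},
]
payload = export_csv_zip(records, fields=["GVK_ID", "Target"])
```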

IBEX - Near Future

• Extend functionalities
– List searches
– Save queries
– More descriptors
– Improved export

• Documentation and seminars

IBEX exploitation

• Rapid access to current knowledge
• Exploration of chemical and biological space
• Selectivity and activity optimization
• Develop predictive models (QSARs), build pharmacophores
• Virtual screening, compound prioritization for HTS, compound acquisition
• Evaluate fast follower opportunities (patent busting)
• Get structures from Patents / Journals
– avoid redrawing published structures (sdf and csv export)

IBEX - Novelty check

[Workflow diagram with the elements: proposed library, clean-up, AZFilters, compare libraries (internal: AZ cmpds; external: IBEX, ACES, MDDR, PubChem; >30M cmpds), novel cmpds]
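
The core of such a novelty check is a set difference between the proposed library and everything already known. The sketch below assumes every structure has already been cleaned up and reduced to a comparable canonical key (for example a canonical SMILES or InChIKey). The function name, the filter callback and the toy keys are hypothetical, and the real workflow and AZFilters are not described in the slides.

```python
def novelty_check(proposed, internal, external_sources, passes_filters=lambda key: True):
    """Return proposed structure keys that pass the filters and are not already known."""
    known = set(internal)
    for keys in external_sources.values():
        known |= set(keys)
    filtered = {key for key in proposed if passes_filters(key)}
    return sorted(filtered - known)


# Toy canonical keys (hypothetical).
proposed = ["A", "B", "C", "D"]
internal = ["A"]
external = {"IBEX": ["B"], "PubChem": ["B", "X"]}

# A stand-in for a property filter that rejects compound "D".
novel = novelty_check(proposed, internal, external, passes_filters=lambda key: key != "D")
```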

De-novo design with FRASSE

• Fragment and reassemble medchem cmpds (FRASSE)
• Fragmenter from ChemAxon

[Figure: chemical structures of Acetiamine (Analgesic), Enfenamic acid (Antiinflammatory), Acemetacin (Antipyretic) and PYX 00001664]

PYXIS Discovery (Smart Libraries), www.pyxis-discovery.com; J. Med. Chem. 2003, 46, 4770; J. Med. Chem. 2004, 47, 5984; J. Chem. Inf. Model. 2005, 45, 239

Library design with LEADSCOPE

benzopyrazole, 1-(2-aminoethyl

identify gaps

Custom library

Patent visualisation

• Chemical space visualisation
– Patent/Patent comparison
  • ChemGPS, Shapes, Pharmacophores
– Single Patent analysis
  • SAR analysis
    – PipelinePilot (results can be imported to third party or in-house tools), SARVision

Patent visualisation - ChemGPS

• Global drugspace map
• 2 sets of compounds (423 cmpds in total)
– Satellites (extreme real and virtual structures)
– Cores (representative oral drugs)
• Map coordinates are t-scores extracted via PCA using 72 physico-chemical descriptors
• 9 PCs
• Absolute position of compounds in the ChemGPS defined space

T. Oprea & J. Gottfries, J. Comb. Chem., 2001, 3, 157-166
J. Larsson et al., J. Nat. Prod., 2007, 70, 789-794
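
The "absolute position" idea can be sketched as follows: scaling and principal axes are fitted once on a fixed reference set, and any new compound is then projected onto those fixed axes, so its score does not depend on what else is plotted. The sketch is pure Python and computes only the first principal component by power iteration. The two toy descriptors and all values are hypothetical, and this is not the published ChemGPS model (72 descriptors, 9 PCs).

```python
import math


def fit_reference_axis(reference, n_iter=100):
    """Fit scaling and the first principal axis on the reference set only."""
    n, d = len(reference), len(reference[0])
    means = [sum(row[j] for row in reference) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in reference) / (n - 1)) or 1.0
        for j in range(d)
    ]
    scaled = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in reference]
    cov = [
        [sum(r[a] * r[b] for r in scaled) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration converges to the dominant eigenvector (PC1 loadings).
    axis = [1.0] * d
    for _ in range(n_iter):
        product = [sum(cov[a][b] * axis[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        axis = [x / norm for x in product]
    return means, stds, axis


def t_score(compound, means, stds, axis):
    """Project one compound onto the fixed reference axis."""
    return sum((x - m) / s * w for x, m, s, w in zip(compound, means, stds, axis))


# Toy reference set with two descriptors per compound (hypothetical values).
reference = [[150.0, 0.5], [300.0, 2.0], [450.0, 3.5], [600.0, 5.5]]
model = fit_reference_axis(reference)
score = t_score([380.0, 2.9], *model)
```

In practice a numerical library would replace the hand-written power iteration; the point here is only that the model is frozen before new compounds are scored.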

Patent visualisation - ChemGPS

[Figure: ChemGPS plot showing the set of cores, the set of satellites and compounds from GPCR patents; axes: size, hydrophobicity, rigidity]

Patent visualisation - GPCR

• Example for GPCR patents
– IBEX search -> Title: histamine, only WO patents, 9863 patents, selected 3 patents

Patent ID      #compounds  Company  Target  Modulation
WO2007122156   24          Glaxo    H1/H3   Antagonist
US20070208005  63          Glaxo    H3      Antagonist/reverse Agonist
US20060019998  71          Pfizer   H3      Antagonist

Patent SAR visualisation - GPCR

• PP and SARVision: GPCR patent US20060019998
– 1 core (oxadiazole-biphenyl)

[Screenshot annotations:]
• Cores extracted by a PipelinePilot protocol with molecular framework clustering
• In the header some statistics are shown
• Compounds with associated information for each core
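
The per-core summary shown in such a view can be approximated by grouping compounds on a framework key and computing header statistics for each group. The sketch assumes the framework (core) of each compound has already been assigned, as the PipelinePilot protocol does in the talk. The field names `core` and `pic50` and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarise_by_core(compounds):
    """Group compounds by their precomputed core and compute header statistics."""
    groups = defaultdict(list)
    for compound in compounds:
        groups[compound["core"]].append(compound)

    summary = {}
    for core, members in groups.items():
        values = [member["pic50"] for member in members]
        summary[core] = {
            "n_compounds": len(members),
            "mean_pic50": mean(values),
            "max_pic50": max(values),
        }
    return summary


# Toy patent compounds with a precomputed core label (hypothetical).
compounds = [
    {"id": "ex1", "core": "core_A", "pic50": 7.0},
    {"id": "ex2", "core": "core_A", "pic50": 8.0},
    {"id": "ex3", "core": "core_B", "pic50": 6.5},
]
summary = summarise_by_core(compounds)
```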

Patent SAR visualisation - GPCR

• Cores and compounds imported from PP results into SARVision (R-group analysis)

Patent SAR visualisation - GPCR

• PP and SARVision: GPCR patent WO2007122156
– 1 core (oxo-pyridazine)

Patent SAR visualisation - GPCR

• PP: GPCR patent US20070208005
– 1 core

Patent chemical space visualisation - GPCR

• ChemGPS: Black: WO2007122156 (antagonist H1/H3, Glaxo); Red: US20060019998 (H3 antagonist, Pfizer); Blue: US20070208005 (H3 antagonist, Glaxo)

Acknowledgements

• Thierry Kogej
• Plamen Petrov