IBEX - access and exploit SAR data from patents and journals
Péter Várkonyi, Christian Hoppe, Sorel Muresan
AZ Global Compound Sciences, Computational Chemistry
Better Compounds. Faster
Overview
• GVKBIO database (context and content)
• IBEX application
• Examples of data mining
• Patent visualisation
Explicit Compound-to-Sequence Links
• Increasing commercial and public availability of annotated relationships
… document (or database entry) “W” includes assay data “X” that defines compound “Y” as an activity modulator of protein “Z” …
• These relationships provide crucial value in medicinal chemistry informatics
• Examples of commercial and public databases:
~ 2 million cpds ~ 3,200 sequences ~ 83,000 patents and papers
~ 130,000 cpds, ~1,300 sequences, ~7,000 papers
~ 4,000 cpds, 502 sequences
83 protein targets with bioassay data, and ~6,000 cpds in PDB structures
Venn-type Overlaps Highlight Unique Content
[Figure: Venn diagram of content overlaps between PubChem, GVKBIO and WOMBAT]
Southan, C.; Varkonyi, P.; Muresan, S. Complementarity Between Public and Commercial Databases: New Opportunities in Medicinal Chemistry Informatics. Curr. Top. Med. Chem. 2007, 7, 1502-1508.
GVKBIO database - What is it?
• A comprehensive database that captures explicit relationships between the three entities of publications, compounds and sequences.
• It includes over 3 million records corresponding to 2 million unique structures linked to 3,200 sequences (GPCRs, Kinases, Proteases, NHRs, Ion-channels, Transporters and Phosphatases) extracted from ~37,000 patents and over 49,000 articles from 125 journals.
GVKBIO database - General Information
• GVKBIO uses expert curators to populate databases with these explicit relationships extracted from journals and patents on a massive scale (unstructured -> structured data)
• They capture a substantial proportion of published compounds active against targets relevant to the pharmaceutical industry
• Data capture includes secondary assays, in vivo results and DMPK data
• Human, mammalian, microbial and viral targets are included
GVKBIO database - General Information
• MedChem database (900,000 entries)
– data from med chem journals
– reference centric database
• 7 target class databases (2.6 million entries)
– GPCRs, Proteases, Kinases, Ion-Channels, NHRs, Phosphatases, Transporters
– data from journals and patents
– reference centric database
• Drug database (3,000 entries)
– All FDA approved compounds
– Compound centric database
• Mechanism Based Toxicity Database (MBT)
– Over 13,000 drug & drug like compounds with details of toxicity, mechanism, adverse effects, metabolism, toxicity data, toxic, derivatives and other information
– Compound centric database
GVKBIO database - Data
[Figure: bar chart "IBEX target class databases" showing the number of entries (axis 0 to 1,200,000) per database (MCD, OTHER, PHO, TRA, NHR, IC, PROTEASE, KINASE, GPCR), split into papers and patents]
GVKBIO database - Data
• Databases – Current Status (2008 March update)
Database  GVK_IDs  Structures  References  Curated Patents  All Patents  Papers  Official Symbols  Activity
ALL       3202730     2063828       84163            34491       102017   49672              3233   9877083
GPCR      1155237      729433       22992            16117        49066    6875               745   2805495
IC         228423      148326        5992             3300        10408    2692               612    531826
KINASE     529190      318853        7590             5476        16288    2114               871   2146905
NHR        152411      103003        4275             2077         6834    2198               373    482061
PHO         29424       19301         722              322          807     400               159     61193
PROTEASE   471954      321747        9681             5615        14429    4066               504   1313144
TRA         94905       65585        2909             1726         5317    1183               545    250533
OTHER        9374        8224         113              113          182       0                15     13539
MCD        905406      662613       49670                0            0   49669              2636   3536511
Sum       3576324     2377085      103944            34746       103331   69197              6460  11141207
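The "Sum" row is larger than the "ALL" row in every column, which suggests that some entries are counted in more than one database. The short Python check below reproduces the "Sum" row for the first two columns and computes the excess over "ALL". Reading that excess as entries shared between databases is an assumption, since the slides do not state it.

```python
# Per-database rows from the March 2008 status table: (GVK_IDs, Structures).
rows = {
    "GPCR": (1155237, 729433),
    "IC": (228423, 148326),
    "KINASE": (529190, 318853),
    "NHR": (152411, 103003),
    "PHO": (29424, 19301),
    "PROTEASE": (471954, 321747),
    "TRA": (94905, 65585),
    "OTHER": (9374, 8224),
    "MCD": (905406, 662613),
}
all_row = (3202730, 2063828)  # the "ALL" row of the table

# Column-wise sums over the individual databases reproduce the "Sum" row.
sum_row = tuple(sum(column) for column in zip(*rows.values()))

# Assumption: the excess of "Sum" over "ALL" counts entries that appear
# in more than one database.
excess = tuple(s - a for s, a in zip(sum_row, all_row))

print(sum_row)  # (3576324, 2377085)
print(excess)   # (373594, 313257)
```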
IBEX - Overview
• Web application to access, search and export GVKBIO data
• Global and simple access to the information
• Centralised maintenance
• Searching individual databases or all data
IBEX – Data content
• GVKBIO – MedChem database and 7 target databases
– Monthly updates (~40,000 records per month)
– Links with other in-house systems
– Included in-house derived data
• Descriptors
• SMILES using internal chemistry business rules
IBEX - Data scheme
[Figure: data scheme of the IBEX tables Structure, JChem, Mechanism, Activity, Reference, Mapping, Gvk_Db and Ref_Db, linked through the keys GVK_ID, STR_ID, REF_ID and DB_ID with one-to-many relationships]
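As a rough illustration of how such a scheme can be queried, the sketch below builds a reduced version in an in-memory SQLite database and looks up the references linked to one structure. Only the table names and the keys GVK_ID, STR_ID and REF_ID come from the slide. The extra column (`title`), the choice of primary keys, the sample rows and the helper `references_for_structure` are hypothetical, and the real system runs on Oracle rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE structure (gvk_id INTEGER PRIMARY KEY, str_id INTEGER);
CREATE TABLE reference (ref_id INTEGER PRIMARY KEY, title TEXT);  -- title is hypothetical
CREATE TABLE mapping   (gvk_id INTEGER, ref_id INTEGER);          -- links records to references
""")

# Made-up sample rows: two records (GVK_IDs 1 and 2) share structure 100.
conn.executemany("INSERT INTO structure VALUES (?, ?)", [(1, 100), (2, 100), (3, 200)])
conn.executemany("INSERT INTO reference VALUES (?, ?)", [(10, "paper A"), (11, "patent B")])
conn.executemany("INSERT INTO mapping VALUES (?, ?)", [(1, 10), (2, 11), (3, 11)])


def references_for_structure(str_id):
    """Return the distinct references linked to any record of one structure."""
    cursor = conn.execute(
        """
        SELECT DISTINCT r.ref_id, r.title
        FROM structure s
        JOIN mapping m ON m.gvk_id = s.gvk_id
        JOIN reference r ON r.ref_id = m.ref_id
        WHERE s.str_id = ?
        ORDER BY r.ref_id
        """,
        (str_id,),
    )
    return cursor.fetchall()


print(references_for_structure(100))  # [(10, 'paper A'), (11, 'patent B')]
```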
IBEX - Application Technology
• Server hardware
– dedicated server with 2 Intel dual-core Xeon processors
– Web server (WebLogic 9.2)
• Database
– Oracle 9.2 (AZ standard)
– advanced users can have read-only access to the tables
• Web interface
– Java Servlet/JavaServer Pages
• Chemistry engine (structure storage and search)
– ChemAxon's JChem 5.0.1
IBEX – Search interface
IBEX – Data presentation
IBEX – Search interface
GVK ID, STR ID, Company Address, Compound Name, Title, Authors, Claim/Example
IBEX – Presentation of results
Copy-and-paste the link to this compound in IBEX
Link to PubChem
Link to Pubmed
Link to the paper
Link to ENTREZ
Link to GeneNames
IBEX – Exporting results
IBEX - Application
• Links to external applications
– MicroPatent - patent name
– PubChem - PubChem CID
– GeneNames - official symbol
– ENTREZ - locus ID
• Direct links to IBEX with
– Structure ID
– GVK ID
– Patent name
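Links of this kind are usually built by inserting an identifier into a fixed URL pattern. The sketch below shows the idea in Python. The PubChem and NCBI Gene patterns are present-day public URLs and may differ from what IBEX used in 2008. The IBEX base URL and its query layout are hypothetical, because the slides do not show them.

```python
from urllib.parse import quote


def pubchem_link(cid):
    """Link to a PubChem compound page by its CID."""
    return f"https://pubchem.ncbi.nlm.nih.gov/compound/{int(cid)}"


def entrez_gene_link(locus_id):
    """Link to an NCBI (Entrez) Gene record by its numeric ID."""
    return f"https://www.ncbi.nlm.nih.gov/gene/{int(locus_id)}"


def ibex_link(base_url, key, value):
    """Direct link into IBEX; base_url and the query layout are hypothetical."""
    return f"{base_url}?{quote(key)}={quote(str(value))}"


print(pubchem_link(2244))
print(ibex_link("https://ibex.example/search", "patent", "WO2007122156"))
```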
IBEX – Application output of results
• On screen
– SDF file (only structure and GVK_ID, zipped)
– CSV file (user selected fields, zipped)
– XML file (user selected fields, zipped)
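A zipped CSV export restricted to user-selected fields can be sketched with the Python standard library as follows. This is an illustration only: the function name `export_csv_zip`, the archive member name and the sample record are hypothetical, and IBEX itself is a Java web application.

```python
import csv
import io
import zipfile


def export_csv_zip(records, fields, member_name="export.csv"):
    """Return the bytes of a zip archive holding one CSV with only the selected fields."""
    text = io.StringIO()
    # extrasaction="ignore" drops any record keys the user did not select.
    writer = csv.DictWriter(text, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)

    payload = io.BytesIO()
    with zipfile.ZipFile(payload, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, text.getvalue())
    return payload.getvalue()


# Made-up record; only GVK_ID and Title are selected for export.
records = [{"GVK_ID": 1, "STR_ID": 100, "Title": "paper A"}]
zipped = export_csv_zip(records, ["GVK_ID", "Title"])
```

Building the archive in memory keeps the sketch free of temporary files; a web application would typically stream the same bytes to the browser.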
IBEX - Near Future
• Extend functionalities
– List searches
– Save queries
– More descriptors
– Improved export
• Documentation and seminars
IBEX exploitation
• Rapid access to current knowledge
• Exploration of chemical and biological space
• Selectivity and activity optimization
• Develop predictive models (QSARs), build pharmacophores
• Virtual screening, compound prioritization for HTS, compound acquisition
• Evaluate fast follower opportunities (patent busting)
• Get structures from Patents / Journals
– avoid redrawing published structures (sdf and csv export)
IBEX - Novelty check
[Figure: novelty-check workflow with the elements proposed library, clean-up, AZ filters, compare libraries (internal AZ cmpds; external: IBEX, ACES, MDDR, PubChem; >30M cmpds) and novel cmpds]
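The novelty check amounts to a set difference between a cleaned, filtered proposal and the union of the reference collections. The Python sketch below assumes every structure has already been reduced to a canonical identifier string (for example a canonical SMILES), so that an exact string match means the same compound. The function name, the filter and the sample data are hypothetical.

```python
def novelty_check(proposed, reference_sets, passes_filters):
    """Return proposed structures that pass the filters and occur in no reference set."""
    # Clean-up: strip whitespace, drop empty entries and duplicates.
    cleaned = {s.strip() for s in proposed if s.strip()}
    # Filtering: keep only structures accepted by the supplied predicate.
    filtered = {s for s in cleaned if passes_filters(s)}
    # Comparison: anything found in an internal or external collection is not novel.
    known = set().union(*reference_sets.values())
    return sorted(filtered - known)


# Made-up example data standing in for the real collections.
proposed = ["CCO", "CCO ", "c1ccccc1", "CCN", ""]
reference_sets = {"internal": {"CCO"}, "external": {"CCN"}}
novel = novelty_check(proposed, reference_sets, lambda s: len(s) < 20)
print(novel)  # ['c1ccccc1']
```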
De-novo design with FRASSE
• Fragment and reassemble medchem cmpds (FRASSE)
• Fragmenter from ChemAxon
[Figure: chemical structures of Acetiamine (analgesic), Enfenamic acid (antiinflammatory), Acemetacin (antipyretic) and PYX 00001664]
PYXIS Discovery (Smart Libraries) www.pyxis-discovery.com
J. Med. Chem. 2003, 46, 4770; J. Med. Chem. 2004, 47, 5984; J. Chem. Inf. Model. 2005, 45, 239
Library design with LEADSCOPE
benzopyrazole, 1-(2-aminoethyl
identify gaps
Custom library
Patent visualisation
• Chemical space visualisation
– Patent/Patent comparison
• ChemGPS, Shapes, Pharmacophores
– Single Patent analysis
• SAR analysis
– PipelinePilot (results can be imported to third party or in-house tools), SARVision
Patent visualisation - ChemGPS
• Global drugspace map
• 2 sets of compounds (423 cmpds in total)
– Satellites (extreme real and virtual structures)
– Cores (representative oral drugs)
• Map coordinates are t-scores extracted via PCA from 72 physico-chemical descriptors
• 9 PCs
• Absolute position of compounds in the ChemGPS-defined space
T. Oprea & J. Gottfries, J. Comb. Chem., 2001, 3, 157-166
J. Larsson et al., J. Nat. Prod., 2007, 70, 789-794
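The "absolute position" property follows from fitting the PCA once on the fixed reference set and then only projecting new compounds onto the stored axes. The NumPy sketch below illustrates this with random numbers in place of real descriptors. The dimensions (423 compounds, 72 descriptors, 9 components) come from the slide; the autoscaling step and all variable names are assumptions, and this is not the published ChemGPS model.

```python
import numpy as np

rng = np.random.default_rng(0)
reference = rng.normal(size=(423, 72))  # synthetic stand-in for satellites and cores
n_components = 9

# Fit once on the reference set: autoscale (assumed), then take the principal axes via SVD.
mean = reference.mean(axis=0)
scale = reference.std(axis=0, ddof=1)
u, s, vt = np.linalg.svd((reference - mean) / scale, full_matrices=False)
loadings = vt[:n_components].T  # shape (72, 9)


def project(descriptors):
    """Map compounds into the fixed space; the reference model is never refitted."""
    return ((np.asarray(descriptors) - mean) / scale) @ loadings


new_compounds = rng.normal(size=(5, 72))  # synthetic "patent compounds"
scores = project(new_compounds)           # t-scores, shape (5, 9)
```

Because `mean`, `scale` and `loadings` stay fixed, a compound receives the same coordinates regardless of which other compounds are projected alongside it, which is what allows compound sets from different patents to be compared on one map.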
Patent visualisation - ChemGPS
[Figure: ChemGPS map with the axes size, hydrophobicity and rigidity, showing the set of cores, the set of satellites and compounds from GPCR patents]
Patent visualisation - GPCR
• Example for GPCR patents
– IBEX search -> Title: histamine, only WO patents, 9863 patents, selected 3 patents
Patent ID      #compounds  Company  Target  Modulation
WO2007122156   24          Glaxo    H1/H3   Antagonist
US20070208005  63          Glaxo    H3      Antagonist/reverse Agonist
US20060019998  71          Pfizer   H3      Antagonist
Patent SAR visualisation - GPCR
• PP (PipelinePilot) and SARVision: GPCR patent US20060019998
– 1 core (oxadiazole-biphenyl)
Cores extracted by a PipelinePilot protocol with molecular framework clustering
In the header some statistics are shown
Compounds with associated information for each core
• Cores and compounds imported from PP results into SARVision (R-group analysis)
• PP and SARVision: GPCR patent WO2007122156
– 1 core (oxo-pyridazine)
• PP: GPCR patent US20070208005
– 1 core