
Post on 25-May-2020


Making semantics work in drug discovery

Indiana University School of Informatics and Computing

David Wild, Assistant Professor and Director, Cheminformatics & Chemogenomics Research Group (CCRG) Indiana University School of Informatics and Computing djwild@indiana.edu

“Information is cheap. Understanding is expensive” (Karl Fast)

http://djwild.info

“Stack” of applying semantics in drug discovery & healthcare

New biomedical insights

Integrated knowledge discovery processes

Integrative tools and algorithms

Accessible networks of semantically integrated data

Only now going mainstream

Wild, D.J., Ding, Y., Sheth, A.P., Harland, L., Gifford, E.M., Lajiness, M.S. Systems Chemical Biology and the Semantic Web: what they mean for the future of drug discovery research, Drug Discovery Today, 2012, 17, 469-474.

Chem2Bio2RDF.org – semantically integrated data

Chen, B., Dong, X., Jiao, D., Wang, H., Zhu, Q., Ding, Y., Wild, D.J. Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics 2010, 11, 255.

We can answer many questions with SPARQL…

What pathways will troglitazone affect?

PREFIX c2b2r: <http://chem2bio2rdf.org/chem2bio2rdf.owl#>
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?pathwayName ?datasource
FROM <http://chem2bio2rdf.org/owl#>
WHERE {
  ?chemical rdfs:label "Troglitazone"^^xsd:string ;
            c2b2r:hasInteraction ?interaction .
  ?interaction c2b2r:hasTarget ?target ;
               c2b2r:biologicalInterest true .
  ?pathway c2b2r:isPathwayOf ?target ;
           bp:name ?pathwayName ;
           bp:xref [ c2b2r:identifierType ?datasource ] .
}

[Diagram: Drug, Gene, Pathway]
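A minimal sketch of how a query like the one above could be packaged for a SPARQL endpoint from Python. It follows the SPARQL 1.1 Protocol convention of passing the query text in a `query` parameter of a GET request. The endpoint URL is a placeholder assumption, not a confirmed Chem2Bio2RDF address, the query is a shortened stand-in, and the function only builds the request URL; it does not send it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint URL, used for illustration only.
ENDPOINT = "http://example.org/sparql"

# Shortened stand-in for the troglitazone query shown on the slide.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?chemical
WHERE { ?chemical rdfs:label "Troglitazone" . }
"""


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_sparql_get_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```

The resulting URL can be handed to any HTTP client; result formats and authentication depend on the endpoint and are not covered here.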

We can answer many questions with SPARQL…

What are possible multi-target MAPK Inhibitors?

PREFIX pubchem: <http://chem2bio2rdf.org/pubchem/resource/>
PREFIX kegg: <http://chem2bio2rdf.org/kegg/resource/>
PREFIX uniprot: <http://chem2bio2rdf.org/uniprot/resource/>

SELECT ?compound_cid (count(?compound_cid) as ?active_assays)
FROM <http://chem2bio2rdf.org/pubchem>
FROM <http://chem2bio2rdf.org/kegg>
FROM <http://chem2bio2rdf.org/uniprot>
WHERE {
  ?bioassay pubchem:CID ?compound_cid .
  ?bioassay pubchem:outcome ?activity . FILTER (?activity=2) .
  ?bioassay pubchem:Score ?score . FILTER (?score>50) .
  ?bioassay pubchem:gi ?gi .
  ?uniprot uniprot:gi ?gi .
  ?pathway kegg:protein ?uniprot .
  ?pathway kegg:Pathway_name ?pathway_name .
  FILTER regex(?pathway_name,"MAPK signaling pathway","i") .
}
GROUP BY ?compound_cid
HAVING (count(*)>1)

[Diagram: Bio-Assay, Gene, Pathway, Compound, Gene]
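To make the GROUP BY / HAVING step of the MAPK query concrete, here is a small Python sketch that applies the same filters (outcome code 2, score above 50, pathway name matching "MAPK signaling pathway" case-insensitively) and then keeps compounds with more than one matching row. The rows are made-up toy data, not PubChem results, and the tuple layout is an assumption for illustration.

```python
import re
from collections import Counter

# Toy rows: (compound_cid, outcome, score, pathway_name). All values hypothetical.
rows = [
    (101, 2, 80, "MAPK signaling pathway"),
    (101, 2, 65, "MAPK signaling pathway"),
    (202, 2, 90, "MAPK signaling pathway"),
    (202, 1, 95, "MAPK signaling pathway"),  # fails the outcome filter
    (303, 2, 40, "MAPK signaling pathway"),  # fails the score filter
    (303, 2, 70, "Wnt signaling pathway"),   # not a MAPK pathway
]


def multi_hit_compounds(rows):
    """Count rows passing all filters per compound; keep counts above 1."""
    pattern = re.compile("MAPK signaling pathway", re.IGNORECASE)
    counts = Counter(
        cid
        for cid, outcome, score, name in rows
        if outcome == 2 and score > 50 and pattern.search(name)
    )
    return {cid: n for cid, n in counts.items() if n > 1}


result = multi_hit_compounds(rows)
print(result)  # {101: 2}
```

Only compound 101 survives: 202 has a single qualifying row, and 303 has none.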

Variety of expert GUI tools for searching

“Stack” of applying semantics in drug discovery & healthcare

New biomedical insights

Integrated knowledge discovery processes

Integrative tools & algorithms

Accessible networks of semantically integrated data

Very little work done in this area


ChemBioSpace Association Search

He, B., Tang, J., Ding, Y., Wang, H., Sun, Y., Shin, J.H., Chen, B., Moorthy, G., Qiu, J., Desai, P., Wild, D.J., Mining relational paths in biomedical data. PloS One, 2011, 6(12), e27506.

Semantic Linked Association Prediction

Association score: 2385.9

Association significance: 9.06 x 10^-6 => missing link predicted

SLAP significant subgraph

Chen, B., Ding, Y., Wild, D.J. Assessing Drug Target Association using Semantic Linked Data. PLoS Computational Biology, 2012, 8(7), e1002574

Compound/Target SLAP Virtual Screen - Troglitazone

(C) 2012 DATA2DISCOVERY INC

SLAP Drug-Target Prediction Matrix

Bipartite repurposing graph created with Sci2

Assessing drug similarity from biological function

•  Took 157 drugs with 10 known therapeutic indications, and created SLAP profiles against 1,683 human targets

•  Pearson correlation between profiles > 0.9 was used to create associations between drugs

•  Drugs with the same therapeutic indication unsurprisingly cluster together – also subcluster by MOA

•  Some drugs with similar profile have different indications – potential for use in drug repurposing?
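The thresholding step can be sketched in a few lines of Python: compute the Pearson correlation between every pair of drug profiles and keep an edge when it exceeds 0.9. The drug names and profile values are invented toy data (the real profiles span 1,683 targets), and the function names are hypothetical; this is an illustration of the idea, not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical association-score profiles, one value per target.
profiles = {
    "drug_A": [1.0, 2.0, 3.0, 4.0],
    "drug_B": [2.0, 4.1, 6.0, 8.2],
    "drug_C": [4.0, 1.0, 3.5, 0.5],
}


def similarity_edges(profiles, threshold=0.9):
    """Return (drug, drug, r) for every pair whose correlation exceeds threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r > threshold:
            edges.append((a, b, r))
    return edges


edges = similarity_edges(profiles)
print([(a, b) for a, b, _ in edges])  # [('drug_A', 'drug_B')]
```

The resulting edge list is the kind of input a network tool would need to draw drug clusters.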

Large-scale repurposing networks


Repurposing example


H1 antihistamine

anticonvulsant antiarrhythmic

anticonvulsant antidepressive

alpha / beta blocker used for CHF

“Stack” of applying semantics in drug discovery & healthcare

New biomedical insights

Integrated knowledge discovery processes

Integrative tools & algorithms

Accessible networks of semantically integrated data

What is the added value?


Integrative virtual screening

•  Ligand-based screening: QSAR, similarity, pharmacophore

•  Structure-based screening: Molecular docking

•  Semantic screening: Semantic association with targets and/or known ligands

•  Look at top hits using each method, and fused hits using harmonic data fusion of ranked lists

•  Currently being validated in PXR (Univ. Cincinnati) and Mtb (OSDD) projects
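The slides do not define "harmonic data fusion". One plausible reading, assumed here, is to score each compound by the harmonic mean of its ranks across the screening methods, so that a very good rank in any one method pulls the fused score down strongly. The sketch below uses made-up compound IDs and ranked lists, and the function name is hypothetical.

```python
def harmonic_rank_fusion(ranked_lists):
    """Fuse ranked lists (best first) by the harmonic mean of per-method ranks.

    Assumption: only compounds ranked by every method are fused.
    Returns compound ids ordered from best (lowest fused rank) to worst.
    """
    ranks = {}
    for ordered in ranked_lists.values():
        for position, compound in enumerate(ordered, start=1):
            ranks.setdefault(compound, []).append(position)

    n_methods = len(ranked_lists)
    fused = {
        compound: len(rs) / sum(1.0 / r for r in rs)
        for compound, rs in ranks.items()
        if len(rs) == n_methods
    }
    return sorted(fused, key=lambda c: (fused[c], c))


# Hypothetical ranked hit lists from three screening approaches.
ranked_lists = {
    "ligand": ["c1", "c2", "c3", "c4"],
    "docking": ["c1", "c3", "c4", "c2"],
    "semantic": ["c2", "c1", "c3", "c4"],
}

fused_order = harmonic_rank_fusion(ranked_lists)
print(fused_order)  # ['c1', 'c2', 'c3', 'c4']
```

Here c2 stays second despite a poor docking rank, because its top semantic rank dominates the harmonic mean; an arithmetic mean would rank it below c3 instead.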

                 Pharmacophore   Random Forest   ROCS    SLAP    Fusion
Pharmacophore    1               -0.12           -0.07   0.08    0.37
Random Forest                    1               0.04    -0.27   0.32
ROCS                                             1       -0.17   0.37
SLAP                                                     1       0.28
Fusion                                                           1

See Bioorg Med Chem Lett. 2012 May 1;22(9):3349-53

MOA: Identifying cardiac side effects of Rosiglitazone

Gene/Drug   Rosiglitazone                              Troglitazone   Pioglitazone
SAA2        Strong “Discussed” PharmGKB                V. weak        V. weak
APOE        Strong “Discussed” PharmGKB + Matador      V. weak        V. weak
ADIPOQ      Strong Positive PharmGKB                   V. weak        Strong Positive PharmGKB
CYP2C8      Strong Changes metabolism (CTD)            V. weak        Strong Changes metabolism (CTD)

MOA: Identifying differential LDL-lowering effect of Troglitazone

Parkinson Disease-Inflammation Network

Integrating phenotypic assays


Phenotypic Assay

Wnt pathway

Associated Assay

Resveratrol

Known assay data

Associated targets

Tool prototypes: djwild.info / d2discovery.com

Semantic Technologies in Drug Discovery

http://blog.project-sierra.de/archives/1639

•  Most commercial organizations are still in the early adopter phase: they have a big data integration problem and are realizing that semantic technologies are a better solution to it than relational databases

•  Some companies are in “the bowling alley” and are moving out of this phase. No-one is in “the tornado”

•  Research (OpenPHACTS, etc.) is well on the way to solving the data integration problem and is moving on to advanced searching, data mining and prediction

Google Knowledge Graph

http://blogs.gartner.com/darin-stewart/2012/05/17/googles-knowledge-graph-yeah-thats-the-semantic-web-sort-of/

Lessons & thoughts: adoption of semantics

•  Semantic search is going mainstream
   •  Google Knowledge Graph, Facebook Graph Search, Linked Open Data (LOD), OpenPHACTS
   •  Now identified as “top technology trend” for 2013 (gartner.com/newsroom/id/2359715)
   •  Everyone has a big data integration problem. Semantics now work well for this.
   •  Many pharma companies have semantic pilot programs but no-one has gone “mainstream” with semantics. However, this is probably not far off.
   •  Switching from relational to semantic models seems revolutionary but can be done in an evolutionary fashion (D2R, etc.), although there are some capacity issues and limitations with this approach.

•  We need support for horizontal research in semantic prediction and data mining
   •  Based on huge heterogeneous graphs – application of existing and new graph algorithms
   •  Very little work has been done so far on semantic prediction using heterogeneous, semantic graphs – most work is siloed in the graph theory and data mining communities

•  We need support for vertical research in big data / networks / semantics for translational medicine and drug discovery
   •  Semantic prediction, using all available data, shows strong promise for utility in areas such as drug-target prediction, off-target profiling, and drug repurposing.
   •  Semantics might have rapid adoption in healthcare (EMRs, PHRs), and it should be important to be able to integrate from the molecular to the patient level. Need to keep good alignment between disciplines.
   •  Academic-industry cross-silo collaboration is essential: EU OpenPHACTS is a good example of success
