Upload: egon-willighagen

Post on 26-Jan-2015

Page 1: The Semantic Web for Life Sciences

The Semantic Web for Life Sciences

Egon Willighagen <http://chem-bla-ics.blogspot.com/>

Bioclipse & Proteochemometric Group (Prof. J. Wikberg), until 2010-09-30

Department of Pharmaceutical Biosciences (Prof. E. Brittebo)

Uppsala University

2010-09-22

Page 2: The Semantic Web for Life Sciences

Why RDF?

Applications

Online

Building Blocks

RDF

The Players

Conclusion

Drug-Protein Binding

patented:ELW00356

binds to

uniprot:CYP1B1

2010-09-22 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com

Page 3: The Semantic Web for Life Sciences

Drug-Protein Binding

patented:ELW00356

binds to

uniprot:CYP1B1

Where was this published?

How was it measured?

Other measurements?

What about similar molecules and proteins?

What haplotype, SNP, or missense mutation?

Page 4: The Semantic Web for Life Sciences

Semantic Web

Current World Wide Web

hyperlinked web pages

Semantic Web

machine-readable hyperlinked (web) pages

Page 5: The Semantic Web for Life Sciences

Semantic Web 4 Life Sciences??

Is this relevant to drug discovery?

knowledge discovery, ...

data consistency checking

model validation

Page 6: The Semantic Web for Life Sciences

Applications

Page 7: The Semantic Web for Life Sciences

OpenMolecules RDF: linked data

Page 8: The Semantic Web for Life Sciences

OpenMolecules RDF

http://rdf.openmolecules.net/?InChI=1/CH4/h1H4

Page 9: The Semantic Web for Life Sciences

Linked Data

As of July 2009

[Figure: Linked Open Data cloud diagram. The life-science datasets shown include LinkedCT, Reactome, Taxonomy, KEGG, PubMed, GeneID, Pfam, UniProt, OMIM, PDB, Symbol, ChEBI, DailyMed, Diseasome, CAS, HGNC, InterPro, DrugBank, UniParc, UniRef, ProDom, PROSITE, Gene Ontology, HomoloGene, PubChem, MGI, and UniSTS. Other domains in the diagram include DBpedia, Freebase, GeoNames, MusicBrainz, DBLP, and many more.]

CC-BY-SA

Page 10: The Semantic Web for Life Sciences

Linked Data: the Life Science corner

CC-BY-SA

Page 11: The Semantic Web for Life Sciences

Proteochemometrics

Data: protein sequences, molecular structures, binding affinities

E.L. Willighagen et al., J. Biomed. Sem., 2010, in print

Page 12: The Semantic Web for Life Sciences

Proteochemometrics: RDF input

Page 13: The Semantic Web for Life Sciences

Substructure mining: ChEMBL

Annsofie Andersson, M.Sc. project

Page 14: The Semantic Web for Life Sciences

OpenTox

Open Standards around Computational Toxicology

Web services

Public Data Repository

Bioclipse integration

downloading/uploading data

run descriptor calculation

future: build QSAR models

E.L. Willighagen, N. Jeliazkova, O. Spjuth, in preparation

Page 15: The Semantic Web for Life Sciences

OpenTox: downloading

Page 16: The Semantic Web for Life Sciences

Hyperlinked Data

How do we put our semantic data online?

Page 17: The Semantic Web for Life Sciences

XHTML+RDFa

Embedded in web pages
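As a rough illustration of what "embedded in web pages" means, the sketch below pulls triples out of a small XHTML+RDFa fragment using only the Python standard library. It is a deliberately simplified reader, not a conforming RDFa processor: it handles only the `about` and `property` attributes, takes literal values from element text, and leaves prefixed names unexpanded. The sample page, the `ex` prefix, and the class name `SimpleRDFaExtractor` are assumptions made for this example.

```python
from html.parser import HTMLParser

# Hypothetical XHTML+RDFa fragment; the ex: namespace is an assumption.
PAGE = """
<div xmlns:ex="http://example.org/chem/" about="http://example.org/chem/Methane">
  <span property="ex:boilingpoint">-161</span>
</div>
"""


class SimpleRDFaExtractor(HTMLParser):
    """Collect (subject, property, text) triples from a small RDFa subset.

    Assumes well-formed XHTML in which every element is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._stack = []  # one (subject, property) frame per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element without its own `about` inherits the enclosing subject.
        inherited = self._stack[-1][0] if self._stack else None
        subject = attrs.get("about", inherited)
        self._stack.append((subject, attrs.get("property")))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack:
            return
        subject, prop = self._stack[-1]
        text = data.strip()
        if subject and prop and text:
            self.triples.append((subject, prop, text))


extractor = SimpleRDFaExtractor()
extractor.feed(PAGE)
extractor.close()

if __name__ == "__main__":
    for triple in extractor.triples:
        print(triple)
```

The same page stays readable for people in a browser, while a program sees one statement about methane.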

Page 18: The Semantic Web for Life Sciences

SPARQL endpoint

Query the data directly
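Querying an endpoint directly comes down to an HTTP request that carries the query text in a `query` parameter, as defined by the SPARQL Protocol. The sketch below only builds such a request with the Python standard library and does not send it. The DBpedia endpoint URL appears later in this talk; the query itself and the function name `build_sparql_request` are assumptions chosen for illustration.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard JSON results format.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


# Illustrative query: any five triples.
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
request = build_sparql_request("http://dbpedia.org/sparql", QUERY)

if __name__ == "__main__":
    print(request.full_url)
    # Sending it requires network access and a live endpoint:
    # import json
    # with urllib.request.urlopen(request) as response:
    #     results = json.load(response)
```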

Page 19: The Semantic Web for Life Sciences

Semantic Wikis

Bootstrapping Life Sciences Knowledge Bases

Samuel Lampa et al., in preparation

Page 20: The Semantic Web for Life Sciences

Other Building Blocks

Page 21: The Semantic Web for Life Sciences

The Chemistry Development Kit

A Family of Projects

CDK-Taverna (chemoinformatics workflows)

JChemPaint (semantic 2D editor)

ChemoJava (GPL-ed extension)

Goals

library of cheminformatics algorithms

educational

Usage

CDK: cited 140+ times in the scientific literature

Bioclipse, KNIME, CDK-Taverna, Jumbo (CML), AMBIT,...

C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003

C. Steinbeck et al., Curr. Pharm. Design, 2006

Page 22: The Semantic Web for Life Sciences

More detail on the CDK

Tomorrow, during the presentation from 13:00-14:00

Page 23: The Semantic Web for Life Sciences

Bioclipse-RDF

Linking the Semantic Web to Cheminformatics

local RDF storage (memory, on disk)

read/write RDF/XML, N3

run SPARQL queries (local and remote)

extract RDF from XHTML/RDFa

Thanks to Open Source projects including Jena, SWI-Prolog, and Pellet.

E.L. Willighagen et al., J. Biomed. Sem., in press

O. Spjuth et al., BMC Bioinformatics 2007

O. Spjuth et al., BMC Bioinformatics 2010

Page 24: The Semantic Web for Life Sciences

MyExperiment: Bioclipse Scripting Language

myexperiment.search("RDF")

myexperiment.downloadWorkflow(937)

Page 25: The Semantic Web for Life Sciences

Semantic Web

The new building block...

It's already 10+ years old...

Page 26: The Semantic Web for Life Sciences

Why are Open Standards Important?

Standards: We speak the same language

Open: Social contract: you can use it, now and in the future

Page 27: The Semantic Web for Life Sciences

The Semantic Web Stack

The Semantic Web is more than RDF

(From Wikipedia)

Page 28: The Semantic Web for Life Sciences

Resource Description Framework

hyperlinked, machine-readable knowledge

hyperlinked: Uniform Resource Identifier (URI)

machine-readable: knowledge markup with triples

Page 29: The Semantic Web for Life Sciences

RDF: the URI

Uniform Resource Identifier (URI)

e.g. URL: http://www.pharmbio.org/

Too long? Use prefixes

http://www.semanticweb.org/ontologies/cheminf.owl#Molecule

cheminf:Molecule
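A prefix is a lookup-table abbreviation: the short name stands for a namespace URI, and the part after the colon is appended to it. The minimal sketch below shows expansion and compaction in Python. The `cheminf` namespace is the one shown on the slide; the function names `expand` and `compact` are assumptions for this example, not part of any RDF library.

```python
# Prefix table. The cheminf entry is taken from the slide; rdfs is the
# standard RDF Schema namespace.
PREFIXES = {
    "cheminf": "http://www.semanticweb.org/ontologies/cheminf.owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie, prefixes=PREFIXES):
    """Turn a prefixed name such as 'cheminf:Molecule' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact(uri, prefixes=PREFIXES):
    """Shorten a full URI using the longest matching namespace, if any."""
    matches = [(ns, p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri  # no known namespace: leave the URI untouched
    ns, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(ns):]}"


if __name__ == "__main__":
    full = expand("cheminf:Molecule")
    print(full)
    print(compact(full))
```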

Page 30: The Semantic Web for Life Sciences

RDF: the triple

Type 1: resource - predicate - resource

Ethane - heavierthan - Methane

Type 2: resource - predicate - literal

Methane - boilingpoint - "-161"
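The two triple types can be written down concretely in the line-based N-Triples syntax, where URIs appear in angle brackets and literals in quotes. The sketch below serializes the slide's two example triples. The `http://example.org/chem/` namespace and the choice of an integer datatype are assumptions, since the slide gives only names and a bare number, and the helper does not escape special characters inside literals.

```python
# Hypothetical namespace for illustration; the slide gives names, not URIs.
EX = "http://example.org/chem/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def uri(term):
    """Format a URI as an N-Triples term."""
    return f"<{term}>"


def literal(value, datatype):
    """Format a typed literal as an N-Triples term (no escaping applied)."""
    return f'"{value}"^^<{datatype}>'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) term triples, one per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    # Type 1: resource - predicate - resource
    (uri(EX + "Ethane"), uri(EX + "heavierthan"), uri(EX + "Methane")),
    # Type 2: resource - predicate - literal
    (uri(EX + "Methane"), uri(EX + "boilingpoint"), literal(-161, XSD_INTEGER)),
]

if __name__ == "__main__":
    print(to_ntriples(triples))
```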

Page 31: The Semantic Web for Life Sciences

RDF: graphs

Linked triples create a graph

Ethane - heavierthan - Methane

Methane - boilingpoint - "-161"

Ethane - boilingpoint - "-89"
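Because triples share subjects and objects, a list of them already is a graph: each triple is a labelled edge. The sketch below indexes the three triples from this slide by subject and adds a tiny pattern matcher. Plain strings stand in for URIs and integers for literals, which is a simplification made for brevity; `build_graph` and `match` are names invented for this example.

```python
from collections import defaultdict

# The three triples from the slide.
TRIPLES = [
    ("Ethane", "heavierthan", "Methane"),
    ("Methane", "boilingpoint", -161),
    ("Ethane", "boilingpoint", -89),
]


def build_graph(triples):
    """Index triples by subject: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    # Both edges leaving the Ethane node
    print(graph["Ethane"])
    # Every boiling point recorded in the graph
    print(match(TRIPLES, p="boilingpoint"))
```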

Page 32: The Semantic Web for Life Sciences

RDF Schema & Web Ontology Language

RDF Schema: taxonomies

rdfs:Class, rdf:Property

rdfs:label, rdfs:comment

rdfs:subClassOf, rdfs:subPropertyOf

Web Ontology Language (OWL)

owl:equivalentClass

owl:sameAs

and a lot more ...
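What a taxonomy buys you is inference: if a resource has a type, it also has every superclass of that type, because rdfs:subClassOf is transitive. The sketch below works this out in plain Python. The class hierarchy (Alkane, Hydrocarbon, Molecule) is a hypothetical example that does not come from the slides, and a real reasoner such as Pellet covers far more than this single rule.

```python
# Hypothetical hierarchy for illustration (not from the slides).
# Each entry reads: key rdfs:subClassOf each member of the value set.
SUBCLASS_OF = {
    "Alkane": {"Hydrocarbon"},
    "Hydrocarbon": {"Molecule"},
}
# Asserted rdf:type statements.
TYPE_OF = {"Methane": {"Alkane"}}


def superclasses(cls, subclass_of):
    """All classes reachable from cls via rdfs:subClassOf (transitive)."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(resource, type_of, subclass_of):
    """Asserted rdf:type values plus those entailed by rdfs:subClassOf."""
    types = set(type_of.get(resource, ()))
    for cls in list(types):
        types |= superclasses(cls, subclass_of)
    return types


if __name__ == "__main__":
    print(sorted(inferred_types("Methane", TYPE_OF, SUBCLASS_OF)))
```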

Page 33: The Semantic Web for Life Sciences

SPARQL

SPARQL RDF Query Language

# Prefix declarations (iface, dbp, rdfs, sioc) are not shown on the slide.
SELECT DISTINCT * WHERE {
  # Receptor data from a remote endpoint
  SERVICE <http://uu3.org:8888/7tm_receptors> {
    ?iuphar iface:family ?family .
    ?iuphar iface:code ?code .
    ?iuphar iface:iupharName ?iupharNm .
    ?human iface:iuphar ?iuphar .
    ?human iface:geneName "GABBR1" .
    ?human iface:entrezGene ?humanEntrez .
  }
  # English label for the same Entrez Gene ID in DBpedia
  SERVICE <http://dbpedia.org/sparql> {
    _:gene dbp:entrezgene ?humanEntrez ;
           rdfs:label ?label .
    FILTER (lang(?label) = "en")
  }
  # Posts whose topic carries that label
  GRAPH <http://hcls.deri.org/atag/data/gabab_example.html> {
    ?topic rdfs:label ?label .
    ?post sioc:topic ?topic
  }
}
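The query works because variables such as ?humanEntrez and ?label appear in more than one pattern, so a value bound in one place must match everywhere else. The sketch below shows that joining idea on a tiny in-memory triple list. It is a toy matcher, not a SPARQL engine: there is no SERVICE, GRAPH, or FILTER support, and all data values (human1, gene1, ID-1, the label) are placeholders invented for this example.

```python
# Placeholder data loosely modelled on the query above; every value is invented.
TRIPLES = [
    ("human1", "geneName", "GABBR1"),
    ("human1", "entrezGene", "ID-1"),
    ("human2", "geneName", "OTHER"),
    ("human2", "entrezGene", "ID-2"),
    ("gene1", "entrezgene", "ID-1"),
    ("gene1", "label", "placeholder label"),
]


def is_var(term):
    """Variables are strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable bound earlier must keep the same value.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def query(patterns, triples):
    """Evaluate a list of triple patterns as one conjunctive query."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


PATTERNS = [
    ("?human", "geneName", "GABBR1"),
    ("?human", "entrezGene", "?id"),
    ("?gene", "entrezgene", "?id"),
    ("?gene", "label", "?label"),
]

if __name__ == "__main__":
    for row in query(PATTERNS, TRIPLES):
        print(row)
```

Only human1 survives the first pattern, and the shared ?id variable then links it to gene1 and its label.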

Page 34: The Semantic Web for Life Sciences

The Players

Who is working with RDF?

Page 35: The Semantic Web for Life Sciences

W3C

World Wide Web Consortium

Coordinates standard development

Builds user communities

Health Care and Life Sciences Interest Group

Linked Open Drug Data (LODD)

Translational Medicine Ontology (TMO)

Scientific Discourse

Page 36: The Semantic Web for Life Sciences

Bio2RDF / Chem2Bio2RDF

Bio2RDF

Proteins, DNA, ...

Open Source

Canada, Virtuoso, ...

Chem2Bio2RDF

More towards molecules...

Indiana University

Page 37: The Semantic Web for Life Sciences

Talis & Virtuoso

The companies that provide triple store support.

Very supportive of Open initiatives...

Page 38: The Semantic Web for Life Sciences

ACS Meeting

Boston, 22-23 August 2010

Topics: lipidomics, text mining, drug discovery, ontologies

Thematic Issue: submit before 2010-11-28

http://egonw.github.com/acsrdf2010/

Page 39: The Semantic Web for Life Sciences

Summary

RDF

New Open Standards for knowledge exchange

Simplifies sharing data

RDFS + OWL

Machine-readable knowledge

Disambiguation

SPARQL, etc

Application Programming Interfaces

Page 40: The Semantic Web for Life Sciences

What does this bring us?

Platform to integrate RDF with the computational world

Bioclipse as single point of access

Scripting, sharing of scripts with MyExperiment.org

Bridging Names to Numbers

Page 41: The Semantic Web for Life Sciences

Acknowledgements

Maris Lapins, Martin Eklund: statistics

Annsofie Andersson: ChEMBL + MoSS integration

Samuel Lampa: reasoning (Pellet/Prolog) and RDFIO

Nina Jeliazkova: OpenTox integration

John Overington: ChEMBL database

Ola Spjuth

Page 42: The Semantic Web for Life Sciences

The Details: PharmBio course

http://www.pharmbio.org/

Book in preparation ...

Page 43: The Semantic Web for Life Sciences

The Details: Molecular Chemometrics

http://www.citeulike.org/user/egonw/tag/papers

http://chem-bla-ics.blogspot.com

http://egonw.github.com

waveto:

[email protected]
