2009 09 lod london

Linked Data for Health Care and Life Science Research Jun Zhao University of Oxford

Upload: jun-zhao

Post on 07-May-2015

DESCRIPTION

Presentation about Linking Open Drug Data at the Linked Data Gathering in London 2009.

TRANSCRIPT

Page 1: 2009 09 Lod London

Linked Data for Health Care and Life Science Research

Jun Zhao University of Oxford

Page 2: 2009 09 Lod London

Linking Open Drug Data (LODD)

• A task force of the W3C Health Care and Life Sciences Interest Group, started in October 2008
• Enrich the Web of Data by publishing drug-related data as Linked Data
• Investigate the benefits of LODD for drug discovery and biomedical research
• ~12 active participants, including researchers and pharmaceutical companies
• Anja Jentzsch, Bosse Andersson, Chris Bizer, Eric Prud'hommeaux, Don Doherty, Matthias Samwald, Oktie Hassanzadeh, Scott Marshall, Susie Stephens

Page 3: 2009 09 Lod London

Dataset | Content | Publishing tool | Triples
LinkedCT | Derived from ClinicalTrials.gov; more than 60,000 trials conducted in the US and other countries | D2R Server | 7,036,000
DrugBank | Nearly 5,000 FDA-approved small molecule and biotech drugs | D2R Server | 767,000
DailyMed | Published by the National Library of Medicine (NLM); high-quality packaging information on 4,300 marketed drugs | D2R Server | 164,300
RDF-TCM | 850 herbs, herb-gene and herb-disease associations | Pubby | 117,600
Diseasome | A network of disorders and disorder genes, obtained from Online Mendelian Inheritance in Man (OMIM) | D2R Server | 91,200
SIDER | Information on 930 marketed drugs and 1,700 related side effects | D2R Server | 192,500
Total (approx.) | | | 8,400,000
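The total on this slide appears to be rounded. As a quick check, the sketch below sums the per-dataset triple counts exactly as listed on the slide; the variable names are only illustrative.

```python
# Triple counts copied from the slide.
triples = {
    "LinkedCT": 7_036_000,
    "DrugBank": 767_000,
    "DailyMed": 164_300,
    "RDF-TCM": 117_600,
    "Diseasome": 91_200,
    "SIDER": 192_500,
}

# Exact sum of the listed counts.
total = sum(triples.values())
print(f"{total:,}")  # 8,368,600, i.e. roughly 8.4 million

# Which dataset contributes the most triples?
largest = max(triples, key=triples.get)
print(largest)
```

The exact sum is slightly below the 8,400,000 shown, which is consistent with the slide giving a rounded figure.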

Page 4: 2009 09 Lod London

Dataset | Outgoing links
LinkedCT | 220,569
DrugBank | 59,661
DailyMed | 38,220
RDF-TCM | 3,438
Diseasome | 31,065
SIDER | 19,281

Page 5: 2009 09 Lod London

Create linked data

• Heterogeneous source data
  – Relational database dumps, tab-delimited data …
• Most data are open access
• The toolkits are maturing
  – D2R Server and OpenLink Virtuoso
• The difficulties
  – Understand the semantics of the source data
  – Heterogeneous semantics between source data
• We got a long way without data integration or consensus on the semantics
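To make the step from tab-delimited source data to Linked Data concrete, here is a minimal sketch that turns rows into N-Triples lines. It is not how D2R Server or Virtuoso work internally; the namespaces, column names, and sample row are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical namespaces, for illustration only.
BASE = "http://example.org/drug/"
VOCAB = "http://example.org/vocab/"

def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def rows_to_ntriples(tsv_text: str, id_column: str) -> list[str]:
    """Map each row to one subject URI and each other non-empty column to one triple."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the identifier itself and empty cells
            lines.append(f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .')
    return lines

# Hypothetical sample data: a header row plus one record.
sample = "id\tname\tindication\nD001\tAspirin\tpain relief\n"
for line in rows_to_ntriples(sample, "id"):
    print(line)
```

The sketch assumes identifiers and column names are already safe to embed in a URI. Deciding what each column actually means, which is the difficulty the slide points at, is left entirely to whoever picks the vocabulary.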

Page 6: 2009 09 Lod London

Create links between data

• Challenge: create links on a large scale
• Silk
  – Mapping data by querying their SPARQL endpoints
  – Silk-LSL: combining mapping rules, mapping algorithms and matching thresholds
• LinQuer
  – Semantic link discovery over relational data
  – LinQL: specify linkage requirements, which are rewritten into SQL queries
• Sacrifice recall for precision
• Maintain updates of links
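The idea of combining a similarity measure with a matching threshold can be sketched as follows. This is neither Silk nor LinQuer: it uses the standard-library `difflib` string similarity as a stand-in metric, and the URIs and labels are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_links(source: dict, target: dict, threshold: float) -> list:
    """For each source entity, keep its best-scoring target if the score reaches the threshold.

    source and target map entity URIs to labels.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, best_uri, best_score))
    return links

# Hypothetical entities from two datasets.
source = {"ex:a/aspirin": "Aspirin", "ex:a/ibuprofen": "Ibuprofen"}
target = {"ex:b/1": "aspirin", "ex:b/2": "Ibuprofen lysine"}

strict = find_links(source, target, threshold=0.9)
loose = find_links(source, target, threshold=0.7)
print(len(strict), len(loose))
```

With the stricter threshold only the exact label match survives, while the looser one also links the partial match. That is the recall-for-precision trade-off named on the slide: raising the threshold drops uncertain links at the cost of missing some correct ones.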

Page 7: 2009 09 Lod London

Use case: connect medical knowledge

• Despite the growth of studies in alternative medicine, it is not yet included in standard medical care in Western countries
• A lot of knowledge about alternative medicine is not available in English
• Use DBpedia to link together information
• Create a Web of Data connecting alternative medicine studies with Western biomedical research

Page 8: 2009 09 Lod London

Applications

• Patients:
  – Search for alternative medicine for a disease
  – Search for clinical trial information about a herb
  – Search for side effects information about a herb
  – Search for alternative herbs for a Western drug, such as what is the alternative medicine for aspirin
• Researchers:
  – Confirm these genes are also associated with the disease in biomedical research
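A search such as "alternative herbs for a Western drug" amounts to following links through a shared hub, for example a DBpedia resource that both datasets point to. The sketch below shows that traversal over a tiny in-memory set of triples. Every URI, predicate name, and association in it is a made-up placeholder, not data from the LODD datasets.

```python
# Toy triples; all identifiers and associations are hypothetical placeholders.
TRIPLES = {
    ("tcm:herb/HerbA", "ex:treats", "tcm:disease/X"),
    ("tcm:herb/HerbB", "ex:treats", "tcm:disease/Y"),
    ("tcm:disease/X", "owl:sameAs", "dbpedia:DiseaseX"),
    ("tcm:disease/Y", "owl:sameAs", "dbpedia:DiseaseY"),
    ("drugbank:drug/DrugA", "ex:treats", "drugbank:disease/X"),
    ("drugbank:disease/X", "owl:sameAs", "dbpedia:DiseaseX"),
}

def objects(subject: str, predicate: str) -> set:
    """All objects o such that (subject, predicate, o) is in the data."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}

def subjects(predicate: str, obj: str) -> set:
    """All subjects s such that (s, predicate, obj) is in the data."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}

def alternative_herbs(drug: str) -> set:
    """Herbs linked to the same hub disease as the given drug."""
    herbs = set()
    for disease in objects(drug, "ex:treats"):
        for hub in objects(disease, "owl:sameAs"):
            # Every dataset-local disease that points at the same hub.
            for same_disease in subjects("owl:sameAs", hub):
                for candidate in subjects("ex:treats", same_disease):
                    if candidate.startswith("tcm:herb/"):
                        herbs.add(candidate)
    return herbs

print(alternative_herbs("drugbank:drug/DrugA"))
```

In a real deployment the same traversal would more likely be a SPARQL query against the published endpoints; the plain-Python version is only meant to show why the cross-dataset links matter for this kind of question.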

Page 9: 2009 09 Lod London
Page 10: 2009 09 Lod London
Page 11: 2009 09 Lod London


Page 12: 2009 09 Lod London
Page 13: 2009 09 Lod London

Are there any Raccoons in India?

Page 14: 2009 09 Lod London

The TOP pharma questions

• What patents exist for this pathway/target?
• What side effects are there for this drug, especially those not on the label?
• Has a similar compound to ours been approved previously, and what were the side effects?

• http://esw.w3.org/topic/HCLSIG/LODD/Questions

Page 15: 2009 09 Lod London

Future issues

• Links update notification
• Annotate the data with the Translational Medicine Ontology, also from W3C HCLS
• More applications to support scientific questions
• Work with more datasets, such as gene expression data and protein interaction network data

Page 16: 2009 09 Lod London

http://esw.w3.org/topic/HCLSIG/LODD/

Thank you!