2009 09 lod london
DESCRIPTION
Presentation about Linking Open Drug Data at the Linked Data Gathering in London 2009.
TRANSCRIPT
Linked Data for Health Care and Life Science Research
Jun Zhao University of Oxford
Linking Open Drug Data (LODD)
• A task force of the W3C Health Care and Life Sciences Interest Group, started in October 2008
• Enrich the Web of Data by publishing drug-related data as Linked Data
• Investigate the benefits of LODD for drug discovery and biomedical research
• ~12 active participants, including researchers and pharmaceutical companies
• Anja Jentzsch, Bosse Anderssen, Chris Bizer, Eric Prud'hommeaux, Don Doherty, Matthias Samwald, Oktie Hassanzadeh, Scott Marshall, Susie Stephens
Dataset | Content | Publishing tool | Triples
LinkedCT | Derived from ClinicalTrials.gov; more than 60,000 trials conducted in the US and other countries | D2R Server | 7,036,000
DrugBank | Nearly 5,000 FDA-approved small molecule and biotech drugs | D2R Server | 767,000
DailyMed | Published by the National Library of Medicine (NLM); high-quality packaging information on 4,300 marketed drugs | D2R Server | 164,300
RDF-TCM | 850 herbs, herb-gene and herb-disease associations | Pubby | 117,600
Diseasome | A network of disorders and disorder genes, obtained from Online Mendelian Inheritance in Man (OMIM) | D2R Server | 91,200
SIDER | Information on 930 marketed drugs and 1,700 related side effects | D2R Server | 192,500
Total (approx.) | | | 8,400,000
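The total can be checked against the per-dataset triple counts listed above. A minimal sketch, using only the figures from the slide; the variable names are illustrative:

```python
# Triple counts per dataset, copied from the slide.
triples = {
    "LinkedCT": 7_036_000,
    "DrugBank": 767_000,
    "DailyMed": 164_300,
    "RDF-TCM": 117_600,
    "Diseasome": 91_200,
    "SIDER": 192_500,
}

total = sum(triples.values())
print(f"exact sum: {total:,}")

# Round to the nearest 100,000 to compare with the figure on the slide.
rounded = round(total, -5)
print(f"rounded:   {rounded:,}")

# Share of all triples contributed by the largest dataset.
largest = max(triples, key=triples.get)
print(f"{largest}: {triples[largest] / total:.0%} of all triples")
```

The exact sum is 8,368,600, which rounds to the 8,400,000 shown on the slide; LinkedCT alone accounts for most of the triples.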
Dataset | Outgoing links
LinkedCT | 220,569
DrugBank | 59,661
DailyMed | 38,220
RDF-TCM | 3,438
Diseasome | 31,065
SIDER | 19,281
Create linked data
• Heterogeneous source data
  – Relational database dumps, tab-delimited data …
• Most data are open access
• The toolkits are maturing
  – D2R Server and OpenLink Virtuoso
• The difficulties
  – Understand the semantics of the source data
  – Heterogeneous semantics between source data
• We got a long way without data integration or consensus on the semantics
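To make the publishing step concrete, here is a minimal sketch of turning tab-delimited rows into N-Triples using only the Python standard library. It is not how D2R Server or Virtuoso work internally (those map relational databases declaratively); the `example.org` namespaces, the column names and the sample rows are all hypothetical placeholders.

```python
import csv
import io

# Hypothetical namespaces, used only for illustration.
BASE = "http://example.org/drug/"
VOCAB = "http://example.org/vocab/"

# Placeholder tab-delimited input; a real source file would be read from disk.
SOURCE = (
    "id\tname\tcategory\n"
    "D001\tExample drug A\tsmall molecule\n"
    "D002\tExample drug B\tbiotech\n"
)


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(text: str) -> list[str]:
    """Emit one triple per non-key, non-empty column of each row."""
    triples = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        subject = f"<{BASE}{row['id']}>"
        for column, value in row.items():
            if column == "id" or not value:
                continue
            triples.append(
                f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .'
            )
    return triples


ntriples = rows_to_ntriples(SOURCE)
for line in ntriples:
    print(line)
```

The mechanical conversion is the easy part; choosing what each column means, and which vocabulary term should represent it, is where the difficulties listed above arise.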
Create links between data
• Challenge: create links on a large scale
• Silk
  – Mapping data by querying their SPARQL endpoints
  – Silk-LSL: combining mapping rules, mapping algorithms and matching thresholds
• LinQuer
  – Semantic link discovery over relational data
  – LinQL: specify linkage requirements, which are rewritten into SQL queries
• Sacrifice recall for precision
• Maintain links as the data are updated
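The idea of a matching threshold, and the trade of recall for precision, can be sketched as follows. This is neither Silk-LSL nor LinQL: it swaps in Python's `difflib.SequenceMatcher` as a stand-in similarity measure, and the identifiers and labels are placeholders rather than records from any of the datasets.

```python
from difflib import SequenceMatcher

# Placeholder labels standing in for two datasets; not real records.
source_labels = {
    "src:1": "Example Compound Alpha",
    "src:2": "Sample Agent Beta",
}
target_labels = {
    "tgt:a": "example compound alpha",
    "tgt:b": "sample agent gamma",
    "tgt:c": "unrelated entry",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold):
    """Link each source entry to its best-scoring target, if above threshold."""
    links = []
    for s_id, s_label in source.items():
        best_id, best_score = None, 0.0
        for t_id, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_id, best_score = t_id, score
        if best_id is not None and best_score >= threshold:
            links.append((s_id, best_id, best_score))
    return links


strict = discover_links(source_labels, target_labels, threshold=0.95)
loose = discover_links(source_labels, target_labels, threshold=0.70)
print(len(strict), len(loose))
```

With the strict threshold only the exact (case-insensitive) match is linked; the looser threshold also links the near match, which may or may not be correct. Raising the threshold drops such uncertain links, which is the recall-for-precision trade named above.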
Use case: connect medical knowledge
• Despite the growth of studies in alternative medicine, it is not yet included in standard medical care in Western countries
• A lot of knowledge about alternative medicine is not available in English
• Use DBpedia to link together information
• Create a Web of Data connecting alternative medicine studies with Western biomedical research
Applications
• Patients
  – Search for alternative medicine for a disease
  – Search for clinical trial information about a herb
  – Search for side effects information about a herb
  – Search for alternative herbs for a Western drug, such as what is the alternative medicine for aspirin
• Researchers
  – Confirm these genes are also associated with the disease in biomedical research
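Several of these searches amount to following links across datasets. A minimal sketch over an in-memory list of triples is shown below; the identifiers and predicate names are hypothetical placeholders, not taken from the LODD datasets, and a real application would query the published data instead.

```python
# Toy triples with placeholder identifiers; NOT taken from the LODD datasets.
triples = [
    ("drug:X", "treats", "disease:D1"),
    ("herb:H1", "usedFor", "disease:D1"),
    ("herb:H2", "usedFor", "disease:D2"),
    ("herb:H1", "associatedGene", "gene:G1"),
    ("herb:H1", "associatedGene", "gene:G2"),
    ("disease:D1", "associatedGene", "gene:G1"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def alternative_herbs(drug):
    """Herbs used for any disease that the given drug treats."""
    herbs = set()
    for disease in objects(drug, "treats"):
        herbs |= subjects("usedFor", disease)
    return herbs


def confirmed_genes(herb, disease):
    """Genes linked to the herb that are also linked to the disease."""
    return objects(herb, "associatedGene") & objects(disease, "associatedGene")


print(alternative_herbs("drug:X"))
print(confirmed_genes("herb:H1", "disease:D1"))
```

The patient question is a two-hop traversal (drug to disease to herb), while the researcher question is an intersection of two gene sets coming from different sources.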
Are there any Raccoons in India?
The TOP pharma questions
• What patents exist for this pathway/target?
• What side effects are there for this drug, especially those not on the label?
• Has a similar compound to ours been approved previously, and what were the side effects?
• http://esw.w3.org/topic/HCLSIG/LODD/Questions
Future issues
• Link update notification
• Annotate the data with the Translational Medicine Ontology, also from W3C HCLS
• More applications to support scientific questions
• Work with more datasets, such as gene expression data and protein interaction network data
http://esw.w3.org/topic/HCLSIG/LODD/
Thank you!