Connecting the Dots: Drug Information and Linked Data
DESCRIPTION
Presented as part of the AMIA 2014 Knowledge Representation + Semantics and Clinical Information Systems Working Groups Pre-Symposium "Drug Terminology Standards: Meaningful Use and Better Knowledge," November 16, 2014, Washington, DC
TRANSCRIPT
Tomasz Adamusiak MD PhD
7omasz
Conflict of interest disclosure
• Tomasz Adamusiak is a Senior Data Scientist at Thomson Reuters, provider of intelligent information for pharma and research institutions
• Former NLM Fellow and bioinformatician at EBI
Learning Objectives
• Describe Linked Data and semantic content integration technologies
• Recognize the value of integrating drug information with public resources
AS OF 2012, ABOUT 2.5 EXABYTES OF DATA WERE CREATED EACH DAY
2.5 exabytes ≈ 7 000 Libraries of Congress
By Carol M. Highsmith (Own work) [CC-BY-SA-3.0]
Tim Berners-Lee: the next Web of open, linked data
If you want to put something on the web there are three rules:
1. All kinds of conceptual things, they have names now that start with HTTP.
2. If I take one of these HTTP names and I look it up [...] I fetch the data using the HTTP protocol from the web, I will get back some data in a standard format.
3. It's got relationships [...] the other thing that it's related to is given one of those names that starts HTTP. So, I can go ahead and look that thing up.
Sir Tim Berners-Lee on the next Web (TED2009)
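The three rules can be illustrated with a toy "follow your nose" crawler. This is a minimal sketch under stated assumptions: `TOY_WEB` is a hypothetical in-memory dictionary standing in for the Web, so `dereference()` is a dictionary lookup rather than a real HTTP GET, and all `example.org` URIs and predicates are placeholders.

```python
from collections import deque

# Hypothetical stand-in for the Web: each HTTP URI maps to the triples a server
# would return when that URI is looked up. A real client would issue an HTTP GET.
TOY_WEB = {
    "http://example.org/drug/A": [
        ("http://example.org/drug/A", "http://example.org/vocab/targets",
         "http://example.org/gene/X"),
    ],
    "http://example.org/gene/X": [
        ("http://example.org/gene/X", "http://example.org/vocab/inPathway",
         "http://example.org/pathway/P"),
    ],
    "http://example.org/pathway/P": [],
}


def dereference(uri):
    """Rule 2: look up an HTTP name and get back data (here, a list of triples)."""
    return TOY_WEB.get(uri, [])


def follow_your_nose(start, max_hops=3):
    """Rule 3: related things also have HTTP names, so look them up in turn (breadth-first)."""
    seen = {start}
    queue = deque([(start, 0)])
    collected = []
    while queue:
        uri, depth = queue.popleft()
        for s, p, o in dereference(uri):
            collected.append((s, p, o))
            # Only follow objects that are HTTP names (rule 1), not literals.
            if o.startswith("http") and o not in seen and depth < max_hops:
                seen.add(o)
                queue.append((o, depth + 1))
    return collected


if __name__ == "__main__":
    for triple in follow_your_nose("http://example.org/drug/A"):
        print(triple)
```

Starting from the drug URI, the crawler reaches the pathway two hops away without knowing about it in advance; `max_hops` simply bounds how far it wanders.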
The 5 stars of open linked data
★ Putting anything up there
★★ Machine-readable format
★★★ Non-proprietary format
★★★★ Use URLs to identify things
★★★★★ Provide context by linking to others
Gov 2.0 Expo 2010: Tim Berners-Lee, "Open, Linked Data for a Global Community" https://www.youtube.com/watch?v=ga1aSJXCFe0#t=328
The RDF triple is the core concept underpinning the Semantic Web
subject: <http://www.example.com/index.html>
predicate: <http://purl.org/dc/elements/1.1/creator>
object: "John Smith"
(Diagram: example:index.html, linked by dc:creator, to "John Smith")
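A minimal Python sketch of how the triple above could be written out in N-Triples, a line-based RDF serialization with one statement per line ending in " .". The `PREFIXES` table (with `example:` standing for `http://www.example.com/`) is an assumption read off the diagram, and treating any string that starts with `http://` or `https://` as an IRI is a simplification for illustration.

```python
# Assumed prefix table, read off the diagram's abbreviated names.
PREFIXES = {
    "example": "http://www.example.com/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(curie):
    """Expand a prefixed name such as dc:creator; return anything else unchanged."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return curie


def term_to_nt(term):
    """Write an IRI as <...>; treat everything else as a plain string literal."""
    if term.startswith(("http://", "https://")):
        return f"<{term}>"
    # Escape backslashes, double quotes and newlines inside the literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triple_to_nt(subject, predicate, obj):
    """One N-Triples statement: subject, predicate, object, terminated by ' .'."""
    return f"{term_to_nt(subject)} {term_to_nt(predicate)} {term_to_nt(obj)} ."


if __name__ == "__main__":
    print(triple_to_nt(expand("example:index.html"), expand("dc:creator"), "John Smith"))
    # <http://www.example.com/index.html> <http://purl.org/dc/elements/1.1/creator> "John Smith" .
```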
Several data sources are available
Caveat 1: missing central URI reconciliation
• Responsibility for URIs:
http://bio2rdf.org/mesh:68009154
http://bio2rdf.org/pubmed:11992264
http://bio2rdf.org/go:0016458
http://purl.org/obo/owl/GO#GO_0016458
• Versioning:
http://sig.uw.edu/fma#Anatomical_entity (FMA 3.1)
http://sig.biostr.washington.edu/fma3.0#Anatomical_entity (FMA 3.0)
http://purl.obolibrary.org/obo/GO_0016458 (Foundry-compliant URI)
• Requires institutional support
• RxNorm in RDF?
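Without central reconciliation, every consumer ends up writing rewrite rules like the ones below. This is a hypothetical sketch: the regular expressions cover only the three GO URI styles listed above and map them onto the Foundry-compliant PURL form; they are not an official mapping service.

```python
import re

# Canonical form taken from the "Foundry-compliant URI" example above.
CANONICAL_GO = "http://purl.obolibrary.org/obo/GO_{}"

# Hypothetical rewrite rules, one per GO URI style listed above.
GO_URI_PATTERNS = [
    re.compile(r"^http://bio2rdf\.org/go:(\d{7})$"),
    re.compile(r"^http://purl\.org/obo/owl/GO#GO_(\d{7})$"),
    re.compile(r"^http://purl\.obolibrary\.org/obo/GO_(\d{7})$"),
]


def canonical_go_uri(uri):
    """Return the Foundry-style PURL for a GO URI, or None if no rule matches."""
    for pattern in GO_URI_PATTERNS:
        match = pattern.match(uri)
        if match:
            return CANONICAL_GO.format(match.group(1))
    return None


def group_by_canonical(uris):
    """Group URIs that denote the same GO term under their canonical PURL."""
    groups = {}
    for uri in uris:
        canonical = canonical_go_uri(uri)
        if canonical is not None:
            groups.setdefault(canonical, []).append(uri)
    return groups


if __name__ == "__main__":
    uris = [
        "http://bio2rdf.org/go:0016458",
        "http://purl.org/obo/owl/GO#GO_0016458",
        "http://purl.obolibrary.org/obo/GO_0016458",
        "http://bio2rdf.org/mesh:68009154",  # not GO: no rule applies
    ]
    for canonical, aliases in group_by_canonical(uris).items():
        print(canonical, aliases)
```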
Caveat 2: data locality
http://gigaom.com/broadband/the-storage-vs-bandwidth-debate/
CONNECTING THE DOTS
Given a therapeutic action (PPAR gamma partial agonist or agonist), what were the related compounds studied, the indications for treatment, the drug delivery technologies, the related genes and the affected pathways?
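Answering a question like this means joining triples that originate in different datasets. The sketch below is a naive nested-loop evaluation of a SPARQL-style basic graph pattern over an in-memory list; the `ex:` identifiers are hypothetical placeholders rather than real compounds or data, and a production setup would query a triplestore with SPARQL instead.

```python
# Hypothetical merged graph; in practice these triples would come from
# different sources (drug, indication, gene and pathway datasets).
TRIPLES = [
    ("ex:compound1", "ex:hasAction", "ex:PPARgammaAgonist"),
    ("ex:compound2", "ex:hasAction", "ex:PPARgammaAgonist"),
    ("ex:compound3", "ex:hasAction", "ex:otherAction"),
    ("ex:compound1", "ex:indication", "ex:indicationA"),
    ("ex:compound2", "ex:indication", "ex:indicationB"),
    ("ex:compound1", "ex:targets", "ex:geneG"),
    ("ex:geneG", "ex:inPathway", "ex:pathwayP"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # variable
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to something else
            extended[p_term] = t_term
        elif p_term != t_term:  # constant that does not match
            return None
    return extended


def query(patterns, triples):
    """Naive nested-loop join: apply each triple pattern to every partial solution."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


if __name__ == "__main__":
    # Compounds with the given action, plus their targets' pathways (two hops away).
    for row in query(
        [
            ("?compound", "ex:hasAction", "ex:PPARgammaAgonist"),
            ("?compound", "ex:targets", "?gene"),
            ("?gene", "ex:inPathway", "?pathway"),
        ],
        TRIPLES,
    ):
        print(row)
```

Shared variables such as `?compound` and `?gene` are what "connect the dots": each pattern only survives if it agrees with the bindings produced by the previous ones.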
EBI RDF Platform
• All model elements with annotations to acetylcholine-gated channel complex (GO:0005892)
• Samples treated with alcohol
• Find drug-like (but currently not approved) molecules which bind 7TM1 GPCRs with high affinity
• Under what experimental conditions is Ensembl gene ENSG00000129991 (TNNI3) expressed?
• Pathways that reference Insulin (P01308)
• What are the preferred gene name and disease annotations of all human UniProt entries that are known to be involved in a disease?
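Questions like these are posed as SPARQL queries sent to an endpoint over HTTP. The sketch below builds, but does not send, a SPARQL 1.1 Protocol "query via GET" request for the insulin question. The endpoint URL and the `ex:references` predicate are hypothetical placeholders, not the platform's actual vocabulary; the UniProt entry is written in the `http://purl.uniprot.org/uniprot/` URI form.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and vocabulary; substitute the real platform's values.
ENDPOINT = "https://sparql.example.org/query"

QUERY = """PREFIX ex: <http://example.org/vocab#>
SELECT DISTINCT ?pathway WHERE {
  ?pathway ex:references <http://purl.uniprot.org/uniprot/P01308> .
}"""


def build_sparql_request(endpoint, query):
    """SPARQL 1.1 Protocol 'query via GET': the query goes in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_sparql_request(ENDPOINT, QUERY)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```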
★★★★★
Open PHACTS Discovery Platform
Freely available pharmacological data from a variety of resources, plus tools and services to support pharmacological research
★★★★★
Bio2RDF: Linked Data for the Life Sciences
• ~11 billion triples across 35 datasets
• Datasets include: clinicaltrials.gov, dbSNP, GenAge, GenDR, LSR, OrphaNet, PubMed, SIDER, WormBase
• Locally hosted endpoints: chembl, linkedSPL, pathwaycommons, reactome, wikipathways
★★★★★
NCBO BioPortal RDF
• Provide RDF for each class in BioPortal, so that each concept has a URL that resolves to a set of RDF triples providing essential information about the term
• Provide an RDF dump of each ontology in BioPortal that can be loaded into a triplestore to enable SPARQL access to the ontologies
★★★★★
Linked Structured Product Labels http://purl.org/net/linkedSPLs
• LinkedSPLs publishes all sections of FDA-approved prescription and over-the-counter drug package inserts from DailyMed for use by NLP and Semantic Web researchers
• All active moieties and product labels are mapped to RxNorm PURLs provided by the NCBO BioPortal SPARQL endpoint
• LinkedSPLs is provided as a service as part of the Drug Interaction Knowledge Base (DIKB) project
Boyce RD et al. Dynamic enhancement of drug product labels to support drug safety, efficacy, and effectiveness. J Biomed Semantics. 2013 Jan 26;4(1):5. PMID: 23351881.
★★★★★
Making public FDA datasets more accessible
• Adverse events. FDA's publicly available drug adverse event and medication error reports, and medical device adverse event reports.
• Recalls. Enforcement report data, containing information gathered from public notices about certain recalls of FDA-regulated products.
• Labeling. Structured Product Labeling (SPL) data for FDA-regulated human prescription drug, OTC drug and biological product labeling.
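Assuming this slide refers to FDA's openFDA API, the three datasets correspond to its drug endpoints, which are queried with plain HTTPS GET requests using `search` and `limit` parameters. A minimal sketch that only builds the URL; the helper name `openfda_url` and the dataset keys are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

OPENFDA_BASE = "https://api.fda.gov"
# Drug endpoints matching the three bullets (assuming the slide means openFDA).
ENDPOINTS = {
    "adverse_events": "/drug/event.json",
    "recalls": "/drug/enforcement.json",
    "labeling": "/drug/label.json",
}


def openfda_url(dataset, search, limit=5):
    """Build a GET URL using openFDA's `search` and `limit` query parameters."""
    params = urlencode({"search": search, "limit": limit})
    return f"{OPENFDA_BASE}{ENDPOINTS[dataset]}?{params}"


if __name__ == "__main__":
    print(openfda_url("labeling", 'openfda.generic_name:"aspirin"', limit=1))
    # https://api.fda.gov/drug/label.json?search=openfda.generic_name%3A%22aspirin%22&limit=1
```

Unlike the RDF sources on the previous slides, these endpoints return JSON documents rather than triples, which is why this slide earns four stars rather than five.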
★★★★
RDF Representation of CDISC Foundational Standards
• PhUSE and CDISC Draft RDF Representation
• RDF could provide a foundation for interoperable end-to-end data standards in clinical research
• http://github.com/phuse-org/rdf.cdisc.org
Thank You