What is Linked Open Data (LOD)? A Brief Primer

Ecoinformatics International Technical Collaboration Partnership
International Web Meeting - Linked Open Data and Environmental Information
December 6 and 7, 2010

Bruce Bargmeyer & Kevin Keck
Lawrence Berkeley National Laboratory
Tel: +1 510-495-2905



Topics

• What is LOD – the big picture
• Potential applications
• What is LOD – EPA examples
  – EPA data in LOD form
  – Data descriptions in voiD
  – Extensions to voiD using ISO/IEC 11179
  – Linkage using ontology
• Ecoinformatics challenges


Potential Applications

• Environmental data
  – UNEP, EEA, EPA data …
  – Ecoinformatics
  – Eye on Earth Summit, Abu Dhabi: Technology Infrastructure Workgroup
• Health
• Energy
• …


What is Linked Data?

A term Sir Tim Berners-Lee uses to describe HTTP-based Data Access for the Web

A linking mechanism for the Web that takes us from hypertext links (Document to Document) to hyperdata links


Web of Documents

• Based on HTTP & HTML
• Analog: a global file system
• Primary objects: documents
• Hypertext links between documents (or sub-parts of documents)
• Degree of structure in objects fairly low
• Semantics of content and links implicit
• Designed for human consumption


Web of Data

• Based on Linked Data (HTTP, RDF, RDFa)
• Analog: a global database
• Primary objects: things (or descriptions of things)
• Links between things
• Degree of structure in (descriptions of) things high
• Semantics of content and links explicit
• Needs metadata, e.g., Vocabulary of Interlinked Datasets (voiD)
• Designed for machines to better assist humans


A Bit of a Distinction

• Linked Data – a standards-based approach (e.g., HTTP, RDF and RDFa) for making data available on the WWW
• Open Data – the notion that data should be made openly available, with an appropriate use license
• Linked Open Data combines the two
  – Governments in the US and Europe are in the lead


Web Information Sharing between Data Creators and Data Users

[Diagram: data creators and users exchange documents ("text") and numeric "data" across subject areas (environment, agriculture, climate, human health, industry, tourism, soil, water, air), with the subject labels shown in both English and Spanish.]

Publish the Documents and Data


Web Information Sharing
Lots of Progress with Documents (HTTP & HTML)
Still Problems for Data

[Diagram: data creators and users exchange documents ("text") and numeric "data" across subject areas (environment, agriculture, climate, human health, industry, tourism, soil, water, air), with the subject labels shown in both English and Spanish.]

Data problems - a confusion of platforms, interfaces, file formats, …


Web Information Sharing between Data Creators and Data Users

[Diagram: data creators and users exchange documents ("text") and numeric "data" across subject areas (environment, agriculture, climate, human health, industry, tourism, soil, water, air), with the subject labels shown in both English and Spanish.]

• Publish the Data using standards
• Publish the Metadata


W3C, Web 2.0, Web 3.0 View

• Suppose Sir TBL gave a presentation at the EOE Summit
  – Likely topic: Linked Data (Linked Open Data)
  – A spirited, inspirational presentation like those he recently gave at the TED and Gov 2.0 conferences.
    See: http://www.ted.com/index.php/search?q=berners+lee

Publishing Linked Data involves 3 basic steps:
1. Assign URIs to the entities described by the data set and provide for dereferencing these URIs over the HTTP protocol into RDF representations.
2. Set RDF links to other data sources on the Web, so that clients can navigate the Web of Data as a whole by following RDF links.
3. Provide metadata about published data, so that clients can assess the quality of published data and choose between different means of access.
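
The three publishing steps can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' tooling: the base URI `http://example.org/toxicity/`, the dataset URI, and the helper names are hypothetical, while the ITIS report URI pattern and the `w5h.owl#what` predicate are the ones used in the examples later in these slides. The sketch emits N-Triples with the standard library only.

```python
# Hypothetical names and URIs are marked as assumptions below.
BASE = "http://example.org/toxicity/"      # assumption: publisher's base URI
DATASET = BASE + "void#dataset"            # assumption: URI of the dataset description
ITIS = ("http://www.itis.gov/servlet/SingleRpt/SingleRpt"
        "?search_topic=TSN&search_value=")  # ITIS report pattern used in these slides
W5H_WHAT = "http://samo.lbl.gov/ont/w5h.owl#what"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_DATASET = "http://rdfs.org/ns/void#Dataset"


def mint_uri(local_id):
    """Step 1: give an entity an HTTP URI that a server could dereference."""
    return BASE + local_id


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose object is a URI."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def publish(row_id, tsn):
    """Return N-Triples lines covering steps 1 to 3 for one spreadsheet row."""
    observation = mint_uri(row_id)                       # step 1: URI for the entity
    return [
        triple(observation, W5H_WHAT, f"{ITIS}{tsn}"),   # step 2: RDF link to ITIS
        triple(DATASET, RDF_TYPE, VOID_DATASET),         # step 3: metadata about the data set
    ]


if __name__ == "__main__":
    for line in publish("sheet1_row2", 79452):
        print(line)
```

Serving these statements when a client requests the minted URI over HTTP (the dereferencing half of step 1) is left to the web server and is not shown here.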


Berners-Lee “five star system” for Linked Open Data

★ make your stuff available on the web (whatever format)

★★ make it available as structured data (e.g. excel instead of image scan of a table)

★★★ use non-proprietary format (e.g. csv instead of excel)

★★★★ use URLs to identify things, so that people can point at your stuff

★★★★★ link your data to other people’s data to provide context

Presented by TBL at TED and other conferences.
For examples and benefits of each star level see: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
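
The star levels build on one another, which a short Python sketch can make concrete. The `Dataset` class and its field names are hypothetical, and the sketch assumes the usual reading that a star only counts once every lower level is satisfied.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical checklist for one published dataset."""
    on_web: bool = False       # available on the web, whatever the format
    structured: bool = False   # structured data, e.g. a spreadsheet rather than a scan
    open_format: bool = False  # non-proprietary format, e.g. CSV rather than Excel
    uses_urls: bool = False    # things are identified by URLs
    linked: bool = False       # links to other people's data


def star_rating(ds):
    """Count stars cumulatively: stop at the first level that is not met."""
    stars = 0
    for met in (ds.on_web, ds.structured, ds.open_format, ds.uses_urls, ds.linked):
        if not met:
            break
        stars += 1
    return stars


excel_on_web = Dataset(on_web=True, structured=True)
csv_on_web = Dataset(on_web=True, structured=True, open_format=True)
```

Under this reading, `star_rating(excel_on_web)` is 2 and `star_rating(csv_on_web)` is 3, and a dataset that links to others without using an open format still stops at 2.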


Linking Data

[Diagram: the Integrated Taxonomic Information System (ITIS) at the center, with the entry Mytilus (ITIS TSN 79452). Databases that commit to ITIS link to it, for example the Gulf of Maine Data and the NOAA National Benthic Indicator Data.]


Linking Data

[Diagram: the Integrated Taxonomic Information System (ITIS) at the center, with the entry Mytilus (ITIS TSN 79452). Databases that commit to ITIS: the Gulf of Maine Data and the NOAA National Benthic Indicator Data. More databases that map to ITIS: the National Center for Biotechnology Information (NCBI) Taxonomy.]


A Practical Example

• Provided by Pasky Pascual, EPA
• Inspired by his article: “Evidence-based decisions for the wiki world”, Pasky Pascual, International Journal of Metadata, Semantics and Ontologies (IJMSO), Volume 4, Issue 4, 2009. DOI: 10.1504/IJMSO.2009.029232
• He provided data for the Gulf of Maine to LBNL
• LBNL is using this to demonstrate environmental linked data and voiD files


Toxicity Data for Maine in Excel Format

[Screenshot: the Excel spreadsheet of toxicity data, with a callout on Mytilus.]


Creation of LOD files
From Gulf of Maine Toxicity Data Files (.xls)

Shared ontology:

@base <http://xmdr.org/ont/toxicity.owl> .
@prefix obs: <http://xmdr.org/ont/observations.owl> .
<> a owl:Ontology ;
    owl:imports <http://xmdr.org/ont/observations.owl> .
:Toxicity a owl:Class ;
    rdfs:subClassOf obs:Observation .
:species a owl:ObjectProperty ;
    rdfs:domain :Toxicity ;
    rdfs:range <http://purl.bioontology.org/ontology/NCBI_NMO/Species> .
...

RDF data:

@prefix tox: <http://xmdr.org/ont/toxicity.owl> .
...
<> a owl:Ontology ;
    owl:imports <http://xmdr.org/ont/toxicity.owl> .
<#Place-1> a geo:Point ;
    geo:lat 43.1 ;
    geo:long -70.77 ;
    rdfs:label "Spinney Creek" .
<#_2> a tox:Toxicity ;
    w5h:where <#Place-1> ;
    w5h:when "1985-04-16"^^xsd:date ;
    tox:species ncbi:Mytilus ;
    rdf:value -58 .


Transformation to LOD
Fit into W5H Ontology

• Observations are Events
  – Who: collecting agency
  – What: observable measured
  – When, Where
  – How: method/protocol
• Use rdf:value for measurement
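
The W5H mapping can be sketched as a small Python record that turns one spreadsheet row into a Turtle fragment. This is an illustration under stated assumptions: the class and method names are hypothetical, the property names `w5h:who` and `w5h:how` are guesses (only `w5h:what`, `w5h:when`, and `w5h:where` appear in these slides), and the output is a fragment that still needs the prefix declarations shown elsewhere in the deck.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Observation:
    """One observation event in W5H terms (hypothetical helper class)."""
    what: str                  # URI of the observable, e.g. a taxon
    when: date                 # date of the observation
    where: str                 # URI or local reference of the place
    value: float               # the measurement, emitted as rdf:value
    who: Optional[str] = None  # collecting agency (assumed property w5h:who)
    how: Optional[str] = None  # method/protocol (assumed property w5h:how)

    def to_turtle(self, node):
        """Render the observation as a Turtle fragment about <node>."""
        parts = [f"<{node}> a tox:Toxicity"]
        if self.who:
            parts.append(f"    w5h:who <{self.who}>")
        parts.append(f"    w5h:what <{self.what}>")
        parts.append(f'    w5h:when "{self.when.isoformat()}"^^xsd:date')
        parts.append(f"    w5h:where <{self.where}>")
        if self.how:
            parts.append(f"    w5h:how <{self.how}>")
        parts.append(f"    rdf:value {self.value}")
        return " ;\n".join(parts) + " ."


row = Observation(
    what="http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=79452",
    when=date(1985, 4, 16),
    where="#Place-1",
    value=-58,
)

if __name__ == "__main__":
    print(row.to_turtle("#_2"))
```

The example row reuses the Spinney Creek values from the slides; `who` and `how` are left unset because the slides do not give values for them.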


Creation of LOD files
From Gulf of Maine Toxicity Data Files (.xls)

[Screenshot: the spreadsheet annotated with the callouts Where, When, What, and Value.]


Mytilus in Integrated Taxonomic Information System (ITIS)


NCBI Taxonomy Browser Shows Link between NCBI Taxonomy and ITIS for Mytilus


Linking the Gulf of Maine Data to ITIS Taxonomy

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY ITIS "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&amp;search_value=">
  <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
]>
<rdf:RDF xmlns:w5h="http://samo.lbl.gov/ont/W5H.owl#"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <ToxicityScore rdf:nodeID="sheet1_row2">
    <w5h:what>
      <Genus rdf:about="&ITIS;79452">
        <rdfs:label>Mytilus</rdfs:label>
      </Genus>
    </w5h:what>
    <w5h:where>
      <rdf:Description rdf:nodeID="station1">
        <rdfs:label>Spinney Creek</rdfs:label>
        <geo:lat rdf:datatype="&xsd;decimal">43.1</geo:lat>
        <geo:long rdf:datatype="&xsd;decimal">-70.77</geo:long>
      </rdf:Description>
    </w5h:where>
    <w5h:when rdf:datatype="&xsd;date">1985-04-16</w5h:when>
    <rdf:value rdf:datatype="&xsd;int">-58</rdf:value>
  </ToxicityScore>
  <ToxicityScore rdf:nodeID="sheet1_row3">
    <w5h:what rdf:resource="&ITIS;79452"/>
    <w5h:where rdf:nodeID="station1"/>
    <w5h:when rdf:datatype="&xsd;date">1985-06-10</w5h:when>
    <rdf:value rdf:datatype="&xsd;int">197</rdf:value>
  </ToxicityScore>
</rdf:RDF>
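
Because both observations point at the same ITIS URI, a client can group them by taxon without string-matching species names. The Python sketch below illustrates that idea with the two rows from this slide; the function names and the tuple layout are hypothetical, and only the ITIS report URI pattern comes from the example.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse

# ITIS report URI pattern as declared in the ENTITY on this slide.
ITIS_REPORT = ("http://www.itis.gov/servlet/SingleRpt/SingleRpt"
               "?search_topic=TSN&search_value=")


def itis_uri(tsn):
    """Build the ITIS report URI for a Taxonomic Serial Number (TSN)."""
    return f"{ITIS_REPORT}{tsn}"


def tsn_from_uri(uri):
    """Recover the TSN from an ITIS report URI."""
    query = parse_qs(urlparse(uri).query)
    return int(query["search_value"][0])


# (row id, taxon URI, date, value): the two rows shown in the RDF/XML example.
OBSERVATIONS = [
    ("sheet1_row2", itis_uri(79452), "1985-04-16", -58),
    ("sheet1_row3", itis_uri(79452), "1985-06-10", 197),
]


def group_by_taxon(observations):
    """Group observations by the TSN of the taxon they link to."""
    groups = defaultdict(list)
    for row_id, taxon, when, value in observations:
        groups[tsn_from_uri(taxon)].append((row_id, when, value))
    return dict(groups)


if __name__ == "__main__":
    print(group_by_taxon(OBSERVATIONS))
```

Records from any other database that commits to the same ITIS identifiers could be added to the same grouping, which is the point of linking through a shared taxonomy.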


Linking Pasky’s Gulf of Maine Data (Mytilus) to NOAA National Benthic Inventory Data through ITIS


Metadata for LOD: voiD

“In order to support clients in choosing the most efficient way to access Web data for the specific task they have to perform, data publishers can provide additional technical metadata about their data set and its interlinkage relationships with other data sets …. The Vocabulary Of Interlinked Datasets … defines terms and best practices to categorize and provide statistical metainformation about data sets as well as the linksets connecting them.”

-- Tim Berners-Lee, Massachusetts Institute of Technology, et al

voiD is a vocabulary and a set of instructions that enables the discovery and usage of linked datasets. A dataset is a collection of data, published and maintained by a single provider, available as RDF, and accessible, for example, through dereferenceable HTTP URIs or a SPARQL endpoint. Based on the voiD vocabulary this document explains how to use voiD in a practical setup, for both data consumers and data providers. -- from voiD Guide


voiD: Vocabulary of Interlinked Datasets

• Motivation
  – Effective dataset selection
  – Efficient discovery of datasets, by search engines or data publishers
  – SPARQL query optimisation and query federation
• Two high-level concepts
  – Dataset: a dataset is published and maintained by a single provider and accessible on the Web through dereferenceable URIs or a SPARQL endpoint
  – Linkset: a subset of a void:Dataset that stores the triples expressing the interlinking relationship between datasets
• voiD Vocabulary, http://rdfs.org/ns/void/html
• voiD User's Guide, http://rdfs.org/ns/void-guide

Source: Kei Cheung, Yale Center for Medical Informatics


voiD File with Linkage Statistics
Maine Dataset Links to 29 Taxa in ITIS

@prefix void: <http://rdfs.org/ns/void#> .
@prefix scovo: <http://purl.org/NET/scovo#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <#> .

<http://www.itis.gov/> a void:Dataset .
:toxicity a void:Dataset ;
    void:vocabulary <http://samo.lbl.gov/ont/w5h.owl> ;
    void:subset :ME_toxicity .
:ME_toxicity void:subset :ME_linkset .
:ME_linkset void:linkPredicate <http://samo.lbl.gov/ont/w5h.owl#what> ;
    void:subjectsTarget :ME_toxicity ;
    void:objectsTarget <http://www.itis.gov/> ;
    void:statItem [
        scovo:dimension void:numberOfDistinctObjects ;
        rdf:value 29
    ] .
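
A statistic such as void:numberOfDistinctObjects can be derived directly from the data. The Python sketch below shows one way to count it for a linkset, assuming the triples are already available as (subject, predicate, object) tuples; the function name and the sample triples are illustrative. The sample holds only the two rows shown earlier, so it yields 1 rather than the 29 reported for the full Maine dataset.

```python
W5H_WHAT = "http://samo.lbl.gov/ont/w5h.owl#what"   # link predicate from the voiD file
ITIS_PREFIX = "http://www.itis.gov/"                # target dataset from the voiD file
MYTILUS = ITIS_PREFIX + "servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=79452"


def number_of_distinct_objects(triples, link_predicate, target_prefix):
    """Count distinct objects of link_predicate that fall inside the target dataset."""
    return len({
        obj
        for _subject, predicate, obj in triples
        if predicate == link_predicate and obj.startswith(target_prefix)
    })


# Illustrative sample: the two observations from the RDF/XML example plus one
# triple that is not part of the linkset and must be ignored.
SAMPLE = [
    ("_:sheet1_row2", W5H_WHAT, MYTILUS),
    ("_:sheet1_row3", W5H_WHAT, MYTILUS),
    ("_:station1", "http://www.w3.org/2000/01/rdf-schema#label", "Spinney Creek"),
]

if __name__ == "__main__":
    print(number_of_distinct_objects(SAMPLE, W5H_WHAT, ITIS_PREFIX))
```

Matching the target dataset by URI prefix is a simplification chosen for this sketch; a real linkset may need a more careful test of which resources belong to the target.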


Dataset Description in voiD Format

:senselabontology a void:Dataset ;
    dcterms:title "SenseLab Neuron Ontology" ;
    dcterms:description "Neuroscience ontology derived from the SenseLab NeuronDB database." ;
    dcterms:license <> ; # TODO
    foaf:homepage <http://neuroweb.med.yale.edu/senselab/> ;
    void:exampleResource <http://purl.org/science/owl/sciencecommons/identified_by_pmid> ;
    void:exampleResource <http://purl.org/ycmi/senselab/neuron_ontology.owl#has_Receptor> ;
    void:exampleResource <http://purl.org/ycmi/senselab/neuron_ontology.owl#NMDA> ;
    dcterms:creator :senselab ; ## this organization can be further defined
    dcterms:source <http://purl.org/ycmi/senselab/neuron_ontology.owl#> ;
    dcterms:subject <http://purl.org/ycmi/senselab/neuron_ontology.owl#Receptor> ;
    dcterms:subject <http://dbpedia.org/resource/Receptor_(biochemistry)> ;
    dcterms:subject <http://dbpedia.org/resource/Neurotransmitter_receptor> ;
    dcterms:subject <http://dbpedia.org/resource/Sensory_receptor> ;
    dcterms:source <doi:10.1093/bib/bbm018> ;
    void:feature :owl ; ## this technical feature can be further defined
    void:sparqlEndpoint <http://hcls.deri.org:8080/> ;
    void:vocabulary <http://www.obofoundry.org/ro/ro.owl> .

Source: Adapted from Kei Cheung, Yale Center for Medical Informatics


voiD is Extensible

• The voiD vocabulary is extensible.
• It may be useful to extend it as needed for evaluating and documenting data for environmental decision making, e.g.:
  – EPA data standards
  – ISO/IEC 11179 data descriptions


voiD Deployment

• Deploy a voiD file (in Turtle, RDF/XML, or RDFa format) onto the Web server
• Make it accessible to search engines, such as Sindice (http://sindice.com/)
• Publish a Semantic Sitemap file (sitemap.xml) on the server
  – “... allows Data publishers to state where documents containing RDF data are located, and to advertise alternative means to access it ...” [1]
  – Use the datasetURI property in the sitemap.xml to point to the voiD description of a dataset, e.g., http://neuroweb.med.yale.edu/senselab/senselab-void.ttl#senselabontology

[1] http://sw.deri.org/2007/07/sitemapextension/

Source: Kei Cheung, Yale Center for Medical Informatics
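
As a rough sketch of the sitemap step, the Python below builds a minimal sitemap.xml whose datasetURI points at a voiD description. The `sc` namespace URI and the element names `dataset`, `datasetLabel`, and `datasetURI` are written from memory of the Semantic Sitemaps extension and should be checked against [1] before use; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: namespace of the Semantic Sitemaps extension, to be verified against [1].
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def build_sitemap(label, void_uri):
    """Return sitemap XML with one dataset entry pointing at its voiD description."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("sc", SC_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    dataset = ET.SubElement(urlset, f"{{{SC_NS}}}dataset")
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetLabel").text = label
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetURI").text = void_uri
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(
        "SenseLab Neuron Ontology",
        "http://neuroweb.med.yale.edu/senselab/senselab-void.ttl#senselabontology",
    ))
```

The label and URI in the usage example are the SenseLab values from these slides; a production sitemap would typically also advertise a data dump or SPARQL endpoint location.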


Some Questions/Challenges

• Additional metadata for provenance
  – W3C Provenance Vocabulary
  – Very similar to Open Provenance Model
• ISO/IEC 11179 – can use to extend voiD
• Standard measurement units ontology?
  – NIST? OMG? SWEET? OBO?
• How to manage the terminologies
• Which management tools for terminologies/ontologies
  – Open Ontology Repository
  – BioPortal-based implementation


Challenge: Managing Terminologies/Ontologies


Some Questions/Challenges

• Standard RDF version of ITIS?
  – Use LSIDs, or not?
• Vocabulary for data collection purpose?
  – Critical for determining comparability
• Standardize on usage of W5H for datasets?
• Standard for numerical ranges (e.g., <44)?
  – cf. GoodRelations …


Acknowledgements

• Kevin Keck, LBNL
• Mark Musen, Natasha Noy, et al., Stanford
• Pasky Pascual, EPA
• Others as noted on slides

This material is based upon work supported by the National Science Foundation under Grant No. 0637122, by USEPA, and by DOE. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, DOE, or USEPA.
