Publishing "5 star" data: the case for RDF

Peter Winstanley: Holyrood Magazine Open Data Scotland: 10 December 2013

Uploaded by peter-winstanley on 22 November 2014.


DESCRIPTION

In the Open Data world we are encouraged to try to publish our data as “5-star” Linked Data because of the semantic richness and ease of integration that the RDF model offers. For many people and organisations this is a new world, and some learning and experimentation are required to gain the skills and experience needed to fully exploit this way of working with data. This workshop will re-assert the case for RDF and provide a guided tour of some examples of RDF publication that can act as a guide to those making a first venture into the field.

TRANSCRIPT

Page 1: Publishing "5 star" data: the case for RDF


Page 2

[Figure labelled “Semantic Issues”, “Application Integration” and “Total Effort”. Source: http://www.opengroup.org/subjectareas/si]

Page 3

Resource Description Framework (RDF)

• Initially a way of adding metadata to XML
• Subject-Predicate-Object or
• Subject-Predicate-Literal triples

[Diagram: Scotland --Authority--> Aberdeen City --Population--> “218,220”]

Scotland has an Authority that is Aberdeen City.

Aberdeen City has a Population with value “218,220”.
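The two kinds of triple on this slide can be modelled in a few lines. This is a minimal Python sketch, not taken from the talk: the `IRI`, `Literal` and `Triple` classes and the `http://example.org/` identifiers are hypothetical stand-ins for real RDF terms (a library such as rdflib provides proper ones).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    """A resource, identified by a (hypothetical) web identifier."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain data value such as a number or a name."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: IRI
    predicate: IRI
    object: Union[IRI, Literal]


EX = "http://example.org/"  # hypothetical namespace, for illustration only

triples = [
    # Subject-Predicate-Object: the object is another resource.
    Triple(IRI(EX + "Scotland"), IRI(EX + "authority"), IRI(EX + "AberdeenCity")),
    # Subject-Predicate-Literal: the object is a value.
    Triple(IRI(EX + "AberdeenCity"), IRI(EX + "population"), Literal("218,220")),
]

for t in triples:
    kind = "resource" if isinstance(t.object, IRI) else "literal"
    print(t.subject.value, t.predicate.value, t.object.value, f"({kind})")
```

Keeping `IRI` and `Literal` as distinct types means a piece of text stored as a value is never confused with an identifier for a resource, even when the two strings are identical.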

Page 4

ETL: Extraction, Transformation and Loading

Page 5

“One often overlooked advantage that RDF offers is its deceptively simple data model. This data model trivializes merging of data from multiple sources and does it in such a way that data about the same things gets collated and deduplicated. In my opinion this is the most important benefit of using RDF over other open data formats.” (Ian Davis, 2011) http://blog.iandavis.com/2011/08/18/the-real-challenge-for-rdf-is-yet-to-come/

Page 6

[Diagram: a resource with http://ex.org/name “Bonnet”, http://ex.org/lives http://place.org/Paris and http://ex.org/owns http://ex.org/pet/2; http://ex.org/pet/2 has http://ex.org/name “Sasha”]

A resource … with the name “Bonnet” … living in Paris owns … Pet 2 … that is called Sasha

Page 7

[Diagram: http://ex.org/pet/2 with http://ex.org/species “ferret” and http://ex.org/favouriteFood “chicken”]

Pet 2 … is a ferret … and has chicken as its favourite food

Page 8

[Diagram: the graphs from pages 6 and 7 drawn together, sharing the node http://ex.org/pet/2]

The two references to http://ex.org/pet/2 point to the same resource, so the graphs can merge.
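The merge can be sketched by treating each graph as a set of (subject, predicate, object) tuples, so that merging is just set union. This is a simplified Python illustration, not a real RDF library: terms are written as N-Triples-style strings, the IRI `http://ex.org/person/1` is hypothetical (the slide shows an unnamed resource), and the repeated "Sasha" triple in the second graph is a hypothetical overlap added to show de-duplication. Real blank nodes are kept apart when graphs are merged, so they would not join up this way.

```python
def iri(s):
    """Write an IRI in N-Triples style, e.g. <http://ex.org/pet/2>."""
    return f"<{s}>"


def lit(s):
    """Write a plain literal in N-Triples style, e.g. "Sasha"."""
    return f'"{s}"'


PERSON = iri("http://ex.org/person/1")  # hypothetical IRI for the unnamed resource
PET = iri("http://ex.org/pet/2")

# Graph from the first source (page 6).
graph_a = {
    (PERSON, iri("http://ex.org/name"), lit("Bonnet")),
    (PERSON, iri("http://ex.org/lives"), iri("http://place.org/Paris")),
    (PERSON, iri("http://ex.org/owns"), PET),
    (PET, iri("http://ex.org/name"), lit("Sasha")),
}

# Graph from the second source (page 7), plus one hypothetical triple
# that both sources happen to state.
graph_b = {
    (PET, iri("http://ex.org/species"), lit("ferret")),
    (PET, iri("http://ex.org/favouriteFood"), lit("chicken")),
    (PET, iri("http://ex.org/name"), lit("Sasha")),
}

# Merging is set union: triples about the same IRI collate,
# and identical triples collapse into one.
merged = graph_a | graph_b


def describe(graph, subject):
    """All (predicate, object) pairs recorded for one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, obj in describe(merged, PET):
    print(predicate, obj)
```

The merged set holds six triples, not seven, and everything known about Pet 2 from either source can be read back with one lookup.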

Page 9

EDW (enterprise data warehouse): fact table, data cube

• "Schema up front" design
• Change is costly

Page 10

RDF triplestores:

• promiscuous

• schema-independent

In contrast...
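The contrast between "schema up front" and schema-independent storage can be shown with Python's built-in SQLite. This is only an illustration: a three-column SQLite table stands in for a triplestore (native triplestores are built very differently), and plain strings are used instead of IRIs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "Schema up front": the columns must be fixed before any data arrives.
conn.execute("CREATE TABLE authority (name TEXT PRIMARY KEY, population TEXT)")
conn.execute("INSERT INTO authority VALUES ('Aberdeen City', '218,220')")

relational_rejected = False
try:
    # A new kind of fact (which country the authority belongs to) has no column yet.
    conn.execute("UPDATE authority SET country = 'Scotland' WHERE name = 'Aberdeen City'")
except sqlite3.OperationalError as err:
    relational_rejected = True
    print("relational table needs ALTER TABLE first:", err)

# Schema-independent: one generic (subject, predicate, object) table takes any fact.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT, UNIQUE (s, p, o))")
facts = [
    ("Aberdeen City", "population", "218,220"),
    ("Scotland", "authority", "Aberdeen City"),  # new predicate, no schema change
    ("Scotland", "authority", "Aberdeen City"),  # duplicate, silently ignored
]
conn.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", facts)

triple_count = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
print("triples stored:", triple_count)
```

The relational table has to be altered before it can hold the new fact; the triple table accepts it as just another row.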

Page 11

What do the joins mean?

[Figure: Joomla 1.6 database schema. Source: http://www.torkiljohnsen.com/wp-content/uploads/2010/07/joomla_1.6_database_schema.png]

Page 12

RDF has explicit semantics.

In contrast...

[Figure. Source: http://ilrt.org/discovery/2001/01/rdf-thes/]

Page 13

RDF: Gross Morphology of the Network

Page 14

So RDF data is “5 star” because:

• No need for prior design discussion with data suppliers about the data specification.
• No need to design a container before accepting data.
• Datasets are self-describing, with explicit semantics.
• Merged datasets are collated and de-duplicated automatically.

Page 15

The Quick Tour

1. Education/training
2. Creation
3. Storage
4. Publishing
5. Use

Page 16

Education/training at scale

• The Euclid Project: http://www.euclid-project.eu/
  • Module 1: Introduction and Application Scenarios
  • Module 2: Querying Linked Data
  • Module 3: Providing Linked Data
  • Module 4: Interaction with Linked Data
  • Module 5: Creating Linked Data Applications
  • Module 6: Scaling up

Page 17

Creating RDF.

Hand-written or scripted:
• http://aksw.org/Projects/Xturtle.html
• http://jena.apache.org/
• http://www.openrdf.org/
• http://www.rdflib.net/
• http://rdf.rubyforge.org/
• http://librdf.org/
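A scripted conversion can be very small. The following Python sketch turns CSV rows into N-Triples using only the standard library; the column names, the base IRI `http://example.org/id/authority/` and the population property `http://example.org/def/population` are hypothetical, and only `rdfs:label` is a real vocabulary term. The libraries listed above would handle escaping, datatypes and other serialisations more completely.

```python
import csv
import io

# Hypothetical identifiers, for illustration only.
BASE = "http://example.org/id/authority/"
LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
POPULATION = "<http://example.org/def/population>"


def escape(text):
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def csv_to_ntriples(csv_text):
    """Turn each CSV row into an rdfs:label triple and a population triple."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['code']}>"
        lines.append(f'{subject} {LABEL} "{escape(row["name"])}" .')
        lines.append(f'{subject} {POPULATION} "{escape(row["population"])}" .')
    return "\n".join(lines)


sample = 'code,name,population\naberdeen-city,Aberdeen City,"218,220"\n'
print(csv_to_ntriples(sample))
```

N-Triples is used here because it is one complete triple per line, which makes it the easiest serialisation to emit from a script.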

Page 18

Creating RDF..

GUI: OpenRefine (http://openrefine.org/)
• Plugins to output RDF
• 'Reconciliation' services available

Page 19

Creating RDF...

Wikis

• Use Wikipedia and let DBpedia work for you
• Semantic MediaWiki: outputs RDF and can be linked to a triplestore directly
• Drupal and DBpedia: creates RDFa, which can be scraped; not very widely used

Page 20

Creating RDF....

Relational to RDF mapping

• D2R Server: accessing databases with SPARQL and as Linked Data (e.g. http://opendata.tellmescotland.gov.uk)
• Virtuoso RDF Views (e.g. http://location.testproject.eu/BEL/)
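The core idea of relational-to-RDF mapping can be sketched in a few lines. This is a simplified Python illustration loosely modelled on the W3C Direct Mapping (row IRI from the primary key, one predicate per column); it is not how D2R Server or Virtuoso work internally, since both use their own mapping languages. The base IRI `http://example.org/db/` and the helper `row_triples` are hypothetical, and the AB0103 / Victoria Quay row is borrowed from the entity-resolution example later in the talk.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base IRI


def row_triples(conn, table, key):
    """Yield one triple per non-NULL column of every row in `table`.

    Loosely modelled on the W3C Direct Mapping: the row's IRI is built
    from its primary key, and each column name becomes a predicate.
    `table` and `key` are interpolated into SQL, so they must be trusted names.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{key}={record[key]}>"
        for column, value in record.items():
            if value is None:
                continue  # NULL values produce no triple
            yield (subject, f"<{BASE}{table}#{column}>", f'"{value}"')


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facility (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO facility VALUES ('AB0103', 'Victoria Quay')")

for triple in row_triples(conn, "facility", "id"):
    print(*triple)
```

Because the mapping is generated from the table definition, the database stays the system of record and the RDF view follows it automatically.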

Page 21

Large In-Memory Triplestores?

Page 22

Native RDF Triplestores.

• Apache Jena "TDB"
• Used in ....

Page 23

Native RDF Triplestores..

• 4store
• Sesame
• Mulgara
• Bigdata

All provide SPARQL over HTTP as well as native APIs.
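"SPARQL over HTTP" means that any of these stores can be queried with an ordinary HTTP request. The Python sketch below builds, but does not send, a SPARQL 1.1 Protocol query-via-GET request: the query goes in the `query` URL parameter and the `Accept` header asks for JSON results. The endpoint URL `http://localhost:3030/ds/query` is a hypothetical local Fuseki-style address.

```python
import urllib.parse
import urllib.request


def sparql_get_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


req = sparql_get_request(
    "http://localhost:3030/ds/query",  # hypothetical endpoint URL
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
)
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` against a running store would return the JSON result set; the same request shape works for any store that implements the protocol.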

Page 24

Geospatial Triplestores

• Virtuoso Universal Server (7.0, ColumnStore edition)

• Parliament (2.7.4 quickstart)

• uSeekM (1.2.0-a5, on top of PostgreSQL 8.4 and PostGIS 1.5)

• OWLIM-SE (Trial version 5.3.5849)

• Strabon (3.2.3, on top of PostgreSQL 8.4 and PostGIS 1.5)

Xen VMs (Debian 6) are available for each: http://blog.geoknow.eu/virtual-machines-of-geospatial-rdf-stores/ (Dr. Jens Lehmann, Uni Leipzig)

Page 25

Linked Data API.

PublishMyData Linked Data API

Page 26

Linked Data API..

ELDA Linked Data API

Page 27

Linked Data API...

Entity resolution:

Victoria Quay is http://cofog01.data.scotland.gov.uk/id/facility/AB0103

...which resolves to http://cofog01.data.scotland.gov.uk/doc/facility/AB0103
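The /id/ to /doc/ resolution can be sketched as a small URL rewrite plus an HTTP 303 See Other response, the usual Linked Data convention for separating a thing from the document about it. This Python sketch assumes the server's only rule is "replace the first path segment `id` with `doc`"; the helpers `doc_url_for` and `resolve` are hypothetical, not part of any Linked Data API product.

```python
from urllib.parse import urlsplit, urlunsplit


def doc_url_for(id_url):
    """Map a thing's URI (/id/...) to the URL of the document about it (/doc/...).

    Assumes the first path segment is "id", as in the Victoria Quay example.
    """
    parts = urlsplit(id_url)
    segments = parts.path.split("/")  # "/id/facility/AB0103" -> ["", "id", "facility", "AB0103"]
    if len(segments) < 3 or segments[1] != "id":
        raise ValueError(f"not an /id/ URI: {id_url}")
    segments[1] = "doc"
    return urlunsplit(parts._replace(path="/".join(segments)))


def resolve(id_url):
    """Answer a request for a thing's URI with a 303 See Other to its document."""
    return 303, {"Location": doc_url_for(id_url)}


status, headers = resolve("http://cofog01.data.scotland.gov.uk/id/facility/AB0103")
print(status, headers["Location"])
```

The 303 tells the client that the /id/ URI names the facility itself, while the /doc/ URL is a web page about it.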

Page 28

Linked Data API....

• Different serialisations (JSON, N-Triples, RDF/XML, etc.)
• HTTP "Accept" headers, e.g. "application/json"
• 303 redirects:
  http://cofog01.data.scotland.gov.uk/id/facility/AB0103.nt
  http://cofog01.data.scotland.gov.uk/doc/facility/AB0103.rdf
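Content negotiation can be sketched as choosing a serialisation from the `Accept` header and redirecting to a format-specific URL. In this Python sketch the media-type table, the default of RDF/XML, and the rule that the redirect target is the document URL plus a suffix are all assumptions; the slide shows only the `.nt` and `.rdf` URLs, so the `.json` suffix is hypothetical. Wildcards such as `*/*` are deliberately not handled.

```python
# Hypothetical mapping from media types to format suffixes.
SUFFIXES = {
    "application/json": "json",
    "application/n-triples": "nt",
    "application/rdf+xml": "rdf",
}


def preferred_suffix(accept, default="rdf"):
    """Pick the suffix for the highest-q known media type in an Accept header.

    Simplified: wildcards such as */* are not matched and fall through to `default`;
    a media type with q=0 is treated as not acceptable.
    """
    best, best_q = None, 0.0
    for item in accept.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        suffix = SUFFIXES.get(media_type.lower())
        if suffix is not None and q > best_q:
            best, best_q = suffix, q
    return best if best is not None else default


def redirect_for(doc_url, accept):
    """303 See Other to the format-specific document, e.g. .../AB0103.rdf."""
    return 303, {"Location": f"{doc_url}.{preferred_suffix(accept)}"}


doc = "http://cofog01.data.scotland.gov.uk/doc/facility/AB0103"
print(redirect_for(doc, "application/json"))
print(redirect_for(doc, "application/rdf+xml;q=0.5, application/n-triples"))
```

Offering both routes lets browsers and scripts use the Accept header while humans can still paste a suffixed URL to get a particular format.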

Page 29

Linked Data API.....

• SPARQL is for experts

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX sepaw: <http://data.sepa.org.uk/def/water/>
PREFIX sepaloc: <http://data.sepa.org.uk/def/location/>
CONSTRUCT {
  ?item sepaw:waterBodyId ?___0 .
  ?item sepaw:wiseCode ?___1 .
  ?item sepaw:inRiverBasinDistrict ?___2 .
  ?___2 rdfs:label ?___3 .
  ?item geo:lat ?___4 .
  ?item sepaw:category ?___5 .
  ?item sepaw:inSubBasinDistrict ?___6 .
  ?___6 rdfs:label ?___7 .
  ?item rdfs:label ?___8 .
  ?item sepaw:lengthKm ?___9 .
  ?item sepaw:currentOverallClassification ?___10 .
  ?item sepaloc:unitaryAuthority ?___11 .
  ?item geo:long ?___12 .
  ?item sepaw:inCatchment ?___13 .
  ?___13 rdfs:label ?___14 .
  ?item sepaw:currentClassificationYear ?___15 .
  ?item sepaloc:postcodeDistrict ?___16 .
  ?item sepaw:areaSqKm ?___17 .
}
WHERE {
  { SELECT ?item WHERE { ?item rdf:type sepaw:SurfaceWaterBody . } OFFSET 0 LIMIT 10 }
  { ?item sepaw:waterBodyId ?___0 . }
  UNION { ?item sepaw:wiseCode ?___1 . }
  UNION { { ?item sepaw:inRiverBasinDistrict ?___2 . } OPTIONAL { { ?___2 rdfs:label ?___3 . } } }
  UNION { ?item geo:lat ?___4 . }
  UNION { ?item sepaw:category ?___5 . }
  UNION { { ?item sepaw:inSubBasinDistrict ?___6 . } OPTIONAL { { ?___6 rdfs:label ?___7 . } } }
  UNION { ?item rdfs:label ?___8 . }
  UNION { ?item sepaw:lengthKm ?___9 . }
  UNION { ?item sepaw:currentOverallClassification ?___10 . }
  UNION { ?item sepaloc:unitaryAuthority ?___11 . }
  UNION { ?item geo:long ?___12 . }
  UNION { { ?item sepaw:inCatchment ?___13 . } OPTIONAL { { ?___13 rdfs:label ?___14 . } } }
  UNION { ?item sepaw:currentClassificationYear ?___15 . }
  UNION { ?item sepaloc:postcodeDistrict ?___16 . }
  UNION { ?item sepaw:areaSqKm ?___17 . }
}
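A query like the one above is verbose but entirely mechanical, which is why a Linked Data API can generate it from a list of properties. The Python helper below is a hypothetical sketch that produces the same shape (a sub-SELECT for one page of items, then one UNION branch per property, with an OPTIONAL `rdfs:label` for linked resources); it is not the code used by ELDA or PublishMyData. Passing prefixes as a dict also guarantees each prefix is declared only once.

```python
def build_list_query(prefixes, item_type, properties, labelled=(), offset=0, limit=10):
    """Assemble a CONSTRUCT query shaped like the generated SEPA example.

    A sub-SELECT picks one page of items of `item_type`; each property is then
    fetched in its own UNION branch, with an OPTIONAL rdfs:label for the
    properties listed in `labelled`. Assumes `prefixes` declares rdf and rdfs.
    """
    template, branches = [], []
    n = 0
    for prop in properties:
        value = f"?___{n}"
        n += 1
        template.append(f"?item {prop} {value} .")
        if prop in labelled:
            label = f"?___{n}"
            n += 1
            template.append(f"{value} rdfs:label {label} .")
            branches.append(
                f"{{ {{ ?item {prop} {value} . }} OPTIONAL {{ {value} rdfs:label {label} . }} }}"
            )
        else:
            branches.append(f"{{ ?item {prop} {value} . }}")
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append("CONSTRUCT {")
    lines.extend(f"  {t}" for t in template)
    lines.append("}")
    lines.append("WHERE {")
    lines.append(
        f"  {{ SELECT ?item WHERE {{ ?item rdf:type {item_type} . }} OFFSET {offset} LIMIT {limit} }}"
    )
    lines.append("  " + "\n  UNION ".join(branches))
    lines.append("}")
    return "\n".join(lines)


query = build_list_query(
    {
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "sepaw": "http://data.sepa.org.uk/def/water/",
    },
    "sepaw:SurfaceWaterBody",
    ["sepaw:waterBodyId", "sepaw:inCatchment", "rdfs:label"],
    labelled={"sepaw:inCatchment"},
)
print(query)
```

Paging is handled by changing `offset` and `limit`, which is roughly what a list endpoint does for each page it serves.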

Page 30

Linked Data API....

• Linked Data API makes it easy

http://data.sepa.org.uk/doc/water/surfacewaters

http://data.sepa.org.uk/doc/water/surfacewaters.xml

Page 31

FluidOps Workbench & FedX

• Built on top of the Sesame RDF store
• Wiki-like structure for interaction
• Data pipelined in from external SPARQL and other sources
• Includes widgets, graph views, facet views etc. for interacting with the aggregated data

Page 32

What RDF data is "out there" already?

Page 33

Page 34

September 2013 (source: http://en.wikipedia.org/wiki/DBpedia)

DBpedia: at the heart of Open Data

45 million interlinks with Freebase, OpenCyc, UMBEL, GeoNames, MusicBrainz, CIA World Fact Book, DBLP, Project Gutenberg, DBtune Jamendo, Eurostat, UniProt, Bio2RDF and US Census data

Also used in Thomson Reuters OpenCalais, New York Times Linked Open Data, Zemanta API, DBpedia Spotlight and BBC datasets

Page 35

Page 36

Quick Test:
• http://localhost:3030/sparql-editor.tpl

SELECT ?country ?country_name ?capital ?pop ?p ?x ?q ?w WHERE {
  SERVICE <http://dbpedia.org/sparql/sparql> {
    ?country a type:LandlockedCountries ;
             rdfs:label ?country_name ;
             prop:populationEstimate ?pop ;
             prop:capital ?capital .
    FILTER ( lang(?country_name) = 'en' )
  }
  SERVICE <http://worldbank.270a.info/sparql> {
    OPTIONAL { ?p ?x ?country . ?p ?q ?w . }
  }
}
LIMIT 10

Page 37