Publishing "5 star" data: the case for RDF
DESCRIPTION
In the Open Data world we are encouraged to publish our data as “5-star” Linked Data because of the semantic richness and ease of integration that the RDF model offers. For many people and organisations this is a new world, and some learning and experimenting is required to gain the skills and experience needed to fully exploit this way of working with data. This workshop will re-assert the case for RDF and provide a guided tour of some examples of RDF publication that can act as a guide to those making a first venture into the field.
TRANSCRIPT
Peter Winstanley: Holyrood Magazine Open Data Scotland: 10 December 2013
Semantic Issues
Application Integration
Total Effort
http://www.opengroup.org/subjectareas/si
Resource Description Framework (RDF)
• Initially a way of adding metadata to XML
• Subject-Predicate-Object or
• Subject-Predicate-Literal triples
Scotland has an Authority that is Aberdeen City
Aberdeen City has a Population with value “218,220”
[Diagram: Scotland --Authority--> Aberdeen City --Population--> “218,220”]
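The two statements above can be sketched as data. This is a minimal illustration in plain Python, not an RDF library: the `Triple` type and the `objects` helper are hypothetical names introduced here, and the short labels stand in for the full URIs a real dataset would use.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # either another resource or a literal value


# The two statements from the slide, written as triples.
triples = [
    Triple("Scotland", "Authority", "Aberdeen City"),    # Subject-Predicate-Object
    Triple("Aberdeen City", "Population", "218,220"),    # Subject-Predicate-Literal
]


def objects(data, subject, predicate):
    """Return every object recorded for a subject/predicate pair."""
    return [t.obj for t in data if t.subject == subject and t.predicate == predicate]


print(objects(triples, "Scotland", "Authority"))
print(objects(triples, "Aberdeen City", "Population"))
```

Note that "Aberdeen City" is the object of the first triple and the subject of the second, which is what lets statements chain into a graph.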
ETL: Extraction, Transformation and Loading
“One often overlooked advantage that RDF offers is its deceptively simple data model. This data model trivializes merging of data from multiple sources and does it in such a way that data about the same things gets collated and deduplicated. In my opinion this is the most important benefit of using RDF over other open data formats.” (Ian Davis, 2011)
http://blog.iandavis.com/2011/08/18/the-real-challenge-for-rdf-is-yet-to-come/
[Graph 1]
(a resource) --http://ex.org/name--> “Bonnet”
(a resource) --http://ex.org/lives--> http://place.org/Paris
(a resource) --http://ex.org/owns--> http://ex.org/pet/2
http://ex.org/pet/2 --http://ex.org/name--> “Sasha”
A resource … with the name “Bonnet” …. living in Paris owns …Pet 2 … that is called Sasha
[Graph 2]
http://ex.org/pet/2 --http://ex.org/species--> “ferret”
http://ex.org/pet/2 --http://ex.org/favouriteFood--> “chicken”
Pet 2 … is a ferret… and has chicken as the favourite food
[Merged graph]
(a resource) --http://ex.org/name--> “Bonnet”
(a resource) --http://ex.org/lives--> http://place.org/Paris
(a resource) --http://ex.org/owns--> http://ex.org/pet/2
http://ex.org/pet/2 --http://ex.org/name--> “Sasha”
http://ex.org/pet/2 --http://ex.org/species--> “ferret”
http://ex.org/pet/2 --http://ex.org/favouriteFood--> “chicken”
The two references to http://ex.org/pet/2 point to the same resource so the graphs can merge.
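The merge can be sketched with plain Python sets, treating each graph as a set of (subject, predicate, object) tuples. This is an illustration of the idea only, not how a triplestore is implemented. Two things here are assumptions that do not come from the slides: the URI http://ex.org/person/1 for the otherwise unnamed owner, and the repeated "Sasha" statement in the second graph, which is added purely to show de-duplication.

```python
EX = "http://ex.org/"
OWNER = EX + "person/1"  # hypothetical URI for the unnamed resource
PET = EX + "pet/2"

graph_a = {
    (OWNER, EX + "name", '"Bonnet"'),
    (OWNER, EX + "lives", "http://place.org/Paris"),
    (OWNER, EX + "owns", PET),
    (PET, EX + "name", '"Sasha"'),
}

graph_b = {
    (PET, EX + "species", '"ferret"'),
    (PET, EX + "favouriteFood", '"chicken"'),
    (PET, EX + "name", '"Sasha"'),  # assumed duplicate, to show de-duplication
}

# Merging is set union: identical triples collapse into one, and
# statements sharing the URI http://ex.org/pet/2 collate around it.
merged = graph_a | graph_b

about_pet = sorted(p for s, p, o in merged if s == PET)
print(len(merged))
print(about_pet)
```

No join keys or schema alignment are needed: the shared URI is the only thing that ties the two sources together.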
Enterprise data warehouse (EDW)
Fact Table, Data Cube
"Schema up front" design
Change is costly
In contrast, RDF triplestores are:
• promiscuous
• schema-independent
What do the joins mean?
http://www.torkiljohnsen.com/wp-content/uploads/2010/07/joomla_1.6_database_schema.png
In contrast, RDF has explicit semantics.
http://ilrt.org/discovery/2001/01/rdf-thes/
RDF - Gross Morphology of Network
So RDF data is “5 star” because
No need for prior design discussion with data suppliers about data specification.
No need to design container before accepting data.
Datasets are self-describing. Explicit semantics.
Merged datasets are collated and de-duplicated automatically.
The Quick Tour
1. Education/training
2. Creation
3. Storage
4. Publishing
5. Use
Education/training at scale
• The Euclid Project - http://www.euclid-project.eu/
• Module 1: Introduction and Application Scenarios
• Module 2: Querying Linked Data
• Module 3: Providing Linked Data
• Module 4: Interaction with Linked Data
• Module 5: Creating Linked Data Applications
• Module 6: Scaling up
Creating RDF.
Hand written, or scripted
• http://aksw.org/Projects/Xturtle.html
• http://jena.apache.org/
• http://www.openrdf.org/
• http://www.rdflib.net/
• http://rdf.rubyforge.org/
• http://librdf.org/
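As a rough idea of what "scripted" creation can look like, the sketch below writes triples out in the N-Triples serialisation using only the Python standard library. In practice one of the libraries listed above would normally do this; the `to_ntriples` and `escape_literal` helpers are hypothetical names, and the escaping covers only the most common characters.

```python
def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples):
    """Serialise (subject, predicate, object, is_literal) tuples as N-Triples."""
    lines = []
    for subject, predicate, obj, is_literal in triples:
        term = f'"{escape_literal(obj)}"' if is_literal else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {term} .")
    return "\n".join(lines) + "\n"


# Statements about the ferret from the earlier example.
data = [
    ("http://ex.org/pet/2", "http://ex.org/name", "Sasha", True),
    ("http://ex.org/pet/2", "http://ex.org/species", "ferret", True),
]

output = to_ntriples(data)
print(output, end="")
```

Each line of the output is one complete triple, which is why N-Triples files from different sources can simply be concatenated.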
Creating RDF..
GUI
http://openrefine.org/
Plugins to output RDF
'Reconciliation' services available
Creating RDF...
Wikis
• Use Wikipedia and let DBpedia work for you
• Semantic MediaWiki - outputs RDF and can be linked to a triplestore directly
• Drupal and DBpedia - creates RDFa, which can be scraped; not very widely used
Creating RDF....
Relational to RDF mapping
• D2R Server: Accessing databases with SPARQL and as Linked Data
– http://opendata.tellmescotland.gov.uk
• Virtuoso RDF Views
– http://location.testproject.eu/BEL/
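The general idea behind relational-to-RDF mapping can be sketched as follows: each row becomes a subject URI built from its key, each column becomes a predicate, and each cell becomes a literal. This is not D2R Server's or Virtuoso's mapping language; the `facility` table, its contents, the http://example.org/ base and the `rows_to_triples` helper are all assumptions made up for the illustration.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace


def rows_to_triples(conn, table, key):
    """Map every row of a (trusted) table name to subject/predicate/literal triples."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}id/{table}/{record[key]}"
        for column, value in record.items():
            if column == key or value is None:
                continue  # the key is already in the URI; NULLs yield no triple
            triples.append((subject, f"{BASE}def/{table}/{column}", str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facility (code TEXT PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO facility VALUES ('F1', 'Example House', NULL)")

triples = rows_to_triples(conn, "facility", "code")
for triple in triples:
    print(triple)
```

A NULL cell simply produces no statement, which is one place where the triple model differs from a fixed-width table.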
Large In-Memory Triplestores?
Native RDF Triplestores.
• Apache Jena "TDB"
• Used in ....
Native RDF Triplestores..
• 4Store
• Sesame
• Mulgara
• Bigdata
All provide SPARQL over HTTP, and native APIs
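"SPARQL over HTTP" means, under the SPARQL Protocol, that a query can be sent as the `query` parameter of a GET request. The sketch below only builds such a request with the standard library and does not send it. The endpoint URL is a hypothetical local one and `build_sparql_request` is a name introduced here; actual endpoint paths differ between stores.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request


def build_sparql_request(endpoint, query, accept="application/sparql-results+json"):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept})


ENDPOINT = "http://localhost:3030/ds/sparql"  # hypothetical endpoint
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

request = build_sparql_request(ENDPOINT, QUERY)
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]

print(request.full_url)
print(decoded)
# To run it against a live store: urllib.request.urlopen(request)
```

Because the interface is plain HTTP, the same request shape should work against any of the stores listed, with only the endpoint changing.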
Geospatial Triplestores
• Virtuoso Universal Server (7.0, ColumnStore edition)
• Parliament (2.7.4 quickstart)
• uSeekM (1.2.0-a5, on top of PostgreSQL 8.4 and PostGIS 1.5)
• OWLIM-SE (Trial version 5.3.5849)
• Strabon (3.2.3, on top of PostgreSQL 8.4 and PostGIS 1.5)
Xen VMs for each available in Debian 6
http://blog.geoknow.eu/virtual-machines-of-geospatial-rdf-stores/
Dr. Jens Lehmann, Uni Leipzig
Linked Data API.
PublishMyData Linked Data API
Linked Data API..
ELDA Linked Data API
Linked Data API...
Entity Resolution:
Victoria Quay is http://cofog01.data.scotland.gov.uk/id/facility/AB0103
...which resolves to http://cofog01.data.scotland.gov.uk/doc/facility/AB0103
Linked Data API....
Different serialisations [JSON, NT, RDF/XML etc]
HTTP "Accept" headers - e.g. "application/json"
303 re-directs
http://cofog01.data.scotland.gov.uk/id/facility/AB0103.nt
http://cofog01.data.scotland.gov.uk/doc/facility/AB0103.rdf
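From the client side, the two ways of asking for a serialisation look roughly like this. The `/id/` to `/doc/` rewrite in `doc_url` is inferred from the two example URLs above and is only an assumption about this site's pattern; a real client would follow the server's 303 redirect rather than rewrite the URI itself. Nothing is sent over the network here.

```python
from urllib.request import Request


def doc_url(id_uri, extension=""):
    """Guess the document URL for an identifier URI (assumed /id/ -> /doc/ pattern)."""
    doc = id_uri.replace("/id/", "/doc/", 1)
    return f"{doc}.{extension}" if extension else doc


def negotiated_request(uri, media_type):
    """Build a request that asks for a serialisation via the Accept header."""
    return Request(uri, headers={"Accept": media_type})


ID_URI = "http://cofog01.data.scotland.gov.uk/id/facility/AB0103"

# Option 1: content negotiation with an Accept header.
request = negotiated_request(ID_URI, "application/json")

# Option 2: an explicit file extension on the document URL.
rdf_xml_url = doc_url(ID_URI, "rdf")

print(request.get_header("Accept"))
print(doc_url(ID_URI))
print(rdf_xml_url)
```

The Accept header suits programs, while the extension form is handy for pasting into a browser.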
Linked Data API.....
• SPARQL is for experts
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX sepaw: <http://data.sepa.org.uk/def/water/>
PREFIX sepaloc: <http://data.sepa.org.uk/def/location/>
CONSTRUCT {
  ?item sepaw:waterBodyId ?___0 .
  ?item sepaw:wiseCode ?___1 .
  ?item sepaw:inRiverBasinDistrict ?___2 .
  ?___2 rdfs:label ?___3 .
  ?item geo:lat ?___4 .
  ?item sepaw:category ?___5 .
  ?item sepaw:inSubBasinDistrict ?___6 .
  ?___6 rdfs:label ?___7 .
  ?item rdfs:label ?___8 .
  ?item sepaw:lengthKm ?___9 .
  ?item sepaw:currentOverallClassification ?___10 .
  ?item sepaloc:unitaryAuthority ?___11 .
  ?item geo:long ?___12 .
  ?item sepaw:inCatchment ?___13 .
  ?___13 rdfs:label ?___14 .
  ?item sepaw:currentClassificationYear ?___15 .
  ?item sepaloc:postcodeDistrict ?___16 .
  ?item sepaw:areaSqKm ?___17 .
}
WHERE {
  { SELECT ?item WHERE { ?item rdf:type sepaw:SurfaceWaterBody . } OFFSET 0 LIMIT 10 }
  { ?item sepaw:waterBodyId ?___0 . }
  UNION { ?item sepaw:wiseCode ?___1 . }
  UNION { { ?item sepaw:inRiverBasinDistrict ?___2 . } OPTIONAL { { ?___2 rdfs:label ?___3 . } } }
  UNION { ?item geo:lat ?___4 . }
  UNION { ?item sepaw:category ?___5 . }
  UNION { { ?item sepaw:inSubBasinDistrict ?___6 . } OPTIONAL { { ?___6 rdfs:label ?___7 . } } }
  UNION { ?item rdfs:label ?___8 . }
  UNION { ?item sepaw:lengthKm ?___9 . }
  UNION { ?item sepaw:currentOverallClassification ?___10 . }
  UNION { ?item sepaloc:unitaryAuthority ?___11 . }
  UNION { ?item geo:long ?___12 . }
  UNION { { ?item sepaw:inCatchment ?___13 . } OPTIONAL { { ?___13 rdfs:label ?___14 . } } }
  UNION { ?item sepaw:currentClassificationYear ?___15 . }
  UNION { ?item sepaloc:postcodeDistrict ?___16 . }
  UNION { ?item sepaw:areaSqKm ?___17 . }
}
Linked Data API....
• Linked Data API makes it easy
http://data.sepa.org.uk/doc/water/surfacewaters
http://data.sepa.org.uk/doc/water/surfacewaters.xml
FluidOps Workbench & FedX
• Built on top of Sesame RDF store
• Wiki-like structure for interaction
• Data pipelined in from external SPARQL and other sources
• Includes widgets, graph views, facet views etc for interacting with the aggregated data
What RDF data is "out there" already?
September 2013
http://en.wikipedia.org/wiki/DBpedia
DBpedia - at the heart of Open Data
45 million interlinks with
• Freebase
• OpenCyc
• UMBEL
• GeoNames
• Musicbrainz
• CIA World Fact Book
• DBLP
• Project Gutenberg
• DBtune Jamendo
• Eurostat
• Uniprot
• Bio2RDF
• US Census data
Also used in
• Thomson Reuters OpenCalais
• New York Times Linked Open Data
• Zemanta API
• DBpedia Spotlight
• BBC datasets
Quick Test
• http://localhost:3030/sparql-editor.tpl
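# The type: and prop: prefixes are not declared on the slide;
# the DBpedia namespaces below are assumed.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?country ?country_name ?capital ?pop ?p ?x ?q ?w
WHERE {
  SERVICE <http://dbpedia.org/sparql/sparql> {
    ?country a type:LandlockedCountries ;
             rdfs:label ?country_name ;
             prop:populationEstimate ?pop ;
             prop:capital ?capital .
    FILTER ( lang(?country_name) = 'en' )
  }
  SERVICE <http://worldbank.270a.info/sparql> {
    OPTIONAL { ?p ?x ?country . ?p ?q ?w . }
  }
}
LIMIT 10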