A publishing pipeline for Linked Government Data
Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Enabling networked knowledge
Self-service Linked Government Data
Fadi Maali, Richard Cyganiak, Vassilios
data.gov.uk
data.gov
4997 datasets
2590 in CSV
272 in RDF
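As a quick check on the data.gov snapshot figures above, the shares work out to just over half of the catalogue in CSV and only about 5% in RDF. The helper name is illustrative.

```python
def format_shares(total: int, counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the catalogue available in each format, to one decimal place."""
    return {fmt: round(100 * n / total, 1) for fmt, n in counts.items()}


# Figures taken from the data.gov slide.
shares = format_shares(4997, {"CSV": 2590, "RDF": 272})
print(shares)  # {'CSV': 51.8, 'RDF': 5.4}
```

In other words, there were roughly nine to ten CSV datasets for every RDF one.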
Why Linked Government Data (LGD)?
Web accessible
Interlinkable
Decentralised publishing of data
Standardised
We need government data as Linked Data, not just raw data
…aha, and of good quality!
LGD
We want governments to provide Linked Data, not just raw data… and of good quality
Time, money, skills
LGD is costly
http://code.google.com/p/google-refine/
DIY
Self-service Approach
Self-service Approach
DIY: provide tools, models and algorithms that enable the self-service approach (a publishing pipeline)
Publishing pipeline requirements
Interactive approach
Graphical user interface
Reproducibility and traceability
Flexibility
Decentralisation
Results sharing
Google Refine
Powerful data editing, transformation and enrichment capabilities
Imports JSON, Excel, CSV, TSV, XML and other formats
Persistent undo/redo history
Popular in the open data community
Extensible and under active development
Free and open source
http://code.google.com/p/google-refine/
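The "persistent undo/redo history" point is what makes a Refine session reproducible: the current state can always be rebuilt by replaying recorded operations on the original data. The sketch below illustrates that idea only. It is a hypothetical in-memory model (the `Operation` and `Project` classes and the `trim`/`upper` helpers are made up for illustration), not Google Refine's internal design.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, str]


@dataclass
class Operation:
    """One recorded transformation (hypothetical model, not Refine's internals)."""
    description: str
    apply: Callable[[list[Row]], list[Row]]


@dataclass
class Project:
    original: list[Row]
    history: list[Operation] = field(default_factory=list)
    position: int = 0  # how many history entries are currently applied

    def current(self) -> list[Row]:
        # Rebuild the current state by replaying the applied prefix of the history.
        rows = [dict(r) for r in self.original]
        for op in self.history[: self.position]:
            rows = op.apply(rows)
        return rows

    def do(self, op: Operation) -> None:
        # Recording a new operation discards any undone (redo) entries.
        del self.history[self.position:]
        self.history.append(op)
        self.position += 1

    def undo(self) -> None:
        self.position = max(0, self.position - 1)

    def redo(self) -> None:
        self.position = min(len(self.history), self.position + 1)


def trim(col: str) -> Operation:
    return Operation(f"Trim whitespace in {col}",
                     lambda rows: [{**r, col: r[col].strip()} for r in rows])


def upper(col: str) -> Operation:
    return Operation(f"Upper-case {col}",
                     lambda rows: [{**r, col: r[col].upper()} for r in rows])


project = Project([{"council": "  Fingal "}])
project.do(trim("council"))
project.do(upper("council"))
print(project.current())  # [{'council': 'FINGAL'}]
project.undo()
print(project.current())  # [{'council': 'Fingal'}]
project.redo()
```

Because the original rows are never modified, undo and redo are just moves of a pointer into the history. Making the history persistent would additionally require operations to be stored in a serialisable form (such as JSON descriptions) rather than as Python functions.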
DIY Recipe (1000 feet view)
Publishers provide an RDF representation of their catalogues
User shares the RDF data
Tool support to select datasets of interest and put them into RDF
DIY Recipe (100 feet view)
Publishers provide an RDF representation of their catalogues
dcat
User shares the RDF data
Tool support to select datasets of interest and put them into RDF
DIY Recipe (100 feet view)
Publishers provide an RDF representation of their catalogues
dcat
Tool support to select datasets of interest and put them into RDF
Google Refine
+ RDF export extension
+ RDF reconciliation extension
User shares the RDF data
DIY Recipe (100 feet view)
Publishers provide an RDF representation of their catalogues
dcat
Tool support to select datasets of interest and put them into RDF
Google Refine
+ RDF export extension
+ RDF reconciliation extension
User shares the RDF data
Share RDF data publicly (on CKAN.net) along with a sufficient provenance description
A Walk-through (1/5)
A Walk-through (2/5)
A Walk-through (3/5)
A Walk-through (4/5)
A Walk-through (5/5)
Data on CKAN.net
Data Provenance (simplified)
[Figure: provenance graph with the nodes :dataset, :csv-ds, :export-process and :json-history, connected by the properties dct:source, :wasExportedBy, :usedData and :operations]
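The simplified provenance graph can be written out as plain RDF triples. This is a sketch under assumptions: the base namespace `http://example.org/prov#` for the `:` prefix is hypothetical, and which node each property connects is a plausible reading of the diagram rather than something the slide spells out.

```python
# Hypothetical base namespace standing in for the ":" prefix in the diagram.
EX = "http://example.org/prov#"
DCT = "http://purl.org/dc/terms/"


def iri(ns: str, local: str) -> str:
    return f"<{ns}{local}>"


def provenance_triples() -> list[tuple[str, str, str]]:
    # Assumed wiring of the diagram: the exported dataset points to its CSV
    # source and to the export process, which used the CSV and recorded the
    # JSON operation history.
    return [
        (iri(EX, "dataset"), iri(DCT, "source"), iri(EX, "csv-ds")),
        (iri(EX, "dataset"), iri(EX, "wasExportedBy"), iri(EX, "export-process")),
        (iri(EX, "export-process"), iri(EX, "usedData"), iri(EX, "csv-ds")),
        (iri(EX, "export-process"), iri(EX, "operations"), iri(EX, "json-history")),
    ]


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise (subject, predicate, object) IRIs as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(provenance_triples()))
```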
DIY Recipe (10 feet view)
dcat
An RDF vocabulary to describe government catalogues
Current status: First Public Working Draft by the W3C GLD Working Group, http://www.w3.org/TR/vocab-dcat/
Used on data.gov.uk (RDFa) and CKAN-based catalogues
“Enabling Interoperability of Government Data Catalogues.” EGOV 2010
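To make the dcat step concrete, here is a minimal sketch that renders catalogue records as Turtle using core dcat terms (`dcat:Catalog`, `dcat:dataset`, `dcat:Dataset`, `dcat:distribution`, `dcat:Distribution`, `dcat:accessURL`) plus Dublin Core `dct:title` and `dct:publisher`. All URIs and the `Dataset` record type are hypothetical, and the string escaping is deliberately minimal.

```python
from dataclasses import dataclass

PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct:  <http://purl.org/dc/terms/> .\n"
)


@dataclass
class Dataset:
    uri: str        # hypothetical dataset URI
    title: str
    publisher: str  # URI of the publishing body (hypothetical)
    csv_url: str    # where the raw CSV can be fetched (hypothetical)


def literal(text: str) -> str:
    # Minimal Turtle string escaping: backslashes and double quotes only.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def catalogue_to_turtle(catalogue: str, title: str, datasets: list[Dataset]) -> str:
    """Describe a catalogue and its datasets with dcat terms, as Turtle."""
    lines = [PREFIXES,
             f"<{catalogue}> a dcat:Catalog ;",
             f"    dct:title {literal(title)} ."]
    for ds in datasets:
        lines += [
            f"<{catalogue}> dcat:dataset <{ds.uri}> .",
            f"<{ds.uri}> a dcat:Dataset ;",
            f"    dct:title {literal(ds.title)} ;",
            f"    dct:publisher <{ds.publisher}> ;",
            f"    dcat:distribution [ a dcat:Distribution ; dcat:accessURL <{ds.csv_url}> ] .",
        ]
    return "\n".join(lines)


schools = Dataset("http://example.org/dataset/schools", "Schools",
                  "http://example.org/org/county-council",
                  "http://example.org/files/schools.csv")
print(catalogue_to_turtle("http://example.org/catalogue", "Example catalogue", [schools]))
```

Once a catalogue is described this way, a tool can query it to let users pick datasets of interest and locate their raw files.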
RDF Mapping
DIY Recipe (10 feet view)
More on RDF Mapping
RDF-centric mapping
Multiple tree structure
Expression language for custom expressions
Vocabularies/ontologies support
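An RDF-centric mapping can be pictured as a skeleton: a URI template for each row's subject plus, for every predicate, an expression that computes the object from the row. The sketch below is a hypothetical illustration of that idea (the template, predicates and sample CSV are all made up); it is not the RDF export extension's actual skeleton format or expression language.

```python
import csv
import io
from typing import Callable

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical mapping skeleton: a subject URI template plus one
# "expression" per predicate that computes the object from a CSV row.
SUBJECT_TEMPLATE = "http://example.org/school/{id}"
MAPPING: dict[str, Callable[[dict[str, str]], str]] = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type":
        lambda row: "<http://example.org/def/School>",
    "http://www.w3.org/2000/01/rdf-schema#label":
        # No escaping: assumes names contain no quotes or backslashes.
        lambda row: '"' + row["name"].strip() + '"',
    "http://example.org/def/pupils":
        lambda row: f'"{int(row["pupils"])}"^^<{XSD_INTEGER}>',
}


def map_rows(csv_text: str) -> list[str]:
    """Apply the skeleton to every CSV row, producing N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + SUBJECT_TEMPLATE.format(id=row["id"]) + ">"
        for predicate, expression in MAPPING.items():
            triples.append(f"{subject} <{predicate}> {expression(row)} .")
    return triples


# Made-up sample data.
SAMPLE = "id,name,pupils\n1, Scoil Example ,120\n2,Another School,85\n"
for triple in map_rows(SAMPLE):
    print(triple)
```

A multiple-tree mapping would simply hold several such skeletons (for example, one rooted at schools and one at their addresses) and apply each of them to every row.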
Interlinking
RDF Reconcile Extension
Silk Server
SPARQL endpoint
Sindice search API
Crafted RDF
SPARQL
SPARQL endpoint with fulltext extension
Hybrid SPARQL
Silk LSL
Google Refine
DIY Recipe (10 feet view)
More on Interlinking
Interlinking as a pre-RDF-creation step: fewer unnecessary owl:sameAs links
Focus on the interface
Semi-automatic process with good user support
“Re-using Cool URIs: Entity Reconciliation Against LOD Hubs.” LDOW 2011
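The semi-automatic idea can be sketched as: rank candidate entities from a hub for each cell value, accept the top candidate automatically only when it is a confident match, and otherwise hand the ranked list to the user. This sketch swaps in plain string similarity from Python's `difflib` and a tiny in-memory "hub" with hypothetical URIs; it is not the RDF reconciliation extension's matching method, and the 0.9 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from typing import Optional

# Stand-in for a LOD hub lookup (e.g. a search API or SPARQL endpoint).
# URIs and labels are hypothetical.
HUB = {
    "http://example.org/resource/Fingal": "Fingal",
    "http://example.org/resource/Balbriggan": "Balbriggan",
    "http://example.org/resource/Malahide": "Malahide",
}


def normalise(label: str) -> str:
    return " ".join(label.lower().split())


def candidates(label: str, hub: dict[str, str] = HUB, limit: int = 3) -> list[tuple[str, float]]:
    """Rank hub entities by string similarity to a cell value."""
    scored = [(uri, SequenceMatcher(None, normalise(label), normalise(name)).ratio())
              for uri, name in hub.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


def reconcile(label: str, threshold: float = 0.9) -> tuple[Optional[str], list[tuple[str, float]]]:
    """Accept the top candidate only if it is confident; otherwise leave the choice to the user."""
    ranked = candidates(label)
    if ranked and ranked[0][1] >= threshold:
        return ranked[0][0], ranked
    return None, ranked


print(reconcile("  FINGAL ")[0])  # http://example.org/resource/Fingal
print(reconcile("Malahyde")[0])   # None: the user picks from the ranked list
```

Reconciling cell values before RDF is generated means the output can use the hub's URIs directly instead of minting new ones and linking them with owl:sameAs afterwards.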
Sharing
Capture the operations applied to the data
Represent them using the Open Provenance Model Vocabulary (OPMV)
Share the data and its provenance on CKAN.net
CKAN extension for Google Refine: http://lab.linkeddata.deri.ie/2011/grefine-ckan/
DIY Recipe (10 feet view)
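As an illustration of the sharing step, the sketch below describes one editing session as a single OPMV process (`opmv:Process`, `opmv:used`, `opmv:wasGeneratedBy`) and attaches each recorded operation's description as an `rdfs:comment`. Two assumptions are flagged here and in the code: the operation history is taken to be a JSON array whose entries carry a `"description"` field, and all URIs and the sample history are hypothetical.

```python
import json

OPMV = "http://purl.org/net/opmv/ns#"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text: str) -> str:
    # json.dumps quotes and backslash-escapes the string; with ensure_ascii=False
    # the result is also a valid N-Triples string literal for ordinary text.
    return json.dumps(text, ensure_ascii=False)


def history_to_opmv(history_json: str, process: str, source: str, output: str) -> list[str]:
    """Describe one recorded editing session as a single OPMV process."""
    triples = [
        f"<{process}> <{RDF_TYPE}> <{OPMV}Process> .",
        f"<{process}> <{OPMV}used> <{source}> .",
        f"<{output}> <{OPMV}wasGeneratedBy> <{process}> .",
    ]
    # Assumes the history is a JSON array whose entries carry a "description".
    for entry in json.loads(history_json):
        triples.append(f"<{process}> <{RDFS_COMMENT}> {literal(entry['description'])} .")
    return triples


# Made-up history in the shape assumed above.
HISTORY = json.dumps([
    {"description": "Rename column Name to School"},
    {"description": "Trim whitespace in column School"},
])
for triple in history_to_opmv(HISTORY, "http://example.org/process/1",
                              "http://example.org/files/schools.csv",
                              "http://example.org/dataset/schools-rdf"):
    print(triple)
```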
Case study - Fingal Catalogue
Number of datasets: 74 (68 available in CSV and 56 in XML)
Top publishers: Fingal County Council (41), Central Statistics Office (17), Department of Education and Science (4)
Top domains: Demographics (18), Citizen Participation (18), Education (9)
http://data.fingal.ie
Case study - Fingal Catalogue
The catalogue was represented in dcat
60 datasets were converted to RDF using the publishing pipeline (~300K triples)
Data Cube was used for statistical data
URIs were used consistently and shared among datasets, so the data was interlinked
Externally linked to DBpedia
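For the statistical datasets, Data Cube models each figure as a `qb:Observation` that points to its `qb:DataSet` and carries dimension and measure values. The sketch below shows that shape only. The dimension and measure properties (`def/area`, `def/year`, `def/count`) and all URIs are hypothetical, the numbers are placeholders rather than Fingal figures, and the `qb:DataStructureDefinition` that a complete cube would also declare is omitted.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/"  # hypothetical namespace for the dataset, dimensions and measure


def observations(dataset: str, rows: list[tuple[str, int, int]]) -> list[str]:
    """Emit one qb:Observation per (area, year, value) row as N-Triples."""
    ds = f"<{EX}dataset/{dataset}>"
    triples = [f"{ds} <{RDF_TYPE}> <{QB}DataSet> ."]
    for i, (area, year, value) in enumerate(rows):
        obs = f"<{EX}dataset/{dataset}/obs/{i}>"
        triples += [
            f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
            f"{obs} <{QB}dataSet> {ds} .",
            f"{obs} <{EX}def/area> <{EX}area/{area}> .",            # dimension
            f'{obs} <{EX}def/year> "{year}"^^<{XSD_INTEGER}> .',    # dimension
            f'{obs} <{EX}def/count> "{value}"^^<{XSD_INTEGER}> .',  # measure
        ]
    return triples


# Placeholder values, not figures from the Fingal catalogue.
for triple in observations("example-stats", [("area-a", 2006, 100), ("area-b", 2006, 200)]):
    print(triple)
```

Sharing area URIs such as `area/area-a` across datasets is what lets observations from different tables join up, in line with the consistent URI use noted above.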
Open Issues
Evaluating/refining the crowd-sourcing aspects of the RDF creation process
RDF Modeling: Can we assist RDF modeling by examining the raw data?
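One modest form of such assistance, sketched here as an assumption rather than anything the slides propose, is to suggest an XSD datatype for each column by inspecting its raw values: integer if every non-empty value is an integer, decimal if they are all numeric, date if they are all ISO dates, and string otherwise. The function names are hypothetical.

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"


def value_type(value: str) -> str:
    """Classify one raw cell value."""
    v = value.strip()
    if re.fullmatch(r"[+-]?\d+", v):
        return "integer"
    if re.fullmatch(r"[+-]?(\d+\.\d*|\.\d+)", v):
        return "decimal"
    try:
        date.fromisoformat(v)
        return "date"
    except ValueError:
        return "string"


def suggest_datatype(values: list[str]) -> str:
    """Suggest the narrowest XSD datatype consistent with every non-empty value."""
    kinds = {value_type(v) for v in values if v.strip()}
    if kinds == {"integer"}:
        return XSD + "integer"
    if kinds and kinds <= {"integer", "decimal"}:
        return XSD + "decimal"
    if kinds == {"date"}:
        return XSD + "date"
    return XSD + "string"


print(suggest_datatype(["41", "17", "", "4"]))  # http://www.w3.org/2001/XMLSchema#integer
```

A suggestion like this would only pre-fill the mapping; the user still decides, which fits the interactive approach the talk argues for.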
Lessons Learned
Interactive approach
Focus on plumbing tools together, but don’t enforce a rigid process
Make it easy to adopt best practices and good recipes