Open Provenance Model Tutorial Session 4: Use cases from data.gov.uk

Post on 16-Mar-2016

Outline

• Background about data.gov.uk
• The use cases
  – XML serialization
  – Data transformation on the fly
  – Complex and nested processes

data.gov.uk

• Linking UK government data
• Aims:
  – Provide a set of best practices for government agencies
  – Provide the minimum set of tooling and specification to facilitate the publication of data
  – Encourage “responsible” data publishing

XML -> RDF

[Diagram 1: an XSLT Processor with an input and an output (RDF File). Associated elements: XSLT ParameterBinding, XSLT Stylesheet, XSLT Template. Annotation: "Who, when, which version, how".]

[Diagram 2: the same XSLT Processor, input, and output RDF File, with the additional annotations "Downloaded from; Unzipped from, etc." and "Made accessible". Annotation: "Who, when, which version, how".]
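The "who, when, which version, how" questions for an XML -> RDF run could be captured as OPM-style edges. The following is only a minimal sketch under stated assumptions: the `ProvenanceGraph` class, the `record_xslt_run` helper, and all file and agent names are hypothetical and do not come from the slides. Only the edge names (`used`, `wasGeneratedBy`, `wasDerivedFrom`, `wasControlledBy`) are OPM terms.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceGraph:
    """A tiny in-memory store of OPM-style edges: (effect, relation, cause)."""
    edges: list = field(default_factory=list)
    annotations: dict = field(default_factory=dict)

    def add(self, effect, relation, cause):
        self.edges.append((effect, relation, cause))

    def causes_of(self, effect, relation):
        return [c for (e, r, c) in self.edges if e == effect and r == relation]


def record_xslt_run(graph, source_xml, stylesheet, output_rdf,
                    agent, processor_version, params):
    """Record who, when, which version, and how for one XML -> RDF run."""
    process = f"xslt-run:{output_rdf}"                    # hypothetical process id
    graph.add(process, "used", source_xml)                # how: the input document
    graph.add(process, "used", stylesheet)                # how: the stylesheet
    graph.add(output_rdf, "wasGeneratedBy", process)
    graph.add(output_rdf, "wasDerivedFrom", source_xml)
    graph.add(process, "wasControlledBy", agent)          # who
    graph.annotations[process] = {
        "when": datetime.now(timezone.utc).isoformat(),   # when
        "processor_version": processor_version,           # which version
        "parameters": dict(params),                       # parameter bindings
    }
    return process


# Hypothetical file names, agent, and version, for illustration only.
graph = ProvenanceGraph()
run = record_xslt_run(
    graph, "transport.xml", "to-rdf.xsl", "transport.rdf",
    agent="agent:publisher", processor_version="hypothetical-1.0",
    params={"baseUri": "http://example.org/"},
)
```

Asking the graph `graph.causes_of("transport.rdf", "wasGeneratedBy")` then returns the run that produced the RDF file, and the run's annotations carry the time, processor version, and parameter bindings.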

On-the-fly Transformation

[Diagram: a data transformation wrapper; example resource: http://mytransportatio.db/j10. Annotation: "Who, when, which version, how".]
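When data is transformed on request, there is no stored output file to annotate, so one option is for the wrapper to return a provenance record together with each result. The sketch below assumes this design; the `with_provenance` decorator, the toy `rows_to_triples` transformation, the agent name, the version string, and the example.org URIs are all hypothetical.

```python
import functools
from datetime import datetime, timezone


def with_provenance(agent, version):
    """Wrap a transformation so each call returns (result, provenance record)."""
    def decorator(transform):
        @functools.wraps(transform)
        def wrapper(source_uri, payload):
            result = transform(payload)
            record = {
                "source": source_uri,
                "who": agent,
                "when": datetime.now(timezone.utc).isoformat(),
                "which_version": version,
                "how": transform.__name__,
            }
            return result, record
        return wrapper
    return decorator


@with_provenance(agent="agent:wrapper-service", version="0.1-hypothetical")
def rows_to_triples(rows):
    """Toy transformation: (subject, predicate, object) rows to triple-like lines."""
    return [f'<{s}> <{p}> "{o}" .' for (s, p, o) in rows]


rows = [("http://example.org/j10", "http://example.org/name", "Junction 10")]
triples, provenance = rows_to_triples("http://example.org/source", rows)
```

Each response then carries its own answer to "who, when, which version, how", generated at request time rather than looked up afterwards.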

Complex Data Creation Pipeline

[Diagram: GATE Pipeline, GateXMLRegressionTransformation, GateXMLRdfaTransformation, RdfaRdfXmlTransformation]

Detailed view, processing resources listed:

• Document Reset PR
• ANNIE English Tokeniser
• ANNIE English Splitter
• ANNIE POS Tagger
• Data.gov.uk Morphological Analyzer
• Data.gov.uk Flexible Roof Gazetteer
• Data.gov.uk Generic Gazetteer
• GATE Noun Phrase Chunker
• Data.gov.uk Generic Transducer
• TSO Coreference

Courtesy of Paul Appleby from TSO (Data Enrichment Service)

[Diagram: layered provenance graph]

• Level 1: Provenance of execution at higher level
• Level 0: Provenance of execution at detailed level
• Services used by executions
• Artifacts; a data collection
• Edge labels: wasGeneratedBy, wasDerivedFrom, wasTriggeredBy, hasParentProcess, iterationOfProcess, accessedService, followed
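One way to relate the two levels is to lift detailed (Level 0) edges to the parent process named by hasParentProcess, hiding artifacts that only exist inside that parent. The sketch below illustrates that idea and is not the tutorial's own algorithm: the `roll_up` function, the artifact names, and the parent name `enrichment-run` are hypothetical, and the rule "generated and used within the same parent means internal" is a simplifying assumption.

```python
def roll_up(edges, parent_of):
    """Lift level-0 used / wasGeneratedBy edges to their level-1 parent processes.

    Assumption: an artifact that is both generated and used inside the same
    parent process is internal and is hidden from the level-1 view.
    """
    generated = set()   # (artifact, parent process)
    used = set()        # (artifact, parent process)
    for effect, relation, cause in edges:
        if relation == "wasGeneratedBy":      # artifact wasGeneratedBy process
            generated.add((effect, parent_of[cause]))
        elif relation == "used":              # process used artifact
            used.add((cause, parent_of[effect]))
    internal = generated & used
    level1 = [(a, "wasGeneratedBy", p) for (a, p) in sorted(generated - internal)]
    level1 += [(p, "used", a) for (a, p) in sorted(used - internal)]
    return level1


# Level 0: detailed steps; artifact names are hypothetical.
level0 = [
    ("GATE Pipeline", "used", "source.xml"),
    ("gate.xml", "wasGeneratedBy", "GATE Pipeline"),
    ("GateXMLRdfaTransformation", "used", "gate.xml"),
    ("doc.rdfa", "wasGeneratedBy", "GateXMLRdfaTransformation"),
    ("RdfaRdfXmlTransformation", "used", "doc.rdfa"),
    ("doc.rdf", "wasGeneratedBy", "RdfaRdfXmlTransformation"),
]

# hasParentProcess: each detailed step points at one higher-level execution.
parent_of = {
    "GATE Pipeline": "enrichment-run",
    "GateXMLRdfaTransformation": "enrichment-run",
    "RdfaRdfXmlTransformation": "enrichment-run",
}

level1 = roll_up(level0, parent_of)
```

In this example the Level 1 view keeps only the external input and the final output of `enrichment-run`; the intermediate files remain visible at Level 0.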

Non-digital Data Objects

• Organizations
  – Organizational structure changes over time
  – Origin organization, resulting organization
• Boundary
• Legislation

An organization ontology: http://www.epimorphics.com/public/vocabulary/org.html
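The origin/resulting relationship between organizations can be treated like a derivation chain and walked backwards. This is a small sketch only: the `lineage` function, the `derived_from` mapping, and the organization names are made up for illustration and are not taken from the organization ontology above.

```python
def lineage(org, derived_from):
    """Return every origin organization reachable from `org`, nearest first."""
    seen = []
    queue = list(derived_from.get(org, []))
    while queue:
        origin = queue.pop(0)
        if origin not in seen:
            seen.append(origin)
            queue.extend(derived_from.get(origin, []))
    return seen


# Hypothetical restructuring: C results from merging A and B; B came from B0.
derived_from = {
    "Department C": ["Department A", "Department B"],
    "Department B": ["Office B0"],
}

origins = lineage("Department C", derived_from)
```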

The Challenges

• Data of different representations, physical forms, and granularity
• No tooling support
• Provenance across different types of systems
  – Identification
  – Different terminologies

The Gaps

• A vocabulary able to describe the provenance of all types of data, from different systems
• A vocabulary that still provides enough terms to describe provenance accurately

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
