Linked open data training for EU institutions

Open Data Support
Linked Open Data: Principles, Technologies and Examples

PwC firms help organisations and individuals create the value they’re looking for. We’re a network of firms in 158 countries with close to 180,000 people who are committed to delivering quality in assurance, tax and advisory services. Tell us what matters to you and find out more by visiting us at www.pwc.com. PwC refers to the PwC network and/or one or more of its member firms, each of which is a separate legal entity. Please see www.pwc.com/structure for further details.

Upload: open-data-support

Post on 21-Jan-2017



Learning objectives

By the end of the course, participants should have a clear understanding of:

• What linked open data is
• The difference between linked data and open data
• How to publish linked data
• The economic and social aspects of linked data
• How linked data technologies can be applied to improve the availability, understandability and usability of EU data

Slide 2

Content

This training consists of 3 modules:

1. Introduction to linked data
2. Introduction to RDF & SPARQL
3. Workshop on publishing open linked EU data

Slide 3


Learning Module 1

Introduction to Linked Data

Slide 4

Introduction to linked data

This module contains:

• An introduction to the linked data principles
• The expected benefits of linked data
• An introduction to linked data technologies
• An outline of the 5-star scheme for publishing linked data
• An overview of linked data initiatives in Europe

Slide 5

Find more on training.opendatasupport.eu

What is linked data?

Evolution from a document-based Web to a Web of interlinked data

Slide 6

The Web is evolving from a “Web of linked documents” into a “Web of linked data”

• The Web started as a collection of documents published online, each accessible at a Web location identified by a URL.
• These documents often contain data about real-world resources. This data is mainly human-readable and cannot be understood by machines.
• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), so that people and machines can collect the data and combine it for any purpose the licence permits.

Machine-readable data (or metadata) is data in a format that can be interpreted by a computer.

Two types of machine-readable data exist:

• Human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa
• Data formats intended principally for computers, e.g. RDF, XML and JSON

Slide 7

See also:
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
http://linkeddatabook.com/editions/1.0/

Defining linked data
Providing data as a service

“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”

EC ISA Case Study: How Linked Data is transforming eGovernment

The four design principles of Linked Data (by Tim Berners-Lee):

1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.

Slide 8

See also:
http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA

The value proposition of linked (open) government data

• Flexible data integration: linked data facilitates data integration and enables the interconnection of previously disparate government datasets.
• Efficiency gains in data integration through the network effect: the addition of each new dataset increases the value of the datasets that are already published.
• Ease of navigation: URIs make browsing through complex data easier.
• Increase in data quality:
  o The use of URIs leads to improved data management and quality.
  o The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.

Slide 9

See also: ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

The value proposition of linked (open) government data

• Increase in data usability by providing data as a service:
  o Resolvable URIs
  o Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:
  o Existing relational/spatial database systems, by applying database-to-RDF conversions, or
  o Existing XML/file-based data

Slide 10

The value proposition of linked (open) government data

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected in the data at lower cost and with less effort than in traditional relational databases.
• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.
• New services: the availability of LOGD gives rise to new integrated services offered by the public and/or private sector.

Slide 11

See also: ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.

E.g. for an organisation: UNICEF in EuroVoc
- http://eurovoc.europa.eu/1022

Slide 12

The four principles in practice

3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).
4. Include links to other URIs so that people and machines can discover more things.
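Principles 2 and 3 can be illustrated with a minimal Python sketch of how a client might ask for RDF when looking up a URI. It only builds the HTTP request and does not send it. The URI is the EuroVoc example for UNICEF used in this training; the choice of media types in the `Accept` header is an assumption, and whether a given server honours it is not guaranteed.

```python
from urllib.request import Request

# URI taken from the EuroVoc example in this training (UNICEF).
UNICEF_URI = "http://eurovoc.europa.eu/1022"


def build_rdf_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # Assumed preference order: Turtle first, RDF/XML second, HTML as fallback.
    accept = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"
    return Request(uri, headers={"Accept": accept}, method="GET")


request = build_rdf_request(UNICEF_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; that step is left out here so that the sketch runs without network access.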

Slide 13

Linked data vs open data

Open data
Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

Slide 14

“Open data is data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Linked data foundations

URIs for naming things, RDF for describing data and SPARQL for querying linked data

Slide 15

Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”
- ISA’s 10 Rules for Persistent URIs

A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

RDF & SPARQL

The Resource Description Framework (RDF) is a framework for representing data and resources on the Web.

Slide 17

RDF breaks every piece of information down into triples:

• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related

SPARQL is a standardised language for querying RDF data.

Example (subject, predicate, object):

http://example.org/place/Brussels   is the capital of   “Belgium”
OR
http://example.org/place/Brussels   is the capital of   http://example.org/place/Belgium
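The two variants of the Brussels example can be written down as data. The sketch below is a minimal Python illustration, not an RDF library: it models a triple as a named tuple and prints it in N-Triples style. The predicate URI `http://example.org/def/isCapitalOf` is a hypothetical name, since the slide only gives the predicate in words.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # always a URI
    predicate: str                # always a URI
    obj: str                      # a URI or a literal value
    obj_is_literal: bool = False


# Hypothetical predicate URI; the slide only says "is the capital of".
CAPITAL_OF = "http://example.org/def/isCapitalOf"


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as an N-Triples style line."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


literal_version = Triple(
    "http://example.org/place/Brussels", CAPITAL_OF, "Belgium", True
)
uri_version = Triple(
    "http://example.org/place/Brussels", CAPITAL_OF,
    "http://example.org/place/Belgium",
)

print(to_ntriples(literal_version))
print(to_ntriples(uri_version))
```

The second form is the one that supports linking: because the object is itself a URI, further triples can describe Belgium.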

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql


How to publish linked data

Paving the way towards 5-star linked data

Slide 18

5-star scheme of Linked Open Data

1 star: Make your stuff available on the Web (whatever format) under an open licence
2 stars: Make it available as structured data (e.g. Excel instead of an image scan of a table)
3 stars: Use non-proprietary formats (e.g. CSV instead of Excel)
4 stars: Use URIs to denote things, so that people can point at your stuff
5 stars: Link your data to other data to provide context
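The scheme is cumulative: each star presupposes the ones before it. A minimal Python sketch of that reading follows; the class and field names are illustrative assumptions, not part of the scheme itself.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    on_web_open_licence: bool   # 1 star
    structured: bool            # 2 stars
    non_proprietary: bool       # 3 stars
    uses_uris: bool             # 4 stars
    linked_to_other_data: bool  # 5 stars


def star_rating(p: Publication) -> int:
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    levels = [
        p.on_web_open_licence,
        p.structured,
        p.non_proprietary,
        p.uses_uris,
        p.linked_to_other_data,
    ]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


excel_file = Publication(True, True, False, False, False)
csv_file = Publication(True, True, True, False, False)
print(star_rating(excel_file))  # 2
print(star_rating(csv_file))    # 3
```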

Slide 19

Make your stuff available on the Web under an open licence

Slide 20

Example: Trends, risks and vulnerabilities in securities markets

Make it available as structured data

Slide 21

Example: Waterbase - Emissions to water

Use non-proprietary formats

• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF

Example: DG Enlargement - Regional programmes

Slide 22

Use URIs to denote things

Slide 23

Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Link your data to other data to provide context

Slide 24

Example: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

LOGD roadblocks

• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo: change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions’ initiatives: some examples

• European Environment Agency SPARQL endpoint
  Tool for searching linked data published by the European Environment Agency.
  http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint
  Allows searching the linked metadata of datasets published on the EU Open Data Portal.
  https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint
  Allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc.
  http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint
  Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc.
  http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint
  Tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections.
  http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

Initiatives funded by the European Commission

Slide 28

[Figure: logos of initiatives funded by the European Commission. Recoverable labels: ADMS.SW, Core Vocabulary, Public Service]

Member State initiatives: some examples

DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT - Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL - Building and address register
The Dutch Address and Buildings base register published as linked data

UK - Ordnance Survey
Three OS OpenData products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line

UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member States Initiatives: UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI)
http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member States Initiatives: UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
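The granular URI pattern described for legislation.gov.uk can be sketched as two small Python functions. The pattern is read from the slides (identifier path of type, year, number and section, followed by `data.` plus a format); the function and parameter names are illustrative assumptions, and the live service may apply further rules not shown here.

```python
BASE = "http://www.legislation.gov.uk"
# Representation formats listed on the slide.
REPRESENTATIONS = ("xml", "xht", "pdf", "rdf", "feed")


def section_identifier(doc_type: str, year: int, number: int, section: int) -> str:
    """Build the identifier URI for one section of a piece of legislation."""
    return f"{BASE}/id/{doc_type}/{year}/{number}/section/{section}"


def representation_url(identifier: str, fmt: str) -> str:
    """Append the representation suffix for one of the listed formats."""
    if fmt not in REPRESENTATIONS:
        raise ValueError(f"unsupported representation: {fmt}")
    return f"{identifier}/data.{fmt}"


identifier = section_identifier("ukpga", 2010, 32, 124)
print(identifier)
print(representation_url(identifier, "rdf"))
```

Keeping the identifier separate from its representations is what lets one section be served as XML, PDF or RDF without changing the name of the thing itself.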

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

Slide 33

Open & linked data at BBC

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga
https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS, with their production volumes.”

• Need for adding semantics in order to allow machine reasoning

For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality

Slide 36

Conclusions (cont’d)

• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module, you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write and read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40


Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>
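To show how the RDF/XML description above decomposes into triples, here is a minimal Python sketch that reads it with the standard library XML parser. It is an illustration under a strong assumption: it handles only this flat layout (typed nodes with `rdf:about`, one level of properties) and is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ORG = "http://www.w3.org/ns/org#"
LOCN = "http://www.w3.org/ns/locn#"

DOCUMENT = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:org="{ORG}" xmlns:locn="{LOCN}">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
"""


def full_name(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    return tag.replace("{", "").replace("}", "")


def extract_triples(xml_text: str):
    """Yield (subject, predicate, object) for this simple two-level layout only."""
    root = ET.fromstring(xml_text)
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        # The element name of a typed node is its class.
        yield subject, RDF + "type", full_name(node.tag)
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            obj = resource if resource is not None else prop.text
            yield subject, full_name(prop.tag), obj


triples = list(extract_triples(DOCUMENT))
for triple in triples:
    print(triple)
```

The description yields five triples: a type, a label and a site for the organisation, and a type and a full address for the site.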

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related

Example: name of a dataset (subject, predicate, object)

http://publications.europa.eu/resource/authority/file-type   has a title   “File types Named Authority List”

RDF Syntax: RDF/XML

Slide 46

<!-- Definition of prefixes -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <!-- Description of data: triples -->
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Callouts on the slide mark the subject, predicate and object of a triple, and the graph formed by the triples together.

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with subject, predicate and object labelled]

RDF Syntax: Turtle

Slide 48

# Definition of prefixes
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

# Description of data: triples
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Callouts on the slide mark the subject, predicate and object of a triple, and the graph formed by the triples together.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
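A prefix definition is only an abbreviation: each prefixed name stands for a full URI. The Python sketch below shows that expansion and its inverse. The namespace URIs are the ones published for DCAT and Dublin Core Terms; the function names are illustrative assumptions.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(name: str, prefixes: dict) -> str:
    """Expand a prefixed name such as 'dct:title' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


def compact(uri: str, prefixes: dict) -> str:
    """Abbreviate a URI with a known prefix, or fall back to <uri>."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"


print(expand("dct:title", PREFIXES))
print(compact("http://www.w3.org/ns/dcat#Dataset", PREFIXES))
print(compact("http://publications.europa.eu/resource/authority/file-type", PREFIXES))
```

Two documents that use different prefix labels for the same namespace therefore describe the same terms, because only the expanded URIs are compared.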

RDF Syntax: RDFa

Slide 49

RDFa: embedding RDF data in HTML

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Callouts on the slide mark the subject, predicate and object of a triple.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[Figure: UML class diagram "Healthcare Domain", with callouts marking classes, properties and relationships. Recoverable content follows.]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus making the meaning of the data model easier to understand.


Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph
• DELETE: delete triples from the RDF graph
• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

SELECT: return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Named Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Named Authority List"
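What the SELECT query does can be imitated in a few lines of Python: a triple pattern is compared with every triple, and each variable (a term starting with `?`) is bound to whatever stands in its position. This is a minimal sketch of the matching idea, not a SPARQL engine; the `http://example.org/...` URIs are placeholders for the abbreviated URIs on the slide.

```python
# Placeholder URIs standing in for the abbreviated ones in the sample data.
DATASET = "http://example.org/authority/file-type"
PUBLISHER = "http://example.org/publisher/publ"

SAMPLE_DATA = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Named Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable matches anything, but must stay consistent.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


rows = match((DATASET, "dct:title", "?dataset"), SAMPLE_DATA)
print(rows)
```

Only the second triple has the requested subject and predicate, so the result contains a single row with the dataset title.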

SELECT: return the name and publisher of a dataset

Slide 58

Sample data:

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Named Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset                              publisher
"File types Named Authority List"    "Publications Office"
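The query with three triple patterns is a join: the variable shared between patterns (here `?publisherURI`) must take the same value in each. The Python sketch below illustrates that idea over the sample data; it is a simplified illustration rather than a SPARQL implementation, and the `http://example.org/...` URIs are placeholders for the abbreviated URIs on the slide.

```python
# Placeholder URIs standing in for the abbreviated ones in the sample data.
DATASET = "http://example.org/authority/file-type"
PUBLISHER = "http://example.org/publisher/publ"

SAMPLE_DATA = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Named Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` fits `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(variables, patterns, triples):
    """Evaluate the patterns one by one, carrying variable bindings along."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    SAMPLE_DATA,
)
print(rows)
```

The third pattern cannot match the dataset's own title triple, because `?publisherURI` is already bound to the publisher; this is what links the two titles to the right resources.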

SPARQL Example: EU ODP (1)

Slide 59

SPARQL Example: EU ODP (2)

Slide 60

SPARQL Example: EU ODP (3)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine

Slide 64

Learning objectives

By the end of this training module, you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

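Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model diagram. Recoverable content follows.]

Classes and attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type
• Political category: Code, Description
• Corporate body: Code, Type, Location
• Further attributes whose class is not recoverable: Acronym, Legal base period, Legal base type, Legal base status

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme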

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data.
  For example, dcterms:created, whose value should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows that it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
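The date example can be made concrete with a small Python sketch. A value in the xsd:date lexical form parses with a single standard call, while the free-text form needs a format string chosen by the consumer. The function names are illustrative assumptions, and the free-text parser assumes English month names and a day-month-year order.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """Parse the xsd:date lexical form YYYY-MM-DD (time zones not handled)."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Parse a date such as '21 February 2013'.

    Assumes English month names and day-month-year order; %B depends on
    the active locale, so other publishers' conventions would break this.
    """
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
print(standard, custom, standard == custom)
```

Both calls yield the same date, but only the first works without knowing anything about the publisher's conventions; that extra knowledge is the "further processing" that a non-standard format imposes on every consumer.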

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• http://joinup.ec.europa.eu
• http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73
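How a system that only knows the generic term can still interpret specialised data is sketched below in Python. Whenever a triple uses a sub-property, the same statement is also derived for its super-property. The sketch uses the relation that this module attributes to the EU Budget vocabulary (introduction as a sub-property of dct:description); the `eubudget:` prefix, the example subject and the function name are hypothetical, and each property is assumed to have at most one super-property.

```python
# Sub-property declarations: specific property -> more generic property.
# 'eubudget:' is a hypothetical prefix used for illustration only.
SUB_PROPERTY_OF = {
    "eubudget:introduction": "dct:description",
}


def infer_super_properties(triples, sub_property_of):
    """Add, for each triple using a sub-property, the super-property triple."""
    inferred = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        parent = sub_property_of.get(p)
        if parent is not None and (s, parent, o) not in inferred:
            inferred.add((s, parent, o))
            # Keep going in case the parent has a super-property of its own.
            frontier.append((s, parent, o))
    return inferred


data = [("ex:nomenclature1", "eubudget:introduction", "Opening remarks")]
result = infer_super_properties(data, SUB_PROPERTY_OF)
for triple in sorted(result):
    print(triple)
```

A consumer that only understands dct:description can now read the introduction text, even though it has never seen the more specific property.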

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Figure: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject

Diagram: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
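The purely syntactic conventions above (capitalisation and camel case) can be checked mechanically. The following is a sketch with a hypothetical helper; whether a term is a verb or a noun, or singular, still needs human review.

```python
import re

def check_term(local_name, kind):
    """Return a list of naming-convention problems for a class or property name."""
    problems = []
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", local_name):
        problems.append("use camel case without spaces or separators")
    if kind == "class" and not local_name[:1].isupper():
        problems.append("classes begin with a capital letter")
    if kind == "property" and not local_name[:1].islower():
        problems.append("properties begin with a lower case letter")
    return problems

print(check_term("Concept", "class"))         # local part of skos:Concept
print(check_term("amount_type", "property"))  # hypothetical badly named property
```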

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class
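The slide's own example is an image and is not reproduced in this transcript. As a rough sketch of what an RDFS class definition might look like, the snippet below builds a Turtle description of an Amount class. The `bud` namespace, the label and the comment text are assumptions for illustration.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "bud": "http://data.europa.eu/bud#",  # assumed namespace
}

def define_class(name, label, comment):
    """Build a Turtle snippet declaring an RDFS class with a label and a comment."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines += [
        "",
        f"bud:{name} a rdfs:Class ;",
        f'    rdfs:label "{label}"@en ;',
        f'    rdfs:comment "{comment}"@en .',
    ]
    return "\n".join(lines)

turtle = define_class("Amount", "Amount", "A figure of a given type, currency and year.")
print(turtle)
```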

Slide 77

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Diagram: Amount (Currency, Figure, Type, Year)

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

Slide 78

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Diagram: Amount (Currency, Figure, Type, Year)

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties, consider defining their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used
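Under RDFS semantics, domain and range are used by reasoners to infer the types of the resources a property connects. The sketch below illustrates this with the hasCeiling property from the slide's diagram. The direction (Political category to Amount) and the `ex:` identifiers are assumptions.

```python
# Assumed direction for illustration: PoliticalCategory --hasCeiling--> Amount
SCHEMA = {"hasCeiling": {"domain": "PoliticalCategory", "range": "Amount"}}

def infer_types(subject, prop, obj):
    """Infer rdf:type statements for the subject (domain) and the object (range)."""
    rule = SCHEMA[prop]
    return {
        (subject, "rdf:type", rule["domain"]),
        (obj, "rdf:type", rule["range"]),
    }

types = infer_types("ex:category1", "hasCeiling", "ex:amount42")
```

Because these statements drive inference, an overly narrow domain or range can lead to unintended type conclusions, which is one reason to define them with care.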

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Diagram: hasCeiling links Political category (Code, Description) and Amount (Currency, Figure, Type, Year)

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

5

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas and http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another…” - openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

- CSV

- Excel (xls and xlsx)

- JSON

- XML and

- RDF/XML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: Getting started

1 Install Open Refine from https://github.com/OpenRefine

2 Install the RDF extension: http://refine.deri.ie

And then:

1 Describe your data in a spreadsheet

2 Create a project and upload it in Open Refine

3 Clean up the data

4 Map your data to appropriate RDF classes & properties

5 Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data – table harmonisation

Slide 90

3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data – prepare RDF

Slide 91

3

• Create URI representations for the involved object values

- via formula

- via reconciliation
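Creating a URI "via formula" typically means deriving it from the cell value with a deterministic rule. The sketch below shows one such rule in plain Python rather than in Open Refine's own expression language; the base URI and the slug convention are assumptions.

```python
import re
import unicodedata

BASE = "http://example.org/id/country/"  # hypothetical base URI

def to_uri(value, base=BASE):
    """Turn a cell value into a URI by normalising it to a lowercase ASCII slug."""
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return base + slug

print(to_uri("Czech Republic"))
```

Reconciliation, by contrast, looks the value up in an existing authority list and reuses the URI found there, which is preferable when such a list exists.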

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF
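A skeleton is in essence a template applied to every row. The sketch below imitates that idea outside Open Refine: each CSV row becomes one qb:Observation in Turtle. The sample data, the base URI and the `ex:` dimension and measure properties are hypothetical; only qb:Observation and qb:dataSet come from the RDF Data Cube Vocabulary.

```python
import csv
import io

# Hypothetical extract of a scoreboard spreadsheet saved as CSV.
CSV_TEXT = "country,year,value\nBE,2013,0.78\nLU,2013,0.82\n"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

def row_to_observation(row):
    """Apply a fixed template (the 'skeleton') to one spreadsheet row."""
    country, year, value = row["country"], row["year"], row["value"]
    return "\n".join([
        f"<{BASE}obs/{country}-{year}> a qb:Observation ;",
        f"    qb:dataSet <{BASE}dataset> ;",
        f"    ex:refArea <{BASE}country/{country}> ;",
        f'    ex:refPeriod "{year}"^^xsd:gYear ;',
        f'    ex:value "{value}"^^xsd:decimal .',
    ])

observations = [row_to_observation(r) for r in csv.DictReader(io.StringIO(CSV_TEXT))]
print("\n\n".join(observations))
```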

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5 ★ Open Data, http://5stardata.info

• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/

• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata/

• Open Refine, https://github.com/OpenRefine

• RDF Extension, http://refine.deri.ie

• Resource Description Framework, W3C, http://www.w3.org/RDF/

• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: @OpenDataSupport

Contact us: contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17)

© 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2 This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 2: Llinked open data training for EU institutions

DATASUPPORTOPEN

Learning objectives

By the end of the course, participants should have a clear understanding of:

• What linked open data is

• What the difference between linked and open data is

• How to publish linked data

• The economic and social aspects of linked data

• How linked data technologies can be applied to improve the availability, understandability and usability of EU data

Slide 2

DATASUPPORTOPEN

Content

This training consists of 3 modules

1 Introduction to linked data

2 Introduction to RDF & SPARQL

3 Workshop on publishing open linked EU data

Slide 3

DATASUPPORTOPEN

Learning Module 1

Introduction to Linked Data

Slide 4

DATASUPPORTOPEN

Introduction to linked data

This module contains:

• An introduction to the linked data principles

• The expected benefits of linked data

• An introduction to linked data technologies

• An outline of the 5-star scheme for publishing linked data

• An overview of linked data initiatives in Europe

Slide 5

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

What is linked data?

Evolution from a document-based Web to a Web of interlinked data

Slide 6

DATASUPPORTOPEN

The Web is evolving from a “Web of linked documents” into a “Web of linked data”

• The Web started as a collection of documents published online – accessible at a Web location identified by a URL

• These documents often contain data about real-world resources, which is mainly human-readable and cannot be understood by machines

• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data and put it together to do all kinds of things with it (as permitted by the licence)

Machine-readable data (or metadata) is data in a format that can be interpreted by a computer

2 types of machine-readable data exist:

• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa

• data formats intended principally for computers, e.g. RDF, XML and JSON

Slide 7

See also: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html and http://linkeddatabook.com/editions/1.0/

DATASUPPORTOPEN

Defining linked data: Providing data as a service

“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”

EC ISA Case Study: How Linked Data is transforming eGovernment

The four design principles of Linked Data (by Tim Berners-Lee):

1 Use Uniform Resource Identifiers (URIs) as names for things

2 Use HTTP URIs so that people can look up those names

3 When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

4 Include links to other URIs so that they can discover more things

Slide 8

See also: http://www.youtube.com/watch?v=4x_xzT5eF5Q, http://www.w3.org/DesignIssues/LinkedData.html and http://www.youtube.com/watch?v=uju4wT9uBIA

DATASUPPORTOPEN

The value proposition of linked (open) government data

• Flexible data integration: facilitates data integration and enables the interconnection of previously disparate government datasets

• Efficiency gains in data integration – the network effect: the addition of each new dataset increases the value of those datasets that are already published

• Ease of navigation: makes browsing through complex data easier via URIs

• Increase in data quality:

- The use of URIs leads to improved data management and quality

- The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected

Slide 9

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

The value proposition of linked (open) government data

• Increase in data usability by providing data as a service:

- Resolvable URIs

- Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…

• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:

- Existing relational/spatial database systems, by applying database-to-RDF conversions, or

- Existing XML/file-based data

Slide 10
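To give a feel for a database-to-RDF conversion, here is a deliberately naive sketch: each row becomes a subject URI built from its key, and each remaining column becomes a predicate. The base URI, table and column names are hypothetical, and real conversions would usually map columns to terms from existing vocabularies instead.

```python
BASE = "http://example.org/"  # hypothetical base URI

def rows_to_triples(table, key, rows):
    """Map each row to a subject URI and each remaining column to a predicate."""
    triples = []
    for row in rows:
        subject = f"{BASE}{table}/{row[key]}"
        for column, value in row.items():
            if column != key and value is not None:
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples

rows = [{"id": 1, "name": "Publications Office", "city": "Luxembourg"}]
triples = rows_to_triples("organisation", "id", rows)
for triple in triples:
    print(triple)
```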

DATASUPPORTOPEN

The value proposition of linked (open) government data

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases)

• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange

• New services: The availability of LOGD gives rise to new integrated services offered by the public and/or private sector

Slide 11

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

The four principles of linked data in practice

1 Use Uniform Resource Identifiers (URIs) as names for things

2 Use HTTP URIs so that people can look up those names

E.g. for an organisation: UNICEF in EuroVoc

- http://eurovoc.europa.eu/1022

Slide 12

DATASUPPORTOPEN

The four principles in practice

3 When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

4 Include links to other URIs so that people/machines can discover more things
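Looking up a URI usually means an HTTP GET in which the client states which representation it would like. The sketch below only builds such a request and does not send it. That the server honours content negotiation for this media type is an assumption.

```python
from urllib.request import Request

def rdf_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) an HTTP GET request that asks for an RDF representation."""
    return Request(uri, headers={"Accept": media_type}, method="GET")

req = rdf_request("http://eurovoc.europa.eu/1022")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

A browser asking for HTML and a program asking for RDF can thus use the same URI and receive different representations of the same resource.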

Slide 13

DATASUPPORTOPEN

Linked data vs open data

Open data

Data can be published and be publicly available under an open licence without linking to other data sources

Linked data

Data can be linked to URIs from other data sources using open standards such as RDF without being publicly available under an open licence

Slide 14

“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

DATASUPPORTOPEN

Linked data foundations

URIs for naming things, RDF for describing data and SPARQL for querying linked data

Slide 15

DATASUPPORTOPEN

Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”

– ISA’s 10 Rules for Persistent URIs

A country, e.g. Belgium

- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office

- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List

- http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

RDF & SPARQL

The Resource Description Framework (RDF) is a syntax for representing data and resources on the Web

Slide 17

RDF breaks every piece of information down into triples:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

SPARQL is a standardised language for querying RDF data

<http://example.org/place/Brussels> is the capital of “Belgium”

OR

<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>

Subject Predicate Object
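The two variants of the Brussels example differ only in the object: a literal is a dead end, while a URI can be looked up to discover more data. A minimal sketch of that distinction follows; the predicate URI and the class names are hypothetical.

```python
from typing import NamedTuple

class URI(NamedTuple):
    value: str

class Literal(NamedTuple):
    value: str

class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: object  # either a URI or a Literal

CAPITAL_OF = URI("http://example.org/ontology/isCapitalOf")  # hypothetical predicate
brussels = URI("http://example.org/place/Brussels")

as_literal = Triple(brussels, CAPITAL_OF, Literal("Belgium"))
as_resource = Triple(brussels, CAPITAL_OF, URI("http://example.org/place/Belgium"))

def can_follow(triple):
    """Only a URI object can be looked up to discover more data."""
    return isinstance(triple.obj, URI)

print(can_follow(as_literal), can_follow(as_resource))
```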

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql

DATASUPPORTOPEN

How to publish linked data

Paving the way towards 5-star linked data

Slide 18

DATASUPPORTOPEN

5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence

★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)

★★★ Use non-proprietary formats (e.g. CSV instead of Excel)

★★★★ Use URIs to denote things, so that people can point at your stuff

★★★★★ Link your data to other data to provide context
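The scheme is cumulative: each star presupposes the previous ones. A small sketch of that logic, with a hypothetical function and parameter names:

```python
def star_rating(open_licence, structured, non_proprietary, uses_uris, linked):
    """Count stars cumulatively: each level requires all the previous ones."""
    stars = 0
    for criterion in (open_licence, structured, non_proprietary, uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars

# An Excel file published under an open licence: structured, but in a proprietary format.
print(star_rating(True, True, False, False, False))
```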

Slide 19

DATASUPPORTOPEN

Make your stuff available on the Web under an open licence

Slide 20

Example: Trends, risks and vulnerabilities in securities markets

DATASUPPORTOPEN

Make it available as structured data

Slide 21

Waterbase - Emissions to water

CountryCode

DATASUPPORTOPEN

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

DATASUPPORTOPEN

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions initiatives – some examples

• European Environment Agency SPARQL endpoint: tool allowing searching for linked data published by the European Environment Agency, http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal, https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc., http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc., http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: tool allowing querying a multi-lingual online collection of millions of digitized items from European museums, libraries, archives and multi-media collections, http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
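A SPARQL endpoint is queried over HTTP. In the SPARQL Protocol, a query can be sent as a GET request with the query text in the `query` parameter. The sketch below builds such a URL without sending it. The endpoint address is the one listed on the slide and may no longer be valid; the query itself is a generic placeholder.

```python
from urllib.parse import urlencode

ENDPOINT = "http://semantic.eea.europa.eu/sparql"  # endpoint URL as listed on the slide

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

def query_url(endpoint, query):
    """Encode a query as a GET request URL using the protocol's `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = query_url(ENDPOINT, QUERY)
print(url)
```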

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

ADMS

SWCORE

VOCABULARY

PUBLICSERVICE

DATASUPPORTOPEN

Member State initiatives ndash some examples

DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT – Agenzia per l’Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL – Building and address register

The Dutch Address and Buildings base register published as linked data

UK – Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line

UK – Companies House

Publishing basic company details as linked data using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member States initiatives – UK National Archives

Slide 30

Three considerations for legislation as data:

• Typographic layout

• Versioning: changes over time

• Semantics

Semantic representation using RDF and Linked Data:

• URIs for things & RDF data model

Requires granular URIs to name things:

• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{sectionnumber}

• Representation: data.[xml | xht | pdf | rdf | feed]
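The identifier pattern above can be read as a template with four slots, followed by an optional representation suffix. A sketch of building such URIs follows; the function names are hypothetical, and the pattern is taken from the slide rather than checked against the live service.

```python
BASE = "http://www.legislation.gov.uk/id"
FORMATS = ("xml", "xht", "pdf", "rdf", "feed")

def identifier(doc_type, year, number, section):
    """Build the identifier URI for one section of a piece of legislation."""
    return f"{BASE}/{doc_type}/{year}/{number}/section/{section}"

def representation(doc_type, year, number, section, fmt):
    """Build the URI of a specific representation (format) of that section."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier(doc_type, year, number, section)}/data.{fmt}"

print(representation("ukpga", 2010, 32, 124, "rdf"))
```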

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

DATASUPPORTOPEN

Member States initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content

• This data already powers large parts of the BBC website, including BBC News and Sport

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

DATASUPPORTOPEN

Slide 33

Open & linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

1 Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2 Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions cont’d

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module, you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: Everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>
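To show that such a description is machine-readable, the sketch below extracts triples from a simplified copy of the example using only Python's standard XML parser. It handles flat, non-nested descriptions only; a real application would use a dedicated RDF parser. The site URI is an illustrative example.com address.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
</rdf:RDF>"""

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

def triples(xml_text):
    """Yield (subject, predicate, object) for simple, non-nested RDF/XML descriptions."""
    for node in ET.fromstring(xml_text):
        subject = node.get(RDF + "about")
        # The element name of a typed node gives its rdf:type.
        yield subject, RDF[1:-1] + "type", node.tag[1:].replace("}", "")
        for prop in node:
            predicate = prop.tag[1:].replace("}", "")
            yield subject, predicate, prop.get(RDF + "resource") or (prop.text or "").strip()

result = list(triples(RDF_XML))
for triple in result:
    print(triple)
```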

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

<http://publications.europa.eu/resource/authority/file-type> has a title “File types Named Authority List”

Subject Predicate Object

Example: name of a dataset
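The simplest way to write such a triple down is one line of N-Triples: subject, predicate and object followed by a full stop. In the sketch below, the "has a title" relationship is assumed to be dct:title, and the helper function is hypothetical.

```python
def n_triple(subject, predicate, obj, literal=False):
    """Serialise one triple as a line of N-Triples."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."

line = n_triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Named Authority List",
    literal=True,
)
print(line)
```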

DATASUPPORTOPEN

RDF Syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Annotations on the slide: definition of prefixes (the xmlns declarations); description of data as triples (subject, predicate, object), which together form the graph

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF Syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Annotations on the slide: definition of prefixes (the @prefix lines); description of data as triples (subject, predicate, object), which together form the graph

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
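Prefixes are what make Turtle compact: a full URI is replaced by a short prefix plus a local name. A minimal sketch of that abbreviation step, with a hypothetical helper function:

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

def abbreviate(uri):
    """Replace a known namespace by its prefix, else fall back to <uri>."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"

print(abbreviate("http://purl.org/dc/terms/title"))
print(abbreviate("http://publications.europa.eu/resource/authority/file-type"))
```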

DATASUPPORTOPEN

RDF Syntax: RDFa

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Annotations on the slide: subject, predicate, object

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

RDFa: embedding RDF data in HTML
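Because RDFa lives in ordinary HTML attributes, even a generic HTML parser can pick the data out. The sketch below covers only a tiny subset of RDFa (a `resource` attribute as subject and `property` attributes with text values) and is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

HTML = """<div resource="http://publications.europa.eu/resource/authority/file-type"
     typeof="http://www.w3.org/ns/dcat#Dataset">
  <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
</div>"""

class RDFaLite(HTMLParser):
    """Collect (subject, property, text) for literal properties; a small subset of RDFa."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.current_property = None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "resource" in attrs:
            self.subject = attrs["resource"]
        self.current_property = attrs.get("property")

    def handle_data(self, data):
        if self.current_property and data.strip():
            self.triples.append((self.subject, self.current_property, data.strip()))

    def handle_endtag(self, tag):
        self.current_property = None

parser = RDFaLite()
parser.feed(HTML)
print(parser.triples)
```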

DATASUPPORTOPEN

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document) or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[UML class diagram "Healthcare Domain". Classes are listed below with their properties (name, type and, where shown, cardinality).]

Class Core Vocabularies::Identifier
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Class Core Vocabularies::Location
- geographicIdentifier: URI
- geographicName: string

Class Core Vocabularies::Address
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Class Core Vocabularies::Geometry
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships between classes (labels shown in the diagram): address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

Slide 52

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C Recommendation in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions...

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s)... (identified by name or description)

• INSERT: Add triples to the RDF graph.

• DELETE: Delete triples from the RDF graph.

• ASK: Are there any X, Y, etc. satisfying the following conditions...?

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Slide 55
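The difference between SELECT, ASK and CONSTRUCT can be sketched in plain Python over a list of triples. This is a toy matcher for a single triple pattern, not a SPARQL engine; the function names, the "?" convention for variables and the ex: shorthand are assumptions made for the illustration.

```python
# Sample data in the spirit of the slides, as (subject, predicate, object) tuples.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
]


def match(pattern, triple):
    """Match one triple pattern; terms starting with '?' are variables."""
    binding = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None  # same variable bound to two different values
        elif term != value:
            return None
    return binding


def select(pattern, data):
    """SELECT: return a table (list of bindings) of everything that matches."""
    return [b for t in data if (b := match(pattern, t)) is not None]


def ask(pattern, data):
    """ASK: is there at least one match?"""
    return any(match(pattern, t) is not None for t in data)


def construct(pattern, template, data):
    """CONSTRUCT: substitute each match into a template to build new triples."""
    return [tuple(b.get(term, term) for term in template) for b in select(pattern, data)]


print(select(("?s", "rdf:type", "dcat:Dataset"), DATA))
print(ask(("?s", "rdf:type", "dct:Agent"), DATA))
print(construct(("?s", "rdf:type", "?c"), ("?s", "ex:isInstanceOf", "?c"), DATA))
```

SELECT returns variable bindings, ASK collapses them to a yes/no answer, and CONSTRUCT turns them back into a graph.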

DATASUPPORTOPEN

Structure of a SPARQL Query

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables (i.e. what to search for):

SELECT ?title

RDF triple patterns (i.e. the conditions that have to be met):

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Slide 56

DATASUPPORTOPEN

SELECT - return the name of a dataset with a particular URI

Sample data (URIs abbreviated with "..." as on the slide):

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

?dataset
"File types Name Authority List"

Slide 57

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Sample data (URIs abbreviated with "..." as on the slide):

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

?dataset                            ?publisher
"File types Name Authority List"    "Publications Office"

Slide 58
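The query above works because the variable ?publisherURI appears in two triple patterns and therefore joins them. The following Python sketch mimics that join over an in-memory list of triples. It is a simplified illustration rather than a SPARQL implementation; the ex: identifiers stand in for the abbreviated URIs of the slide and are hypothetical.

```python
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def extend(binding, pattern, triple):
    """Try to extend a binding so that pattern matches triple; None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(variables, patterns, data):
    """Evaluate the patterns one after another and project the requested variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            new
            for binding in solutions
            for triple in data
            if (new := extend(binding, pattern, triple)) is not None
        ]
    return [tuple(s[v] for v in variables) for s in solutions]


rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

Each pattern narrows or extends the set of candidate bindings, so the third pattern only matches the title of the resource already bound to ?publisherURI.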

DATASUPPORTOPEN

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:

- How to reuse existing vocabularies to model your data

- How to create new classes and properties in RDF

- How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

DATASUPPORTOPEN

Step 1: Start with a robust Domain Model

[Domain model diagram. Classes with their attributes:]

- Amount: Currency, Figure, Type, Year
- Nomenclature: Type, Heading, Introduction, Remark, Conditions
- EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
- Political category: Code, Description
- Corporate body: Code, Type, Location

[Relationship labels shown in the diagram: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]

Slide 68

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

DATASUPPORTOPEN

Well-known vocabularies (step 2: reuse existing terms and vocabularies)

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organisations, typically in a national or regional register

Organization Ontology: for describing the structure of organisations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

DATASUPPORTOPEN

Advantages of reuse (step 2: reuse existing terms and vocabularies)

• Reuse greatly aids interoperability of your data.

Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
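The date example can be made concrete with a short Python sketch. It only illustrates the point about extra processing: the function names are hypothetical, and the second parser assumes English month names under the default locale.

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    """Parse the lexical form of an xsd:date without timezone, e.g. '2013-02-21'."""
    return date.fromisoformat(lexical)


def parse_custom_date(text):
    """Parse the ad hoc '21 February 2013' style; needs a publisher-specific format."""
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_custom_date("21 February 2013")
print(standard, custom, standard == custom)
```

A consumer can handle the standard form with one generic routine, whereas every non-standard format requires its own format string, and that format must be documented and known in advance.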

DATASUPPORTOPEN

You can find reusable RDF vocabularies on (step 2: reuse existing terms and vocabularies):

http://joinup.ec.europa.eu
http://lov.okfn.org

Slide 72

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 74

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has Nomenclature" property as a sub-property of dct:subject.

[Diagram: class Amount (Currency, Figure, Type, Year) connected by the relationship "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]

Slide 75
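Why sub-properties help generic consumers can be shown with a small Python sketch of the rdfs:subPropertyOf rule: every statement made with a sub-property also holds for its super-property. The bud: prefix, the property spellings and the sample resource are hypothetical stand-ins for terms of the EU Budget vocabulary; the sketch also assumes the sub-property hierarchy has no cycles.

```python
# Sub-property declarations; 'bud:' is a hypothetical prefix for the EU Budget vocabulary.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_inferred(triples, sub_property_of):
    """Restate each statement with every super-property of its predicate."""
    result = set(triples)
    for s, p, o in triples:
        while p in sub_property_of:  # walk up the (acyclic) hierarchy
            p = sub_property_of[p]
            result.add((s, p, o))
    return result


def descriptions(triples):
    """A generic consumer that only understands dct:description."""
    return sorted(o for _, p, o in triples if p == "dct:description")


data = [("bud:item1", "bud:introduction", "Introductory text")]

print(descriptions(data))                                  # the specific term is unknown
print(descriptions(with_inferred(data, SUB_PROPERTY_OF)))  # found via the super-property
```

A system that has never heard of the specific property can still display the text, because the sub-property declaration tells it to treat the value as a description.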

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept.

Properties begin with a lower case letter, e.g. rdfs:label.

Object properties should be verbs, e.g. org:hasSite.

Data type properties should be nouns, e.g. dcterms:description.

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
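The purely lexical conventions above (initial capital for classes, initial lower case for properties, camel case without separators) can be checked automatically. The following Python sketch is an assumed helper, not an existing tool; it cannot check whether a class name is singular or whether a property is a verb or a noun.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf


def local_name(term):
    """Return the part after the prefix, e.g. 'hasSite' for 'org:hasSite'."""
    return term.split(":", 1)[-1]


def is_valid_class(term):
    """True if the local name starts with a capital and has no separators."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property(term):
    """True if the local name starts in lower case and has no separators."""
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:amount_type"]:
    print(term, is_valid_class(term), is_valid_property(term))
```

A check like this could run whenever a new term is proposed, leaving the linguistic conventions to human review.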

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78
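The example definitions shown on these two slides are not preserved in the transcript. As a rough idea of what an RDFS definition of a class and a property looks like, the Python sketch below prints a small Turtle document. The ex: namespace, the term names ex:Amount and ex:amountType, and the helper functions are all hypothetical; labels are assumed not to contain double quotes.

```python
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/budget#> .\n"  # hypothetical namespace
)


def declare_class(term, label):
    """Return a Turtle statement declaring term as an rdfs:Class."""
    return f'{term} a rdfs:Class ;\n    rdfs:label "{label}"@en .\n'


def declare_property(term, label):
    """Return a Turtle statement declaring term as an rdf:Property."""
    return f'{term} a rdf:Property ;\n    rdfs:label "{label}"@en .\n'


vocabulary = (
    PREFIXES
    + "\n"
    + declare_class("ex:Amount", "Amount")
    + "\n"
    + declare_property("ex:amountType", "amount type")
)
print(vocabulary)
```

The term names follow the naming conventions of step 4: the class starts with a capital letter and the property uses lower camel case.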

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

[Diagram: relationship hasCeiling between class Political category (Code, Description) and class Amount (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
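In RDFS, domain and range declarations let a consumer infer the class of the resources involved in a statement. The Python sketch below applies that rule to one statement. The bud: names and the direction of hasCeiling (from a political category to an amount) are assumptions made for the illustration; the slide does not state them.

```python
# Hypothetical declarations: the direction of hasCeiling is an assumption.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))  # the subject is in the domain
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))   # the object is in the range
    return inferred


data = [("bud:cat1", "bud:hasCeiling", "bud:amount1")]
for triple in sorted(infer_types(data)):
    print(triple)
```

Because domain and range drive inference rather than validation, declaring them too narrowly can lead to unintended conclusions about the data, which is one reason to define them with care.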

DATASUPPORTOPEN

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

DATASUPPORTOPEN

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

DATASUPPORTOPEN

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

[Diagram labels grouping the steps into phases: Analyse, Model, Publish]

Slide 82

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine?

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another” - openrefine.org

See also: Open Refine website, http://openrefine.org

Slide 84

DATASUPPORTOPEN

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

- CSV

- Excel (xls and xlsx)

- JSON

- XML and

- RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

First:

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes and properties

5. Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Step 3: Clean up the data - table harmonisation

• Star and remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Slide 90

DATASUPPORTOPEN

Step 3: Clean up the data - prepare RDF

• Create a URI representation for the involved object values:

- via formula

- via reconciliation

Slide 91
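Creating URIs "via formula" means deriving each URI from the cell value with a fixed rule. The Python sketch below shows one possible rule outside Open Refine: trim, lower-case, hyphenate and percent-encode the value, then append it to a base URI. The base URI, the path layout and the function name are hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind, label):
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(label.strip().lower().split())
    return f"{BASE}{kind}/{quote(slug, safe='-')}"


print(mint_uri("country", " United Kingdom "))
print(mint_uri("indicator", "Broadband (fixed)"))
```

A formula guarantees that the same cell value always yields the same URI, but it cannot tell whether an existing URI for that thing is already published elsewhere; that is what reconciliation against a reference dataset is for.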

DATASUPPORTOPEN

Step 4: Map your data to appropriate RDF classes and properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 92

DATASUPPORTOPEN

Step 4: Map your data to appropriate RDF classes and properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93
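A skeleton is a template that is applied to every row of the table. The Python sketch below shows the idea outside Open Refine: each row becomes one observation with a fixed set of statements. The spreadsheet columns, the base URI and the ex: dimension properties are hypothetical; qb:Observation and sdmx-measure:obsValue are terms used with the RDF Data Cube Vocabulary, and the prefixes are assumed to be declared elsewhere.

```python
import csv
import io

# Hypothetical spreadsheet content; the column names are assumptions.
SHEET = """country,year,value
BE,2013,78.5
NL,2013,92.1
"""

BASE = "http://example.org/scoreboard/"  # hypothetical base URI


def rows_to_triples(text):
    """Apply one fixed template (the 'skeleton') to every row of the table."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        obs = f"<{BASE}obs/{row['country']}-{row['year']}>"
        triples.append((obs, "a", "qb:Observation"))
        triples.append((obs, "ex:country", f"<{BASE}country/{row['country']}>"))
        triples.append((obs, "ex:year", f'"{row["year"]}"^^xsd:gYear'))
        triples.append((obs, "sdmx-measure:obsValue", f'"{row["value"]}"^^xsd:decimal'))
    return triples


for s, p, o in rows_to_triples(SHEET):
    print(f"{s} {p} {o} .")
```

In Open Refine the same template is drawn once in the graphical skeleton editor and then applied to all rows on export.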

DATASUPPORTOPEN

Step 4: Map your data to appropriate RDF classes and properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 94

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

[Diagram: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]

Slide 96

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5★ Open Data. http://5stardata.info

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF/

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Slide 98

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA: Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA: Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme
http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: @OpenDataSupport

Contact us: contact@opendatasupport.eu

Slide 102

DATASUPPORTOPEN

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Slide 103


Introduction to linked data

This module contains:

• An introduction to the linked data principles

• The expected benefits of linked data

• An introduction to linked data technologies

• An outline of the 5-star scheme for publishing linked data

• An overview of linked data initiatives in Europe

Find more on training.opendatasupport.eu

Slide 5

DATASUPPORTOPEN

What is linked data

Evolution from a document-based Web to a Web of interlinked data

Slide 6

DATASUPPORTOPEN

The Web is evolving from a “Web of linked documents” into a “Web of linked data”

• The Web started as a collection of documents published online, accessible at a Web location identified by a URL.

• These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines.

• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data and put it together to do all kinds of things with it (as permitted by the licence).

Machine-readable data (or metadata) is data in a format that can be interpreted by a computer. Two types of machine-readable data exist:

• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;

• data formats intended principally for computers, e.g. RDF, XML and JSON.

See also:
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
http://linkeddatabook.com/editions/1.0/

Slide 7

DATASUPPORTOPEN

Defining linked data: providing data as a service

“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”
- EC ISA Case Study: How Linked Data is transforming eGovernment

The four design principles of Linked Data (by Tim Berners-Lee):

1. Use Uniform Resource Identifiers (URIs) as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

4. Include links to other URIs so that they can discover more things.

See also:
http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA

Slide 8

DATASUPPORTOPEN

The value proposition of linked (open) government data

• Flexible data integration: facilitates data integration and enables the interconnection of previously disparate government datasets.

• Efficiency gains in data integration (the network effect): the addition of each new dataset increases the value of those datasets that are already published.

• Ease of navigation: makes browsing through complex data easier via URIs.

• Increase in data quality:

- The use of URIs leads to improved data management and quality.

- The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slide 9

DATASUPPORTOPEN

The value proposition of linked (open) government data

• Increase in data usability by providing data as a service:

- Resolvable URIs

- Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…

• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:

- Existing relational/spatial database systems, by applying database-to-RDF conversions, or

- Existing XML/file-based data

Slide 10

DATASUPPORTOPEN

The value proposition of linked (open) government data

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases).

• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.

• New services: The availability of LOGD gives rise to new integrated services offered by the public and/or private sector.

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slide 11

DATASUPPORTOPEN

The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things.

2. Use HTTP URIs so that people can look up those names.

E.g. for an organisation, UNICEF in EuroVoc:

- http://eurovoc.europa.eu/1022

Slide 12

The four principles in practice

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

4. Include links to other URIs so that people/machines can discover more things.

Slide 13
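Looking up an HTTP URI and asking for RDF is commonly done with content negotiation: the client states the serialisation it prefers in the Accept header. The Python sketch below only prepares such a request and does not send it; whether a particular server honours the header, and which media types it offers, is an assumption that has to be checked per service.

```python
from urllib.request import Request


def rdf_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET that asks for an RDF serialisation."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


req = rdf_request("http://eurovoc.europa.eu/1022")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

The same URI can thus serve a human-readable page to a browser and machine-readable RDF to a program, which is how principles 2 and 3 work together.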

DATASUPPORTOPEN

Linked data vs open data

Open data: Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data: Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

“Open data is data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Slide 14

DATASUPPORTOPEN

Linked data foundations

URIs for naming things, RDF for describing data and SPARQL for querying linked data

Slide 15

DATASUPPORTOPEN

Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”
- ISA's 10 Rules for Persistent URIs

A country, e.g. Belgium:

- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office:

- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List:

- http://publications.europa.eu/resource/authority/country

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 16

DATASUPPORTOPEN

RDF & SPARQL

The Resource Description Framework (RDF) is a standard data model for representing data and resources on the Web.

RDF breaks every piece of information down into triples:

• Subject: a resource, which is identified with a URI

• Predicate: a URI-identified, reused specification of the relationship

• Object: a resource or literal to which the subject is related

Example (subject, predicate, object):

<http://example.org/place/Brussels> is the capital of “Belgium”

OR

<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>

SPARQL is a standardised language for querying RDF data.

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql

Slide 17

DATASUPPORTOPEN

How to publish linked data

Paving the way towards 5-star linked data

Slide 18

DATASUPPORTOPEN

5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence

★★ Make it available as structured data (e.g. Excel instead of image scan of a table)

★★★ Use non-proprietary formats (e.g. CSV instead of Excel)

★★★★ Use URIs to denote things, so that people can point at your stuff

★★★★★ Link your data to other data to provide context

Slide 19

DATASUPPORTOPEN

Make your stuff available on the Web under an open licence

Slide 20

Example: Trends, risks and vulnerabilities in securities markets

DATASUPPORTOPEN

Make it available as structured data

Slide 21

Waterbase - Emissions to water

CountryCode

DATASUPPORTOPEN

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

DATASUPPORTOPEN

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo: change is accomplished slowly

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slide 25

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions' initiatives: some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU, CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: tool for querying a multi-lingual online collection of millions of digitised items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27


Initiatives funded by the European Commission

Slide 28

[Figure: logos of initiatives funded by the European Commission; recoverable labels: ADMS, SW, Core Vocabulary, Public Service]

Member State initiatives: some examples

DE - Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT - Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL - Building and address register

The Dutch Address and Buildings base register published as linked data

UK - Ordnance Survey

Three OS OpenData products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary-Line

UK - Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member State initiatives: UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning (changes over time)
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member State initiatives: UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

Open & linked data at the BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content

• This data already powers large parts of the BBC website, including BBC News and Sport

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

Slide 33

Open & linked data at the BBC


Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes."

• Need for adding semantics in order to allow machine reasoning

For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality

Slide 36

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40


Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>


RDF structure

Triples graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
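As a rough illustration of the subject/predicate/object structure, a triple can be modelled as a three-field record. The sketch below is plain Python without any RDF library. The `Triple` type, the `is_uri` helper and the choice of the Dublin Core `title` URI as the predicate for "has a title" are assumptions made for this example only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject and predicate are URIs, object is a URI or a literal."""
    subject: str
    predicate: str
    obj: str


# The dataset-name example; the predicate URI (Dublin Core "title") is an
# assumption standing in for "has a title".
dataset_name = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
)


def is_uri(value: str) -> bool:
    """Crude check that tells a URI apart from a plain literal in this sketch."""
    return value.startswith(("http://", "https://"))


# The subject is a resource (URI); the object here is a literal.
print(is_uri(dataset_name.subject), is_uri(dataset_name.obj))
```

A real RDF toolkit distinguishes URIs, literals and blank nodes with dedicated types; the string prefix test here is only enough to show that the object position may hold either kind of value.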

RDF syntax: RDF/XML

Slide 46

<!-- Definition of prefixes -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <!-- Description of data: triples -->
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

(On the slide, annotations mark the subject, predicate and object of each triple, and the graph they form.)


Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

[Figure: RDF graph with nodes and arcs labelled Subject, Predicate and Object]

RDF syntax: Turtle

Slide 48

# Definition of prefixes
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

# Description of data: triples
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

(On the slide, annotations mark the subject, predicate and object of each triple, and the graph they form.)

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

(On the slide, annotations mark the subject, predicate and object.)

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/


How to represent data in RDF

Classes properties and vocabularies

Slide 50

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[Figure: UML class diagram "Healthcare Domain", annotated to show classes, properties and relationships. Recoverable content:]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.


Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph (a SPARQL 1.1 Update operation)
• DELETE: delete triples from the RDF graph (a SPARQL 1.1 Update operation)
• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL query

Slide 56

# Definition of prefixes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# Type of query and variables, i.e. what to search for
SELECT ?title
# RDF triple patterns, i.e. the conditions that have to be met
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

SELECT: return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"
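To make the matching step concrete, the sketch below evaluates a single triple pattern over the sample data in plain Python. It is not how a SPARQL engine is implemented; the short names `ex:file-type` and `ex:publ` are hypothetical stand-ins for the abbreviated URIs, and `select` is a helper invented for this illustration.

```python
# Sample data as (subject, predicate, object) tuples.
# "ex:file-type" and "ex:publ" are hypothetical short names for the abbreviated URIs.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def select(data, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None plays the role of a variable."""
    return [
        (s, p, o)
        for (s, p, o) in data
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Roughly: SELECT ?dataset WHERE { ex:file-type dct:title ?dataset }
titles = [o for (_, _, o) in select(DATA, subject="ex:file-type", predicate="dct:title")]
print(titles)
```

Fixing the subject and predicate and leaving the object open mirrors the query: only the position written as a variable is returned.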

SELECT: return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
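The query above works because the variable ?publisherURI appears in two patterns, which joins them. A minimal sketch of that join in plain Python follows; the `match` and `query` helpers and the `ex:` short names are assumptions for illustration, not part of any SPARQL library.

```python
# Sample data; "ex:file-type" and "ex:publ" are hypothetical short names.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` equals `triple`; None on mismatch."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must be identical.
            return None
    return result


def query(data, patterns):
    """Evaluate the triple patterns in order, joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in data
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


rows = query(DATA, [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
])
result = [(r["?dataset"], r["?publisher"]) for r in rows]
print(result)
```

The third pattern only matches triples whose subject equals the value already bound to ?publisherURI, which is what restricts the second title to the publisher's name.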

SPARQL example: EU ODP (1)

Slide 59

SPARQL example: EU ODP (2)

Slide 60

SPARQL example: EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Start with a robust Domain Model

Slide 68

Step 1

[Figure: domain model diagram. Recoverable content:]

Classes and attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Reuse existing terms and vocabularies

Step 2

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies

Step 2

Advantages of reuse

• Reuse greatly aids interoperability of your data.

Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71

Reuse existing terms and vocabularies (step 2)
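The interoperability point about dates can be sketched in a few lines of Python. The example only uses the standard library; the two date strings are the ones quoted on the slide, and the format string for the free-text value is an assumption a consumer would have to make.

```python
from datetime import date, datetime

# A typed ISO 8601 value, as in "2013-02-21"^^xsd:date, parses directly.
created = date.fromisoformat("2013-02-21")

# A free-text value needs a format that the consumer has to know or guess.
# %B matches the month name in the current locale; an English locale is assumed.
free_text = datetime.strptime("21 February 2013", "%d %B %Y").date()

# Both routes end at the same date, but only the first needed no extra knowledge.
print(created == free_text)
```

Every consumer of the free-text form has to repeat that guess, which is the "further processing" the slide refers to.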

You can find reusable RDF vocabularies on:

Slide 72

http://joinup.ec.europa.eu and http://lov.okfn.org

Reuse existing terms and vocabularies (step 2)

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

Step 3


Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
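The benefit of a sub-property declaration can be sketched as follows: a consumer that only knows dct:description can still pick up values stated with the more specific property. This is a plain Python illustration; the prefixed name `bud:introduction`, the `ex:` resources and the literal values are made up for the example, and `values_of` is a hypothetical helper.

```python
# Schema knowledge: specific property -> more generic property.
# "bud:introduction" is a hypothetical prefixed name for the EU Budget property.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}

# Made-up instance data for illustration only.
DATA = [
    ("ex:nomenclature-1", "bud:introduction", "Introductory text of the budget line"),
    ("ex:dataset-1", "dct:description", "A dataset description"),
]


def values_of(data, prop):
    """Return (subject, object) pairs stated with `prop` or any of its sub-properties."""
    def is_sub(p):
        # Walk up the sub-property chain until `prop` is found or the chain ends.
        while p is not None:
            if p == prop:
                return True
            p = SUB_PROPERTY_OF.get(p)
        return False

    return [(s, o) for (s, p, o) in data if is_sub(p)]


print(values_of(DATA, "dct:description"))
```

Asking for dct:description returns both statements, while asking for the specific property returns only the first; this is the kind of fallback the sub-property relationship makes possible.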


Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject

[Figure: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76

Step 4
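The capitalisation and camel-case conventions lend themselves to a simple automated check. The sketch below is an assumption-laden illustration in plain Python: the regular expressions and helper names are invented here, and the check covers only letter case, not whether a term is singular, a verb or a noun.

```python
import re

# Local names only: letters and digits, no underscores or hyphens (camel case).
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")      # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")   # e.g. label, hasSite


def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def looks_like_class(term: str) -> bool:
    """True if the local name starts with a capital letter and is camel case."""
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term: str) -> bool:
    """True if the local name starts with a lower case letter and is camel case."""
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    print(term, looks_like_class(term), looks_like_property(term))
```

A check like this could run over a draft vocabulary before publication to flag terms such as a hypothetical `ex:has_site`, which uses an underscore instead of camel case.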

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

Step 4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: Amount class with attributes Currency, Figure, Type, Year]

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

Step 4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: Amount class with attributes Currency, Figure, Type, Year]

Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

Slide 79

Step 4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the hasCeiling property between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
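In RDFS, domain and range statements allow a reasoner to derive the type of the resources a property connects. The plain Python sketch below illustrates that inference for hasCeiling. The `bud:` and `ex:` names are hypothetical, and the direction of the property (from Political category to Amount) is an assumption read off the diagram.

```python
# Hypothetical schema statements for the hasCeiling property.
# Assumed direction: a Political category has a ceiling Amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(data):
    """Derive rdf:type triples from domain and range statements (RDFS-style)."""
    inferred = set()
    for s, p, o in data:
        if p in DOMAIN:
            # The subject of the property is an instance of the domain class.
            inferred.add((s, "rdf:type", DOMAIN[p]))
        if p in RANGE:
            # The object of the property is an instance of the range class.
            inferred.add((o, "rdf:type", RANGE[p]))
    return inferred


# Made-up instance data for illustration only.
DATA = [("ex:heading-1", "bud:hasCeiling", "ex:amount-2015")]
print(sorted(infer_types(DATA)))
```

Because the types follow from the property alone, publishers do not have to state them explicitly; the flip side is that an over-narrow domain or range leads to wrong inferences when the property is reused elsewhere.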

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/

Slide 80

Step 5

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

Phases shown on the slide: Analyse, Model, Publish


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar, Open Refine: http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87


Describe your data in a spreadsheet

Download the tabular data

Slide 88

1


Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

Clean up the data: table harmonisation

Slide 90

Step 3

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Clean up the data: prepare RDF

Slide 91

Step 3

• Create URI representations for the involved object values
  - via formula
  - via reconciliation
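Creating a URI "via formula" means deriving it deterministically from a cell value. In Open Refine this would be written as an expression inside the tool; the Python below only illustrates the idea. The base URI, the `mint_uri` helper and the slug rules (trim, lower-case, hyphenate, percent-encode) are all assumptions made for this sketch.

```python
import re
from urllib.parse import quote

BASE_URI = "http://example.org/id/country/"   # hypothetical base URI


def mint_uri(cell_value: str, base: str = BASE_URI) -> str:
    """Turn a spreadsheet cell value into a URI: trim, lower-case, hyphenate, escape."""
    slug = re.sub(r"\s+", "-", cell_value.strip().lower())
    # Percent-encode anything that is not safe in a URI path segment.
    return base + quote(slug)


print(mint_uri("  Czech Republic "))
```

The same input always yields the same URI, so rows that mention the same value end up pointing at the same resource. Reconciliation, the other route named above, instead looks the value up in an existing authority and reuses its URI.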

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Step 4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle


Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned by flexibility versus volume: OpenRefine, UnifiedViews, Cellar]


Thank you for your attention

and now YOUR questions

Slide 97

References

• 5-star Open Data. http://5stardata.info
• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data: An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework, W3C. http://www.w3.org/RDF/
• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID, Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID, Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101


Be part of our team

Slide 102

Find us, join us, follow us, contact us:

• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• Website: http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 4: Llinked open data training for EU institutions


Learning Module 1

Introduction to Linked Data

Slide 4

Introduction to linked data

This module contains:

• An introduction to the linked data principles
• The expected benefits of linked data
• An introduction to linked data technologies
• An outline of the 5-star scheme for publishing linked data
• An overview of linked data initiatives in Europe

Slide 5

Find more on training.opendatasupport.eu

What is linked data?

Evolution from a document-based Web to a Web of interlinked data

Slide 6

The Web is evolving from a “Web of linked documents” into a “Web of linked data”

• The Web started as a collection of documents published online, each accessible at a Web location identified by a URL.
• These documents often contain data about real-world resources, which is mainly human-readable and cannot be understood by machines.
• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data and combine it for any purpose that the licence permits.

Machine-readable data (or metadata) is data in a format that can be interpreted by a computer. Two types of machine-readable data exist:

• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;
• data formats intended principally for computers, e.g. RDF, XML and JSON.

Slide 7

See also:
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
http://linkeddatabook.com/editions/1.0/

Defining linked data: providing data as a service

“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”
– EC ISA Case Study: How Linked Data is transforming eGovernment

The four design principles of Linked Data (by Tim Berners-Lee):

1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.

Slide 8

See also:
http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA

The value proposition of linked (open) government data

• Flexible data integration: linked data facilitates data integration and enables the interconnection of previously disparate government datasets.
• Efficiency gains in data integration – the network effect: the addition of each new dataset increases the value of the datasets that are already published.
• Ease of navigation: URIs make browsing through complex data easier.
• Increase in data quality:
  - The use of URIs leads to improved data management and quality.
  - The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.

Slide 9

See also: ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

The value proposition of linked (open) government data

• Increase in data usability by providing data as a service:
  - Resolvable URIs.
  - Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML and RDF/XML) on top of either:
  - existing relational/spatial database systems, by applying database-to-RDF conversions; or
  - existing XML/file-based data.

Slide 10

The value proposition of linked (open) government data

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected in the data at lower cost and effort (compared to traditional relational databases).
• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration and data use, reuse and exchange.
• New services: the availability of LOGD gives rise to new, integrated services offered by the public and/or private sector.

Slide 11

See also: ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.

E.g. for an organisation: UNICEF in EuroVoc
- http://eurovoc.europa.eu/1022

Slide 12

The four principles in practice

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that people/machines can discover more things.
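Principles 2 and 3 are usually realised through HTTP content negotiation: the client states in the Accept header that it wants RDF rather than an HTML page. The sketch below only prepares such a request for the EuroVoc URI of UNICEF and does not send it. The media types and their order are illustrative choices, and whether a given server honours them is an assumption that has to be checked per data provider.

```python
import urllib.request


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF instead of HTML."""
    # Content negotiation: the Accept header lists the formats the client
    # prefers; here Turtle first, RDF/XML as a fallback (illustrative choice).
    return urllib.request.Request(
        uri,
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
        method="GET",
    )


request = build_rdf_request("http://eurovoc.europa.eu/1022")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup; the response body could then be handed to an RDF parser.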

Slide 13

Linked data vs. open data

Open data
Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

Slide 14

“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike.”
– OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Linked data foundations

URIs for naming things, RDF for describing data and SPARQL for querying linked data

Slide 15

Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”
– ISA's 10 Rules for Persistent URIs

A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country
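The three examples above follow one visible pattern: a fixed base, the name of an authority table, and optionally the code of one entry. A minimal sketch of that pattern is given below. The function names are hypothetical, and the pattern itself is inferred from these three URIs only, so it should not be read as the Publications Office's official URI policy.

```python
from typing import Tuple
from urllib.parse import urlsplit

AUTHORITY_BASE = "http://publications.europa.eu/resource/authority"


def authority_uri(table: str, code: str = "") -> str:
    """Build the URI of an authority table, or of one entry in that table."""
    if code:
        return f"{AUTHORITY_BASE}/{table}/{code}"
    return f"{AUTHORITY_BASE}/{table}"


def split_authority_uri(uri: str) -> Tuple[str, str]:
    """Return (table, code); code is '' when the URI names the whole table."""
    parts = urlsplit(uri).path.strip("/").split("/")
    # parts is ['resource', 'authority', table] or ['resource', 'authority', table, code]
    table = parts[2]
    code = parts[3] if len(parts) > 3 else ""
    return table, code


belgium = authority_uri("country", "BEL")
countries = authority_uri("country")
print(belgium)
print(split_authority_uri(belgium))
```

A predictable pattern like this makes URIs easier to mint and manage, but consumers should still treat them as opaque identifiers and look them up rather than guess them.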

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

RDF & SPARQL

The Resource Description Framework (RDF) is a data model for representing data and resources on the Web.

Slide 17

RDF breaks every piece of information down into triples:

• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related

Example (subject, predicate, object):

<http://example.org/place/Brussels> is the capital of "Belgium"
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>

SPARQL is a standardised language for querying RDF data.

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql


How to publish linked data

Paving the way towards 5-star linked data

Slide 18

The 5-star scheme for Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
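The five levels build on each other: a dataset only reaches a level if it also meets all lower ones. The sketch below encodes that cumulative reading as a small checklist. The Dataset fields and the star_rating function are hypothetical names introduced for illustration, and answering each yes/no question for a real dataset remains a human judgement.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_licence: bool      # 1 star
    structured: bool               # 2 stars
    non_proprietary_format: bool   # 3 stars
    uses_uris: bool                # 4 stars
    links_to_other_data: bool      # 5 stars


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    stars = 0
    for level_met in (
        d.on_web_open_licence,
        d.structured,
        d.non_proprietary_format,
        d.uses_uris,
        d.links_to_other_data,
    ):
        if not level_met:
            break
        stars += 1
    return stars


scanned_report = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
print(star_rating(scanned_report), star_rating(csv_file))
```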

Slide 19

Make your stuff available on the Web under an open licence

Slide 20

Example: Trends, risks and vulnerabilities in securities markets

Make it available as structured data

Slide 21

Example: Waterbase - Emissions to water

Use non-proprietary formats

• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF

Example: DG Enlargement - Regional programmes

Slide 22

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

Link your data to other data to provide context

Slide 24

Example: Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body

LOGD roadblocks

• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions' initiatives – some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU – CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

Initiatives funded by the European Commission

Slide 28

[Figure: logos of the funded initiatives; legible labels: ADMS, SW, CORE VOCABULARY, PUBLIC SERVICE]

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT – Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL – Building and address register
The Dutch Address and Buildings base register published as linked data.

UK – Ordnance Survey
Three OS OpenData products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line.

UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.

Slide 29

See also: ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member States' initiatives – UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning (changes over time)
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI)
http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member States' initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

Slide 33

Open & linked data at BBC

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga
https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

1. Link databases

“I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee.”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

“I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes.”

• Need for adding semantics in order to allow machine reasoning

For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality

Slide 36

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40


Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:org="http://www.w3.org/ns/org#"
        xmlns:locn="http://www.w3.org/ns/locn#">

      <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
        <rdfs:label>Publications Office</rdfs:label>
        <org:hasSite rdf:resource="http://example.com/site/1234"/>
      </org:Organization>

      <locn:Address rdf:about="http://example.com/site/1234">
        <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
      </locn:Address>

    </rdf:RDF>

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related

Example: name of a dataset (subject, predicate, object)

<http://publications.europa.eu/resource/authority/file-type> has a title "File types Named Authority List"
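To make the subject/predicate/object structure concrete, the sketch below holds the dataset-title triple in a small Python structure and writes it out as one line of N-Triples, the simplest line-based RDF syntax. The Triple class and the to_ntriples function are hypothetical helpers written for this illustration. The predicate is assumed to be dct:title, as in the syntax examples that follow, and the escaping covers only backslashes, quotes and newlines.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str          # always a URI
    predicate: str        # always a URI
    obj: str              # a URI or a literal value
    obj_is_literal: bool  # True when obj is a plain string value


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a line of N-Triples."""
    if t.obj_is_literal:
        # Minimal escaping: backslash first, then quotes and newlines.
        escaped = (
            t.obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Named Authority List",
    True,
)
print(to_ntriples(title))
```

In practice an RDF library would handle parsing and serialisation; the point here is only that every statement reduces to the same three-part shape.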

RDF syntax: RDF/XML

Slide 46

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcat="http://www.w3.org/ns/dcat#"
        xmlns:dct="http://purl.org/dc/terms/">

      <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
        <dct:title>File types Named Authority List</dct:title>
        <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
      </dcat:Dataset>

      <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
        <dct:title>Publications Office</dct:title>
      </dct:Agent>

    </rdf:RDF>

Annotations: the xmlns attributes are the definition of prefixes; the elements inside rdf:RDF are the description of data as triples (subject, predicate, object), which together form the graph.

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with its nodes and arrows labelled Subject, Predicate and Object]

RDF syntax: Turtle

Slide 48

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct: <http://purl.org/dc/terms/> .

    <http://publications.europa.eu/resource/authority/file-type>
        a dcat:Dataset ;
        dct:title "File types Named Authority List" ;
        dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://open-data.europa.eu/en/data/publisher/publ>
        a dct:Agent ;
        dct:title "Publications Office" .

Annotations: the @prefix lines are the definition of prefixes; the remaining statements are the description of data as triples (subject, predicate, object), which together form the graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

    <html>
      <head></head>
      <body>
        <div resource="http://publications.europa.eu/resource/authority/file-type"
             typeof="http://www.w3.org/ns/dcat#Dataset">
          <p>
            <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
            Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
          </p>
        </div>
      </body>
    </html>

Annotations: the resource attribute gives the subject, each property attribute gives a predicate, and the text inside the span gives the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: the Unified Modelling Language. Class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[Figure: UML class diagram of the Core Vocabularies, with callouts marking classes, properties and relationships]

Classes and their properties, as shown in the diagram:

Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier


Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: add triples to the RDF graph.
• DELETE: delete triples from the RDF graph.
• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL query

Slide 56

Definition of prefixes:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

    SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Named Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query:

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset
    WHERE {
      <http://.../authority/file-type> dct:title ?dataset .
    }

Result:

    dataset
    "File types Named Authority List"

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Named Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query:

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }

Result:

    dataset                              publisher
    "File types Named Authority List"    "Publications Office"
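What a SELECT query does with its triple patterns can be imitated in a few lines: each pattern is compared with every triple, variables (terms starting with ?) are bound to the values they meet, and only bindings that satisfy all patterns survive. The sketch below is a toy evaluator written for this illustration, not how a SPARQL engine is implemented. It assumes the shortened URIs on the slide stand for the full URIs used in the syntax examples, writes predicates as prefixed names, and ignores the difference between URIs and literals.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

DATA: List[Triple] = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Named Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` equals `triple`."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable is already bound to a different value
        elif term != value:
            return None      # constant term does not match
    return result


def select(variables: List[str], patterns: List[Triple], data: List[Triple]) -> List[Tuple[str, ...]]:
    """Evaluate all triple patterns together and project the requested variables."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset to its publisher: the third pattern can only match triples whose subject equals the value bound by the first pattern.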

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (3)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Start with a robust Domain Model (step 1)

Slide 68

[Figure: domain model diagram]

Classes and their attributes, as shown in the diagram:

• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location

Relationships between the classes: hasCeiling, hasPolitical category, hasCorporate body, has Nomenclature, has EU Programme

Reuse existing terms and vocabularies (step 2)

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies (step 2: reuse existing terms and vocabularies)

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Advantages of reuse (step 2: reuse existing terms and vocabularies)

• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
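The date example can be tried out directly. In the sketch below, the xsd:date value is read with a single standard call, while the free-text value needs a format string that the consumer has to know or guess. The variable names are illustrative, and the free-text branch assumes English month names.

```python
from datetime import date, datetime

# Lexical form of the typed literal "2013-02-21"^^xsd:date.
iso_value = "2013-02-21"
created = date.fromisoformat(iso_value)  # one standard call, no guessing

# Free-text value as in the counter-example ex:date "21 February 2013".
free_text = "21 February 2013"
# The consumer must supply the layout; %B only matches month names in the
# active locale (English under Python's default "C" locale).
created_from_text = datetime.strptime(free_text, "%d %B %Y").date()

print(created, created_from_text)
```

Both calls yield the same date here, but only the first one would keep working unchanged if another publisher wrote "21 février 2013" or "Feb 21, 2013".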

You can find reusable RDF vocabularies on (step 2: reuse existing terms and vocabularies):

Slide 72

http://joinup.ec.europa.eu/
http://lov.okfn.org/

Creation of sub-classes and sub-properties (step 3)

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
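The second point corresponds to a simple RDFS inference rule: if a statement uses property p, and p is declared a sub-property of q, then the same statement also holds with q. The sketch below applies that rule to a handful of triples. The bud: prefixed names and the ex:item1 resource are hypothetical spellings chosen for this illustration, and a real system would use an RDFS reasoner rather than this loop.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Sub-property declarations (hypothetical spellings of budget vocabulary terms).
SUB_PROPERTY_OF: Dict[str, str] = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples: Set[Triple], sub_property_of: Dict[str, str]) -> Set[Triple]:
    """Apply the RDFS rule: if (s p o) and p subPropertyOf q, then (s q o)."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat so that chains of sub-properties are followed
        changed = False
        for s, p, o in list(inferred):
            q = sub_property_of.get(p)
            if q is not None and (s, q, o) not in inferred:
                inferred.add((s, q, o))
                changed = True
    return inferred


data = {("ex:item1", "bud:introduction", "Explanatory text for this budget line")}
expanded = with_super_properties(data, SUB_PROPERTY_OF)
print(sorted(expanded))
```

A generic application that only knows dct:description can now display the text, even though it has never heard of the more specific property.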

Slide 73

Creation of sub-classes and sub-properties (step 3)

Slide 74

The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description.

[Figure: the Nomenclature class with its attributes Type, Heading, Introduction, Remark and Conditions]

Creation of sub-classes and sub-properties (step 3)

Slide 75

The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions) by the “has Nomenclature” relationship]

Where new terms are required, create them following commonly agreed best practices (step 4)

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower-case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Datatype properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
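The spelling-related conventions (capitalisation and camel case) are easy to check automatically before a vocabulary is published. The sketch below does so with two regular expressions. The function names are hypothetical, and the checks deliberately leave out what a pattern cannot decide: whether a class name is singular, or whether a property name is a verb or a noun.

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties; no spaces,
# underscores or hyphens inside the local name.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ("skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:has_site"):
    print(term, is_valid_class_name(term), is_valid_property_name(term))
```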

Slide 76

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the Amount class with its attributes Currency, Figure, Type and Year]

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the Amount class with its attributes Currency, Figure, Type and Year]

Where new terms are required, create them following commonly agreed best practices (step 4)

When defining new properties, consider defining their domain and range:

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
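In RDFS, domain and range declarations are used for inference rather than validation: from the mere use of a property, a reasoner concludes the class of the subject (domain) and of the value (range). The sketch below shows that rule for org:hasSite, whose domain org:Organization and range org:Site come from the W3C Organization Ontology. The ex: resources and the infer_types function are hypothetical and serve only as illustration.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Declarations taken from the W3C Organization Ontology.
DOMAIN: Dict[str, str] = {"org:hasSite": "org:Organization"}
RANGE: Dict[str, str] = {"org:hasSite": "org:Site"}


def infer_types(triples: Set[Triple], domain: Dict[str, str], range_: Dict[str, str]) -> Set[Triple]:
    """Derive rdf:type statements from the domain and range of the properties used."""
    types: Set[Triple] = set()
    for s, p, o in triples:
        if p in domain:
            types.add((s, "rdf:type", domain[p]))  # the subject is in the domain
        if p in range_:
            types.add((o, "rdf:type", range_[p]))  # the value is in the range
    return types


data = {("ex:publicationsOffice", "org:hasSite", "ex:site1234")}
inferred = infer_types(data, DOMAIN, RANGE)
print(sorted(inferred))
```

This is why a domain or range should be chosen with care: declaring it too narrowly makes every user of the property assert class memberships they may not intend.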

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]


Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

5

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas and http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris


Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6


Conclusions

Slide 82

Start with a robust Domain Model, developed following a structured process and methodology

Research existing terms and their usage, and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83


What is OpenRefine?

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another…” - openrefine.org

See also: OpenRefine website, http://openrefine.org


What is the OpenRefine RDF extension?

The OpenRefine RDF extension allows you to easily import data in different formats, such as:

CSV

Excel (.xls and .xlsx)

JSON

XML and

RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – OpenRefine, http://www.youtube.com/watch?v=4Ve93C238gI


Using OpenRefine to model and publish open data: getting started

1. Install OpenRefine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

Step 1: Describe your data in a spreadsheet

Step 2: Create a project and upload it in OpenRefine

Step 3: Clean up the data

Step 4: Map your data to appropriate RDF classes & properties

Step 5: Export the data in RDF

Slide 86



Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87


Describe your data in a spreadsheet

Download the tabular data

Slide 88

1


Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project


Clean up the data – table harmonisation

Slide 90

3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published


Clean up the data – prepare RDF

Slide 91

3

• Create a URI representation for the object values involved:

o via a formula

o via reconciliation
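Creating a URI “via a formula” means deriving it deterministically from a cell value. In OpenRefine this would be written in the tool's own expression language; the Python sketch below only illustrates the idea. The base URI, the function names and the slug rules are assumptions, not part of the training material.

```python
import re
import unicodedata
from urllib.parse import quote

BASE_URI = "http://example.org/id/"  # hypothetical namespace


def slugify(value):
    """Lower-case the value, strip accents and replace other character runs with '-'."""
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, value):
    """Build a URI for a cell value, grouped under a path segment such as 'country'."""
    return f"{BASE_URI}{quote(kind)}/{slugify(value)}"


if __name__ == "__main__":
    print(mint_uri("country", "Belgium"))
    print(mint_uri("indicator", "Households with broadband access"))
```

Reconciliation takes the other route: instead of minting a new URI, the cell value is matched against an existing reference dataset and the URI published there is reused.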


Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary


Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF


Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one

You can set the base URI for the data

Slide 94

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4


Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle
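What the skeleton and export steps amount to can be sketched without OpenRefine: each spreadsheet row becomes one observation described by a few triples. The example below writes N-Triples, which any Turtle parser also accepts. The RDF Data Cube and SDMX namespaces are the published ones; the base URI, the column names and the sample values are invented for illustration.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Dummy rows standing in for the cleaned spreadsheet; the values are invented.
SAMPLE_CSV = """country,year,value
BE,2013,0.5
LU,2013,0.7
"""


def rows_to_ntriples(csv_text):
    """Turn each CSV row into one qb:Observation described by four N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        country, year, value = row["country"], row["year"], row["value"]
        obs = f"<{BASE}obs/{country}-{year}>"
        lines.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        lines.append(f"{obs} <{SDMX_DIM}refArea> <{BASE}country/{country}> .")
        lines.append(f'{obs} <{SDMX_DIM}refPeriod> "{year}" .')
        lines.append(f'{obs} <{SDMX_MEASURE}obsValue> "{value}"^^<{XSD_DECIMAL}> .')
    return lines


if __name__ == "__main__":
    print("\n".join(rows_to_ntriples(SAMPLE_CSV)))
```

A complete Data Cube export would also describe the dataset and its structure definition; those parts are left out to keep the sketch short.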


Production pipelines

From desk to automated pipeline

Slide 96

[Diagram: tools positioned by flexibility and volume, moving from desk work to an automated pipeline: OpenRefine, UnifiedViews, Cellar]


Thank you for your attention

and now YOUR questions

Slide 97




Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas


Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100


Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101


Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support: http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

OpenDataSupport

contact@opendatasupport.eu


This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.



Introduction to linked data

This module contains

• An introduction to the linked data principles

• The expected benefits of linked data

• An introduction to linked data technologies

• An outline of the 5-star scheme for publishing linked data

• An overview of linked data initiatives in Europe

Slide 5

Find more on training.opendatasupport.eu


What is linked data

Evolution from a document-based Web to a Web of interlinked data

Slide 6


The Web is evolving from a “Web of linked documents” into a “Web of linked data”

• The Web started as a collection of documents published online – accessible at a Web location identified by a URL.

• These documents often contain data about real-world resources, which is mainly human-readable and cannot be understood by machines.

• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data and put it together to do all kinds of things with it (as permitted by the licence).

Machine-readable data (or metadata) is data in a format that can be interpreted by a computer.

Two types of machine-readable data exist:

• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;

• data formats intended principally for computers, e.g. RDF, XML and JSON.

Slide 7

See also: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

http://linkeddatabook.com/editions/1.0/


Defining linked data: providing data as a service

“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”

EC ISA Case Study: How Linked Data is transforming eGovernment

The four design principles of Linked Data (by Tim Berners-Lee):

1. Use Uniform Resource Identifiers (URIs) as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

4. Include links to other URIs so that they can discover more things.

Slide 8

See also: http://www.youtube.com/watch?v=4x_xzT5eF5Q

http://www.w3.org/DesignIssues/LinkedData.html

http://www.youtube.com/watch?v=uju4wT9uBIA


The value proposition of linked (open) government data

• Flexible data integration: facilitates data integration and enables the interconnection of previously disparate government datasets.

• Efficiency gains in data integration – the network effect: the addition of each new dataset increases the value of the datasets that are already published.

• Ease of navigation: makes browsing through complex data easier via URIs.

• Increase in data quality:

o The use of URIs leads to improved data management and quality.

o The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.

Slide 9

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd


The value proposition of linked (open) government data

• Increase in data usability by providing data as a service:

o Resolvable URIs

o Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…

• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:

o Existing relational/spatial database systems, by applying database-to-RDF conversions, or

o Existing XML/file-based data

Slide 10


The value proposition of linked (open) government data

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected in the data with lower costs and effort (compared to traditional relational databases).

• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration and data use, reuse and exchange.

• New services: the availability of LOGD gives rise to new integrated services offered by the public and/or private sector.

Slide 11

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd


The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things

2. Use HTTP URIs so that people can look up those names

E.g. for an organisation: UNICEF in EuroVoc

- http://eurovoc.europa.eu/1022

Slide 12


The four principles in practice

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

4. Include links to other URIs so that people/machines can discover more things

Slide 13


Linked data vs open data

Open data

Data can be published and be publicly available under an open licence without linking to other data sources

Linked data

Data can be linked to URIs from other data sources using open standards such as RDF, without being publicly available under an open licence

Slide 14

“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data

http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf


Linked data foundations

URIs for naming things, RDF for describing data, and SPARQL for querying linked data

Slide 15


Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”

– ISA’s 10 Rules for Persistent URIs

A country, e.g. Belgium

- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office

- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List

- http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris


RDF & SPARQL

The Resource Description Framework (RDF) is a syntax for representing data and resources on the Web

Slide 17

RDF breaks every piece of information down into triples:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

SPARQL is a standardised language for querying RDF data

Example (subject, predicate, object):

<http://example.org/place/Brussels> is the capital of “Belgium”

OR

<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
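The two variants of the Brussels example differ only in the object: a literal in the first, another resource in the second. The sketch below models that difference with small Python types. The class names and the predicate URI are assumptions made for illustration; the slide gives the predicate only in words.

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identified by a URI."""


class Literal(str):
    """A plain literal value such as a name."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    object: Union[URI, Literal]


IS_CAPITAL_OF = URI("http://example.org/property/isCapitalOf")  # hypothetical predicate URI
brussels = URI("http://example.org/place/Brussels")

with_literal = Triple(brussels, IS_CAPITAL_OF, Literal("Belgium"))
with_resource = Triple(brussels, IS_CAPITAL_OF, URI("http://example.org/place/Belgium"))


def object_kind(triple):
    """Tell whether the object is another resource or a literal."""
    return "resource" if isinstance(triple.object, URI) else "literal"


if __name__ == "__main__":
    print(object_kind(with_literal), object_kind(with_resource))
```

Only the second form lets a client follow the object to further statements about Belgium, which is what the fourth linked data principle asks for.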


How to publish linked data

Paving the way towards 5-star linked data

Slide 18


5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence

★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)

★★★ Use non-proprietary formats (e.g. CSV instead of Excel)

★★★★ Use URIs to denote things, so that people can point at your stuff

★★★★★ Link your data to other data to provide context

Slide 19


Make your stuff available on the Web under an open licence

Slide 20

[Example document: Trends, risks and vulnerabilities in securities markets]


Make it available as structured data

Slide 21

[Example dataset: Waterbase - Emissions to water (table excerpt showing the CountryCode column)]


Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

[Example: DG Enlargement - Regional programmes]

Slide 22


Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg


Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body


LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd


Linked data initiatives in Europe

Examples of supranational, national, regional and private initiatives in the area of linked data

Slide 26


EU institutions initiatives – some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency, http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal, https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc., http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc., http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections, http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27


Initiatives funded by the European Commission

Slide 28

[Logos of initiatives funded by the European Commission; legible text: ADMS, SW, CORE VOCABULARY, PUBLIC SERVICE]


Member State initiatives – some examples

DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT – Agenzia per l’Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL – Building and address register

The Dutch Address and Buildings base register published as linked data

UK – Ordnance Survey

Three OS OpenData products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line

UK – Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd


Member States Initiatives – UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068


Member States Initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
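The identifier pattern and the list of representations shown for legislation.gov.uk can be combined into one small URI builder. This is a sketch based only on the pattern shown on the slides; the function name and its parameters are hypothetical, and leaving the section out is an assumption about how the pattern generalises.

```python
BASE = "http://www.legislation.gov.uk/id"
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}  # representations listed on the slide


def legislation_uri(doc_type, year, number, section=None, fmt=None):
    """Build an identifier URI, optionally for one section and one representation."""
    uri = f"{BASE}/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unknown representation: {fmt}")
        uri += f"/data.{fmt}"
    return uri


if __name__ == "__main__":
    # Reproduces the URI quoted for section 124 of ukpga 2010/32.
    print(legislation_uri("ukpga", 2010, 32, section=124, fmt="rdf"))
```

Keeping the identifier separate from the representation suffix means the same section can be named once and served in several formats.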


Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

Slide 33

Open & linked data at BBC


Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf


1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Linked Data in the oil and gas industry

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf


Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.

• URIs, RDF and SPARQL form the foundational layer for linked data.

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36


Conclusions (cont’d)

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37


Learning Module 2

Introduction to RDF & SPARQL

Slide 38


Introduction to RDF and SPARQL

This module contains

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu


Learning objectives

By the end of this training module you should have a clear understanding of

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40


Resource Description Framework

An introduction to RDF

Slide 41


RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42


Example RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>


RDF structure

Triples graphs and syntaxes

Slide 44


What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset

<http://publications.europa.eu/resource/authority/file-type> has a title “File types Name Authority List”

(Subject, Predicate, Object)


RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

[Annotations on the slide: the xmlns attributes are the definition of prefixes; the two descriptions are the data, expressed as triples (subject, predicate, object) that together form a graph]


Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

Subject

Predicate

Object


RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

[Annotations on the slide: the @prefix lines are the definition of prefixes; the two descriptions are the data, expressed as triples (subject, predicate, object) that together form a graph]

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11


RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

[Annotations on the slide: the resource attribute gives the subject, the property attributes give the predicates, and the element contents give the objects]

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/


How to represent data in RDF

Classes properties and vocabularies

Slide 50


RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51


Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram “Healthcare Domain”; the recoverable content is listed below]

Class Core Vocabularies::Identifier
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Class Core Vocabularies::Location
- geographicIdentifier: URI
- geographicName: string

Class Core Vocabularies::Address
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Class Core Vocabularies::Geometry
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships shown between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.


Introduction to SPARQL

The RDF Query Language

Slide 53


About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54


Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s), identified by name or description.

• INSERT: add triples to the RDF graph.

• DELETE: delete triples from the RDF graph.

• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Annotations on the query:
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met

Slide 56

SELECT - return the name of a dataset with a particular URI

Sample data

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}

Result

?dataset
"File types Name Authority List"

Slide 57

SELECT - return the name and publisher of a dataset

Slide 58

Sample data

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

?dataset                            ?publisher
"File types Name Authority List"    "Publications Office"
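The way a query with several triple patterns is answered can be sketched as a join over variable bindings. The Python below is a toy illustration only, assuming that terms starting with "?" are variables. The function names (match_pattern, evaluate) are hypothetical, and the elided URIs of the sample data are replaced by the placeholder names ex:file-type and ex:publ.

```python
# Toy evaluation of a basic graph pattern (a list of triple patterns).
# Assumption: terms starting with "?" are variables, everything else is a constant.

DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant does not match
    return result


def evaluate(patterns, data):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


query = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]
rows = [(b["?dataset"], b["?publisher"]) for b in evaluate(query, DATA)]
```

The shared variable ?publisherURI is what links the dataset to the title of its publisher; a real SPARQL engine performs the same kind of join, with far better optimisation.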

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Start with a robust Domain Model (step 1)

Slide 68

[Diagram: example domain model. Classes and their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

Reuse existing terms and vocabularies (step 2)

General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies (step 2: Reuse existing terms and vocabularies)

DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: For describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

Slide 70

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Advantages of reuse (step 2: Reuse existing terms and vocabularies)

• Reuse greatly aids interoperability of your data.
  A value for dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
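The interoperability point about dates can be made concrete with a short Python sketch. It only uses the standard library; the free-text format string is an assumption about how a publisher might have written the date, which is exactly the kind of out-of-band knowledge that a typed xsd:date value makes unnecessary.

```python
from datetime import date, datetime

# A typed ISO 8601 value, as in "2013-02-21"^^xsd:date, parses directly.
iso_value = date.fromisoformat("2013-02-21")

# A free-text date needs a format agreed out of band (here: English month names).
free_text = "21 February 2013"
parsed = datetime.strptime(free_text, "%d %B %Y").date()
```

Both values end up as the same date, but only the first can be processed without knowing the publisher's language and formatting habits.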

You can find reusable RDF vocabularies on (step 2: Reuse existing terms and vocabularies)

Slide 72

http://joinup.ec.europa.eu
http://lov.okfn.org

Creation of sub-classes and sub-properties (step 3)

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Creation of sub-classes and sub-properties (step 3)

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]

Creation of sub-classes and sub-properties (step 3)

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Diagram: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
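The benefit of sub-properties for systems that only know the generic term can be sketched in a few lines of Python. This is an illustration under stated assumptions: the "bud:" prefix, the local names bud:introduction and bud:hasNomenclature, and the instance data are hypothetical; only the two sub-property relationships come from the slides.

```python
# Assumption: "bud:" is a hypothetical prefix for the EU Budget vocabulary.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

DATA = [
    ("ex:item1", "bud:introduction", "Introductory text of a budget item"),
    ("ex:item1", "bud:hasNomenclature", "ex:nomenclature1"),
]


def with_super_properties(triples, sub_property_of):
    """Add, for each triple, the same statement using every super-property.

    Assumes the property hierarchy contains no cycles.
    """
    inferred = list(triples)
    for s, p, o in triples:
        while p in sub_property_of:  # walk up the property hierarchy
            p = sub_property_of[p]
            inferred.append((s, p, o))
    return inferred


expanded = with_super_properties(DATA, SUB_PROPERTY_OF)
descriptions = [o for (s, p, o) in expanded if p == "dct:description"]
```

A consumer that has never heard of the budget-specific property can still find the text by asking for dct:description.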

Where new terms are required, create them following commonly agreed best practices (step 4)

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
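The capitalisation and camel-case conventions lend themselves to a simple automated check. The sketch below is one possible approach in Python, not an official validator; the function names are hypothetical, and rules such as "singular", "verb" or "noun" are left to human review because they cannot be checked with a regular expression.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf


def local_name(term):
    """Return the part after the prefix, e.g. 'skos:Concept' -> 'Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class(term):
    """Class names start with a capital letter and use camel case."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property(term):
    """Property names start with a lower case letter and use camel case."""
    return bool(PROPERTY_NAME.match(local_name(term)))
```

For example, is_valid_class("skos:Concept") and is_valid_property("foaf:isPrimaryTopicOf") both hold, while a term such as ex:budget_item fails the class check.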

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Where new terms are required, create them following commonly agreed best practices (step 4)

When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.

[Diagram: relationship hasCeiling between class Political category (Code, Description) and class Amount (Currency, Figure, Type, Year)]

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
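What a domain and a range declaration allow a machine to conclude can be sketched as follows. This Python example is illustrative only: the direction of hasCeiling (from Political category to Amount) is an assumption read off the diagram, and the "bud:" prefix and instance names are hypothetical.

```python
# Assumption: direction of hasCeiling (PoliticalCategory -> Amount) is read off
# the diagram; the "bud:" prefix and instance names are hypothetical.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """RDFS-style inference: derive rdf:type statements from domain and range."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))  # subject is in the domain
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))   # object is in the range
    return types


inferred = infer_types([("ex:category1", "bud:hasCeiling", "ex:amount1")])
```

Note that in RDFS, domain and range license inferences of this kind rather than acting as validation constraints, so an overly broad or wrong declaration silently produces wrong types.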

Publish within a highly stable environment designed to be persistent (step 5)

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/

Slide 80

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Publicise the RDF vocabulary by registering it with relevant services (step 6)

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

• Start with a robust Domain Model developed following a structured process and methodology.
• Research existing terms and their usage and maximise reuse of those terms.
• Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
• Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
• Publish within a highly stable environment designed to be persistent.
• Publicise the RDF vocabulary by registering it with relevant services.

Phases shown alongside the steps: Analyse, Model, Publish


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..." - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:
- CSV
- Excel (.xls and .xlsx)
- JSON
- XML
- RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD 2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Describe your data in a spreadsheet (step 1)

Download the tabular data

Slide 88

Create a project and upload it in Open Refine (step 2)

Slide 89

- Upload the spreadsheet
- Select relevant tabs
- Create the project

Clean up the data - table harmonisation (step 3)

Slide 90

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Clean up the data - prepare RDF (step 3)

Slide 91

• Create a URI representation for the involved object values
  • via formula
  • via reconciliation
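Creating a URI "via formula" amounts to a deterministic transformation of a cell value. The Python sketch below shows the idea outside Open Refine; it is not the expression language used by the tool, and the base URI and the function name mint_uri are hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind, label):
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(label.strip().lower().split())
    return f"{BASE}{kind}/{quote(slug, safe='-')}"


uri = mint_uri("country", "  United Kingdom ")
```

The same input always yields the same URI, which is what makes the identifiers stable across repeated exports. Reconciliation, by contrast, looks the value up in an existing reference dataset and reuses the URI found there.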

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Map your data to appropriate RDF classes & properties (model your data) (step 4)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

Slide 94

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Export your data to RDF/XML or Turtle (step 5)

Slide 95

[Screenshot: export of the data in Turtle]
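The idea of a skeleton, i.e. one template applied to every row of the table, can be sketched in plain Python. This is not how the RDF extension is implemented; the sample rows, the ex: and obs: namespaces and the property names are hypothetical, and only qb:Observation is taken from the RDF Data Cube Vocabulary. Values are inserted without escaping, which is acceptable for this toy data only.

```python
import csv
import io

CSV_TEXT = """country,year,value
BE,2014,0.83
FR,2014,0.81
"""

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"
    "@prefix obs: <http://example.org/obs/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)

# The "skeleton": one template applied to every row of the table.
SKELETON = (
    "obs:{country}-{year} a qb:Observation ;\n"
    '    ex:country "{country}" ;\n'
    '    ex:year "{year}"^^xsd:gYear ;\n'
    '    ex:value "{value}"^^xsd:decimal .\n'
)


def rows_to_turtle(text):
    """Apply the skeleton to each CSV row and return one Turtle document."""
    rows = csv.DictReader(io.StringIO(text))
    return PREFIXES + "\n" + "\n".join(SKELETON.format(**row) for row in rows)


turtle = rows_to_turtle(CSV_TEXT)
```

Each row becomes one observation whose URI is derived from the row's key columns, mirroring the clean-up and mapping steps of the workflow.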

Production pipelines

From desk to automated pipeline

Slide 96

[Diagram: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]


Thank you for your attention

and now YOUR questions

Slide 97

References

• 5-star Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Slide 98

Further reading

Slide 99

EC ISA. Process and methodology for developing semantic agreements.
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA. Cookbook for translating Data Models to RDF Schemas.
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL. Bob DuCharme.
http://www.learningsparql.com

Linked Data Cookbook. W3C Government Linked Data Working Group.
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer.
http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck.
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler.
http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.



What is linked data

Evolution from a document-based Web to a Web of interlinked data

Slide 6

The Web is evolving from a "Web of linked documents" into a "Web of linked data"

• The Web started as a collection of documents published online, accessible at a Web location identified by a URL.
• These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines.
• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data and put it together to do all kinds of things with it (as permitted by the licence).

Machine-readable data (or metadata) is data in a format that can be interpreted by a computer.

2 types of machine-readable data exist:
• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;
• data formats intended principally for computers, e.g. RDF, XML and JSON.

Slide 7

See also:
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
http://linkeddatabook.com/editions/1.0/

Defining linked data
Providing data as a service

"Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens."
EC ISA Case Study: How Linked Data is transforming eGovernment

The four design principles of Linked Data (by Tim Berners-Lee):

1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.

Slide 8

See also:
http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA

The value proposition of linked (open) government data

• Flexible data integration: facilitates data integration and enables the interconnection of previously disparate government datasets.
• Efficiency gains in data integration - the network effect: the addition of each new dataset increases the value of those datasets that are already published.
• Ease of navigation: makes browsing through complex data easier via URIs.
• Increase in data quality:
  - The use of URIs leads to improved data management and quality.
  - The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.

Slide 9

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

The value proposition of linked (open) government data

• Increase in data usability by providing data as a service:
  - Resolvable URIs.
  - Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON...
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:
  - existing relational/spatial database systems, by applying database-to-RDF conversions, or
  - existing XML/file-based data.

Slide 10

The value proposition of linked (open) government data

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases).
• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.
• New services: the availability of LOGD gives rise to new integrated services offered by the public and/or private sector.

Slide 11

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.

E.g. for an organisation: UNICEF in EuroVoc
- http://eurovoc.europa.eu/1022

Slide 12

The four principles in practice

3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).
4. Include links to other URIs so that people/machines can discover more things.

Slide 13
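Looking up a URI and asking for data rather than a web page is usually done with HTTP content negotiation. The following Python sketch only prepares such a request and does not send it. Whether a given server honours the Accept header, and which RDF media types it offers, is an assumption to verify per service; the function name rdf_request is hypothetical.

```python
from urllib.request import Request

# URI taken from the slide; the Accept value asks for RDF rather than HTML.
URI = "http://eurovoc.europa.eu/1022"


def rdf_request(uri, media_type="application/rdf+xml"):
    """Prepare (but do not send) an HTTP GET asking for an RDF representation."""
    return Request(uri, headers={"Accept": media_type})


request = rdf_request(URI)
# Sending it would be: urllib.request.urlopen(request)
```

The same URI can thus serve a human-readable page to a browser and machine-readable RDF to a program, which is what principles 2 and 3 describe.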

Linked data vs open data

Open data
Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

Slide 14

"Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Linked data foundations

URIs for naming things, RDF for describing data, and SPARQL for querying linked data

Slide 15

Uniform Resource Identifier (URI)

"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."
- ISA's 10 Rules for Persistent URIs

A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

RDF & SPARQL

The Resource Description Framework (RDF) is a data model for representing data and resources on the Web. It can be written down in several syntaxes.

Slide 17

RDF breaks every piece of information down into triples:

• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related

Subject | Predicate | Object
<http://example.org/place/Brussels> | is the capital of | "Belgium"
OR
<http://example.org/place/Brussels> | is the capital of | <http://example.org/place/Belgium>

SPARQL is a standardised language for querying RDF data.

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
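The two variants of the Brussels statement, one with a literal object and one with a resource object, can be written out as data. The Python sketch below models a triple and serialises it in the N-Triples syntax. The predicate URI is hypothetical, since the slide only gives the predicate in words, and the escaping is deliberately minimal (newlines and other control characters are not handled).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # always a URI
    predicate: str  # always a URI
    obj: str        # a URI or a literal
    obj_is_literal: bool = False

    def to_ntriples(self) -> str:
        """Serialise as one N-Triples line."""
        if self.obj_is_literal:
            escaped = self.obj.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{self.obj}>"
        return f"<{self.subject}> <{self.predicate}> {obj} ."


BRUSSELS = "http://example.org/place/Brussels"
CAPITAL_OF = "http://example.org/def/isCapitalOf"  # hypothetical predicate URI

with_literal = Triple(BRUSSELS, CAPITAL_OF, "Belgium", obj_is_literal=True)
with_resource = Triple(BRUSSELS, CAPITAL_OF, "http://example.org/place/Belgium")
```

The resource variant is the one that makes the data linked: the object is itself a URI about which further statements can be made or looked up.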


How to publish linked data

Paving the way towards 5-star linked data

Slide 18

5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context

Slide 19
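The star levels are cumulative: each level presupposes all the lower ones. A minimal Python sketch of that rule, with hypothetical parameter names, could look like this.

```python
def star_rating(on_web_open_licence, structured, non_proprietary, uses_uris, linked):
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    stars = 0
    for achieved in (on_web_open_licence, structured, non_proprietary, uses_uris, linked):
        if not achieved:
            break
        stars += 1
    return stars


scanned_pdf = star_rating(True, False, False, False, False)
excel_sheet = star_rating(True, True, False, False, False)
csv_file = star_rating(True, True, True, False, False)
linked_rdf = star_rating(True, True, True, True, True)
```

Under this reading, a dataset that uses URIs but is not published under an open licence earns no stars at all, because the first condition is not met.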

Make your stuff available on the Web under an open licence

Slide 20

Example: Trends, risks and vulnerabilities in securities markets

Make it available as structured data

Slide 21

Example: Waterbase - Emissions to water (table with a CountryCode column)

Use non-proprietary formats

• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF

Example: DG Enlargement - Regional programmes

Slide 22

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

Link your data to other data to provide context

Slide 24

Example: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

LOGD roadblocks

• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo: change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions initiatives - some examples

• European Environment Agency SPARQL endpoint
  Tool allowing searching for linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint
  Allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint
  Allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint
  Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint
  Tool allowing querying a multi-lingual online collection of millions of digitized items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27
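Endpoints such as these are normally queried over HTTP following the SPARQL Protocol, in which a GET request carries the query text in a parameter named query. The Python sketch below only builds such a URL and does not contact the service; the endpoint address is the one listed above, its current availability is not verified, and the query itself is a generic placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://semantic.eea.europa.eu/sparql"  # endpoint URL from the list above

QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"


def query_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query goes in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = query_url(ENDPOINT, QUERY)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Encoding the query properly matters because SPARQL uses characters such as "?", "{" and "#" that have a special meaning in URLs.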

Initiatives funded by the European Commission

Slide 28

[Logos of the initiatives, including ADMS.SW and the Core Public Service Vocabulary]

Member State initiatives - some examples

DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL - Building and address register
The Dutch Address and Buildings base register published as linked data.

UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.

UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member States initiatives - UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member States initiatives - UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
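The granular URI scheme can be expressed as two small functions, one for the identifier and one for a concrete representation. This Python sketch follows the pattern as read from the slides and is not checked against the live service; the function names are hypothetical.

```python
BASE = "http://www.legislation.gov.uk"


def legislation_id(doc_type, year, number, section=None):
    """Identifier URI for an item of legislation, optionally one section."""
    uri = f"{BASE}/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    return uri


def representation(identifier, fmt):
    """URL of a concrete representation (xml, xht, pdf, rdf or feed)."""
    return f"{identifier}/data.{fmt}"


section_124 = legislation_id("ukpga", 2010, 32, section=124)
rdf_url = representation(section_124, "rdf")
```

Separating the identifier of the thing from the address of each document about it is what allows the same section to be offered in several formats without changing its name.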

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

Slide 33

Open & linked data at BBC

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

Slide 35

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”

• Need for adding semantics in order to allow machine reasoning. For example:
  • Kristin is a field
  • Åsgard is an oil platform
  • Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

Slide 36

• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality

Conclusions cont’d

Slide 37

• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

Slide 39

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF

Find more on training.opendatasupport.eu

Learning objectives

Slide 40

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Slide 42

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Example: RDF description of an organisation

Slide 43

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
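To make the subject, predicate, object structure concrete, the dataset-name example can be written down as a plain Python tuple. This is a minimal sketch, not an RDF library: the `Triple` type and the `to_ntriples` helper are hypothetical names, and mapping "has a title" to the Dublin Core `title` property is an assumption based on the syntax examples later in this module.

```python
from collections import namedtuple

# A triple is just three positions: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

title_triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT + "title",  # assumption: "has a title" means dct:title
    object="File types Name Authority List",
)


def to_ntriples(triple):
    """Serialise a triple whose object is a plain literal as one N-Triples line."""
    return f'<{triple.subject}> <{triple.predicate}> "{triple.object}" .'


print(to_ntriples(title_triple))
```

Subject and predicate are always URIs; only the object position may hold a literal value such as a name.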

RDF Syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Annotations on the slide: the xmlns declarations are the definition of prefixes; the elements below them are the description of the data as triples (subject, predicate, object), which together form the graph.

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with subject, predicate and object labelled]

RDF Syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Annotations on the slide: the @prefix lines are the definition of prefixes; the statements below them are the description of the data as triples (subject, predicate, object), which together form the graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF Syntax: RDFa

Slide 49

RDFa: embedding RDF data in HTML

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/Agent">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Annotations on the slide: subject (resource), predicate (property), object (element content).

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

Slide 51

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[UML class diagram “Healthcare Domain”; callouts on the slide mark the classes, their properties and the relationships between them]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships shown between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

Slide 54

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Types of SPARQL queries

Slide 55

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph (SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions?

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Annotations on the slide:
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset                             publisher
"File types Name Authority List"    "Publications Office"
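What a SELECT query does with its triple patterns can be imitated in a few lines of Python. The sketch below is a toy matcher, not a SPARQL engine: the `match` and `select` functions are hypothetical helpers, prefixed names are treated as opaque strings, and the two `example.org` URIs are placeholders standing in for the abbreviated URIs of the sample data.

```python
DATASET = "http://example.org/authority/file-type"   # placeholder URI
PUBLISHER = "http://example.org/publisher/publ"      # placeholder URI

triples = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                  # a variable
            if result.setdefault(term, value) != value:
                return None                       # clashes with earlier binding
        elif term != value:                       # a constant that differs
            return None
    return result


def select(variables, patterns, data):
    """Join all patterns over the data and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    triples,
)
print(rows)
```

The shared variable `?publisherURI` is what joins the dataset's triples to the publisher's triples; a real engine does the same join, only with indexes and a query planner.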

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

Slide 62

• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

Slide 64

This module is about:

• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Learning objectives

Slide 65

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

Slide 67

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust Domain Model

Slide 68

[Domain model diagram. Classes and their attributes, as far as they can be recovered from the slide:]

• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location

Relationships shown: hasCeiling, has Political category, has Corporate body, has Nomenclature, has EU Programme

Step 2: Reuse existing terms and vocabularies

Slide 69

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else’s.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
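The interoperability argument about date formats can be seen directly in code. The following is a minimal sketch using only the Python standard library; the helper name `parse_free_text` and the format string are assumptions chosen to fit the counter-example, and a consumer would have to learn them out of band.

```python
from datetime import date, datetime

iso_value = "2013-02-21"          # lexical form of an xsd:date value
free_text = "21 February 2013"    # ad hoc format from the counter-example

# The standard form parses with no knowledge of the publisher's conventions.
parsed = date.fromisoformat(iso_value)


def parse_free_text(value):
    """Parse the ad hoc format; the format string must be agreed separately
    and the month name only matches in an English locale."""
    return datetime.strptime(value, "%d %B %Y").date()


print(parsed, parse_free_text(free_text))
```

Both calls end up with the same date, but only the first works for every publisher that follows the shared convention; the second needs one custom parser per home-grown format.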

Step 2: Reuse existing terms and vocabularies

Slide 72

You can find reusable RDF vocabularies on:

• http://joinup.ec.europa.eu/
• http://lov.okfn.org/

Step 3: Creation of sub-classes and sub-properties

Slide 73

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.

[Diagram: Amount (Currency, Figure, Type, Year) and Nomenclature (Type, Heading, Introduction, Remark, Conditions), connected by the has Nomenclature relationship]
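Why sub-properties help consumers can be shown with a small inference sketch. It assumes the two sub-property declarations described for the EU Budget vocabulary; the `bud:` prefixed names, the sample triple and the `infer` function are hypothetical and only illustrate the rdfs:subPropertyOf rule.

```python
# Assumed declarations: specific property -> more generic super-property.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

data = {
    ("ex:budgetLine1", "bud:introduction", "Appropriations for staff"),
}


def infer(triples, sub_property_of):
    """Add every triple entailed by rdfs:subPropertyOf until nothing changes."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            parent = sub_property_of.get(p)
            if parent and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


for triple in sorted(infer(data, SUB_PROPERTY_OF)):
    print(triple)
```

A tool that has never heard of `bud:introduction` but knows `dct:description` can still show the text, which is the benefit the slides describe.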

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 76

• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
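The capitalisation conventions for classes and properties are easy to check mechanically. The sketch below uses hypothetical helper names and only covers the letter-case rules; whether a term is singular, a verb or a noun still needs human judgement.

```python
import re


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[1]


def looks_like_class(term):
    # Classes: upper camel case, starting with a capital letter.
    return re.fullmatch(r"[A-Z][A-Za-z0-9]*", local_name(term)) is not None


def looks_like_property(term):
    # Properties: lower camel case, starting with a lower case letter.
    return re.fullmatch(r"[a-z][A-Za-z0-9]*", local_name(term)) is not None


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    print(term, looks_like_class(term), looks_like_property(term))
```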

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 77

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 78

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 79

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.

[Diagram: the hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
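In RDFS, domain and range declarations let a consumer derive the types of the resources a property connects. The sketch below illustrates this for the hasCeiling relationship; the direction (domain Political category, range Amount), the `ex:` names and the `infer_types` function are assumptions made for the example, not statements taken from the EU Budget vocabulary.

```python
# Assumed schema: ex:hasCeiling links a political category to an amount.
SCHEMA = {
    "ex:hasCeiling": {"domain": "ex:PoliticalCategory", "range": "ex:Amount"},
}


def infer_types(triples, schema):
    """Derive rdf:type triples from the domain and range of each property used."""
    inferred = set()
    for s, p, o in triples:
        rule = schema.get(p)
        if rule is None:
            continue                                     # no declaration known
        inferred.add((s, "rdf:type", rule["domain"]))    # subject is in the domain
        inferred.add((o, "rdf:type", rule["range"]))     # object is in the range
    return inferred


facts = {("ex:heading1", "ex:hasCeiling", "ex:amount1")}
for triple in sorted(infer_types(facts, SCHEMA)):
    print(triple)
```

Note the design consequence: in RDFS a domain or range does not reject data, it adds type information, so declaring them too narrowly produces wrong conclusions rather than validation errors.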

Step 5: Publish within a highly stable environment designed to be persistent

Slide 80

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Slide 81

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Conclusions

Slide 82

Analyse
• Start with a robust Domain Model developed following a structured process and methodology
• Research existing terms and their usage and maximise reuse of those terms

Model
• Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
• Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

Publish
• Publish within a highly stable environment designed to be persistent
• Publicise the RDF vocabulary by registering it with relevant services

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another ...” – openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

Slide 85

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

Slide 86

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie/

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Slide 88

• Download the tabular data

Step 2: Create a project and upload it in Open Refine

Slide 89

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Step 3: Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Step 3: Clean up the data – prepare RDF

Slide 91

• Create a URI representation for the involved object values:
  • via formula
  • via reconciliation
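Creating a URI "via formula" means deriving it deterministically from a cell value. The following sketch shows the idea in Python rather than in the tool's own expression language; the base URI and the `mint_uri` function are hypothetical and chosen only for illustration.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/country/"  # hypothetical base URI


def mint_uri(cell_value, base=BASE):
    """Turn a spreadsheet cell value into a URI by normalising it to a slug."""
    slug = re.sub(r"\s+", "-", cell_value.strip().lower())  # trim, lower, hyphenate
    return base + quote(slug)                               # percent-encode the rest


print(mint_uri("  United Kingdom "))
```

The same input always yields the same URI, so rows that mention the same value end up pointing at the same resource. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there.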

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 94

• You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
• You can set the base URI for the data

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Screenshot: export of the data in Turtle]

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along the axes flexibility and volume: OpenRefine, UnifiedViews, Cellar]

Thank you for your attention ... and now YOUR questions

Slide 97

References

Slide 98

• 5★ Open Data. http://5stardata.info/
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie/
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slide 99

• EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

Slide 100

• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme. http://www.learningsparql.com/
• Linked Data Cookbook, W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Further reading

Slide 101

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler. http://workingontologist.org/

Be part of our team

Slide 102

Find us on, contact us, join us on, follow us:

• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu

Presentation metadata

Slide 103

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 7: Llinked open data training for EU institutions

The Web is evolving from a “Web of linked documents” into a “Web of linked data”

Slide 7

• The Web started as a collection of documents published online – accessible at a Web location identified by a URL.
• These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines.
• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data and put it together to do all kinds of things with it (as permitted by the licence).

Machine-readable data (or metadata) is data in a format that can be interpreted by a computer. Two types of machine-readable data exist:

• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa
• data formats intended principally for computers, e.g. RDF, XML and JSON

See also:
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
http://linkeddatabook.com/editions/1.0/

Defining linked data

Providing data as a service

Slide 8

“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”
– EC ISA Case Study: How Linked Data is transforming eGovernment

The four design principles of Linked Data (by Tim Berners-Lee):

1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.

See also:
http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA

The value proposition of linked (open) government data

bull Flexible data integration facilitates data integration and enables the interconnection of previously disparate government datasets

bull Efficiency gains in data integrationndash the network effect the addition of each new dataset increases the value of those datasets that are already published

bull Ease of navigation makes browsing through complex data easier via URIs

bull Increase in data quality

The use of URIs leads to improved data management and quality

The increased (re)use triggers a growing demand to improve data quality Through crowd-sourcing and self-service mechanisms errors are progressively corrected

9

See alsoISA Study on Business Models for LOGD httpsjoinupeceuropaeucommunitysemicdocumentstudy-

business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

The value proposition of linked (open) government data

• Increase in data usability by providing data as a service:

  - Resolvable URIs

  - Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…

• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:

  - Existing relational/spatial database systems, by applying database-to-RDF conversions, or

  - Existing XML/file-based data

Slide 10

The value proposition of linked (open) government data

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected in the data at lower cost and effort (compared to traditional relational databases).

• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.

• New services: the availability of LOGD gives rise to new, integrated services offered by the public and/or private sector.

Slide 11

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things.

2. Use HTTP URIs so that people can look up those names.

E.g. for an organisation: UNICEF in EuroVoc

- http://eurovoc.europa.eu/1022

Slide 12

The four principles in practice

3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).

4. Include links to other URIs so that people/machines can discover more things.

Slide 13
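Principles 2 and 3 can be illustrated with a small Python sketch: a client "looks up" an HTTP URI and asks, through the Accept header, for an RDF representation. The sketch only prepares the request and does not send it. The function name and the media-type table are illustrative, and whether a particular server honours the Accept header is an assumption that has to be checked per publisher.

```python
from urllib.request import Request

# Media types commonly used for two RDF serialisations.
RDF_MEDIA_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def build_lookup_request(uri: str, syntax: str = "rdfxml") -> Request:
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation."""
    return Request(uri, headers={"Accept": RDF_MEDIA_TYPES[syntax]})


# The EuroVoc URI for UNICEF used as the example in these slides.
request = build_lookup_request("http://eurovoc.europa.eu/1022", "turtle")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; it is left out here so that the sketch runs without network access.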


Linked data vs open data

Open data

Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data

Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

Slide 14

“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data

http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Linked data foundations

URIs for naming things, RDF for describing data, and SPARQL for querying linked data

Slide 15

Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”

– ISA’s 10 Rules for Persistent URIs

A country, e.g. Belgium:

- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office:

- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List:

- http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

RDF & SPARQL

The Resource Description Framework (RDF) is a standard model for representing data and resources on the Web.

Slide 17

RDF breaks every piece of information down into triples:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

SPARQL is a standardised language for querying RDF data.

Example (subject, predicate, object):

<http://example.org/place/Brussels> is the capital of “Belgium”

OR

<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
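The difference between the two Brussels statements (a literal object versus a URI object) can be sketched in Python. This is a toy representation rather than an RDF library; the class names and the predicate URI `http://example.org/vocab/isCapitalOf` are hypothetical, since the slide only says "is the capital of".

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identified by a URI."""


class Literal(str):
    """A plain value, such as a name written as text."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    object: Union[URI, Literal]


# Hypothetical predicate URI, introduced only for this illustration.
CAPITAL_OF = URI("http://example.org/vocab/isCapitalOf")
BRUSSELS = URI("http://example.org/place/Brussels")

with_literal = Triple(BRUSSELS, CAPITAL_OF, Literal("Belgium"))
with_resource = Triple(BRUSSELS, CAPITAL_OF, URI("http://example.org/place/Belgium"))


def can_be_followed(triple: Triple) -> bool:
    """Only a URI object can be looked up to discover more data."""
    return isinstance(triple.object, URI)


print(can_be_followed(with_literal))   # False
print(can_be_followed(with_resource))  # True
```

The second form is the one that makes data "linked": the object is itself a name that can be looked up.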


How to publish linked data

Paving the way towards 5-star linked data

Slide 18


5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence

★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)

★★★ Use non-proprietary formats (e.g. CSV instead of Excel)

★★★★ Use URIs to denote things, so that people can point at your stuff

★★★★★ Link your data to other data to provide context

Slide 19
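As a rough illustration, the five levels can be read as a cumulative checklist. The Python sketch below assumes that a level only counts when all lower levels are also met; the function and parameter names are illustrative and not part of any official tooling.

```python
def star_rating(on_web_open_licence: bool, structured: bool,
                non_proprietary: bool, uses_uris: bool, linked: bool) -> int:
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    stars = 0
    for criterion in (on_web_open_licence, structured, non_proprietary, uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars


# An Excel file published under an open licence: structured, but proprietary.
print(star_rating(True, True, False, False, False))  # 2
# RDF with resolvable URIs and links to other datasets.
print(star_rating(True, True, True, True, True))     # 5
```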


Make your stuff available on the Web under an open licence

Slide 20

[Screenshot: Trends, risks and vulnerabilities in securities markets]

Make it available as structured data

Slide 21

[Screenshot: Waterbase - Emissions to water]

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions' initiatives – some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate

• Europeana SPARQL endpoint: tool for querying a multi-lingual online collection of millions of digitised items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

Initiatives funded by the European Commission

Slide 28

[Logos of funded initiatives, including ADMS, ADMS.SW, the Core Vocabularies and the Core Public Service Vocabulary]

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT – Agenzia per l’Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL – Building and address register

The Dutch Address and Buildings base register, published as linked data

UK – Ordnance Survey

Three OS OpenData products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line

UK – Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member State initiatives – UK National Archives

Slide 30

Three considerations for legislation as data:

• Typographic layout

• Versioning: changes over time

• Semantics

Semantic representation using RDF and Linked Data:

• URIs for things & RDF data model

Requires granular URIs to name things:

• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}

• Representation: data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member State initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

Slide 33

Open & linked data at BBC

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

Slide 35

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS, with their production volumes.”

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.

• URIs, RDF and SPARQL form the foundational layer for linked data.

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

Conclusions (cont’d)

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module, you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

Resource Description Framework

An introduction to RDF

Slide 41


RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

Example RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset (subject, predicate, object)

<http://publications.europa.eu/resource/authority/file-type> has a title “File types Named Authority List”

RDF syntax: RDF/XML

Slide 46

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data – triples:

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

(Annotations on the slide mark the subject, predicate and object of one triple, and the graph as a whole.)

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph of the example, with subject, predicate and object labelled]

RDF syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data – triples:

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

(Annotations on the slide mark the subject, predicate and object of one triple, and the graph as a whole.)

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
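To make explicit what the prefixes abbreviate, the following Python sketch expands the same five triples into N-Triples, the line-based RDF syntax in which every URI is written in full. It is a simplified illustration, not a complete serialiser: the helper names are made up for this example, the DCAT namespace is assumed to be http://www.w3.org/ns/dcat#, and any value that is neither a full URI nor a known prefixed name is treated as a plain literal.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def term(value: str) -> str:
    """Render a full URI or prefixed name as <URI>; anything else as a quoted literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    prefix, sep, local = value.partition(":")
    if sep and prefix in PREFIXES:
        return f"<{PREFIXES[prefix]}{local}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(rows) -> str:
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in rows)


DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

triples = [
    (DATASET, "rdf:type", "dcat:Dataset"),  # Turtle writes rdf:type as "a"
    (DATASET, "dct:title", "File types Named Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]

print(to_ntriples(triples))
```

Each printed line is one edge of the RDF graph: two nodes (subject and object) joined by a predicate.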


RDF syntax: RDFa

Slide 49

Embedding RDF data in HTML (the annotated attributes carry the subject, predicate and object):

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram: class Healthcare Domain]

Class Core Vocabularies::Identifier

- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person

- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Class Core Vocabularies::Location

- geographicIdentifier: URI
- geographicName: string

Class Core Vocabularies::Address

- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Class Core Vocabularies::Geometry

- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Introduction to SPARQL

The RDF Query Language

Slide 53


About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: add triples to the RDF graph.

• DELETE: delete triples from the RDF graph.

• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset

"File types Named Authority List"

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher

"File types Named Authority List" | "Publications Office"
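How a SELECT query arrives at its result can be sketched in a few lines of Python: each triple pattern is matched against every triple, and the variable bindings found so far are carried from one pattern to the next, which is what joins the dataset to its publisher. This is a toy evaluator for illustration only, not how a SPARQL engine is implemented; the function names are made up, prefixed names are kept as plain strings, and the abbreviated URIs of the slide are written out in full.

```python
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

DATA = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Named Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend a variable binding so that pattern equals triple; None on mismatch."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def select(variables, patterns, data):
    """Evaluate a list of triple patterns and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in data
                    if (b2 := match(pattern, t, b)) is not None]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The shared variable `?publisherURI` is what connects the first and third patterns; without it the query would return every title in the data.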


SPARQL example – EU ODP (1)

Slide 59

SPARQL example – EU ODP (2)

Slide 60

SPARQL example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

  - How to reuse existing vocabularies to model your data

  - How to create new classes and properties in RDF

  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module, you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

Slide 67

1. Start with a robust Domain Model, developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

1. Start with a robust Domain Model

Slide 68

[Diagram: domain model with the following classes, attributes and relationships]

- Amount: Currency, Figure, Type, Year
- Nomenclature: Type, Heading, Introduction, Remark, Conditions
- EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
- Political category: Code, Description
- Corporate body: Code, Type, Location

Relationships: has Ceiling, has Political category, has Corporate body, has Nomenclature, has EU Programme

2. Reuse existing terms and vocabularies

Slide 69

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Well-known vocabularies

2. Reuse existing terms and vocabularies

Slide 70

DCAT-AP: vocabulary for describing datasets in Europe

Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: vocabulary for describing projects

ADMS: vocabulary for describing interoperability assets

Dublin Core: defines general metadata attributes

Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register

Organization Ontology: for describing the structure of organisations

Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Advantages of reuse

2. Reuse existing terms and vocabularies

Slide 71

• Reuse greatly aids interoperability of your data.

Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
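The "further processing" mentioned for non-standard dates can be as small as the following Python sketch, which turns a date written as "21 February 2013" into a literal typed as xsd:date. The function name is illustrative, and the default format string assumes English month names (the default C locale).

```python
from datetime import datetime


def to_xsd_date(text: str, source_format: str = "%d %B %Y") -> str:
    """Parse a human-readable date and return it as a Turtle literal typed xsd:date."""
    parsed = datetime.strptime(text, source_format).date()
    return f'"{parsed.isoformat()}"^^xsd:date'


print(to_xsd_date("21 February 2013"))
```

Every consumer of the non-standard format has to write and maintain a conversion like this one, which is the cost that reusing dcterms:created with a typed date avoids.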


You can find reusable RDF vocabularies on:

2. Reuse existing terms and vocabularies

Slide 72

http://joinup.ec.europa.eu

http://lov.okfn.org

3. Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

3. Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

3. Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Diagram: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
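Why a sub-property helps generic systems can be shown with a short Python sketch of the rdfs:subPropertyOf rule: every statement made with the specific property also holds for its super-property. The prefixed names `bud:introduction`, `bud:hasNomenclature` and `ex:nomenclature1` are hypothetical stand-ins for the EU Budget vocabulary terms, whose exact URIs are not given on the slides, and the sketch assumes the hierarchy contains no cycles.

```python
# Hypothetical prefixed names standing in for the EU Budget vocabulary terms.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples):
    """Return the triples plus the same statements restated with each super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        # Walk up the property hierarchy (assumed to be free of cycles).
        while p in SUB_PROPERTY_OF:
            p = SUB_PROPERTY_OF[p]
            inferred.add((s, p, o))
    return inferred


data = [("ex:nomenclature1", "bud:introduction", "Some introductory text")]
for triple in sorted(infer_super_properties(data)):
    print(triple)
```

A tool that only knows Dublin Core can then still display the introduction as a description.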


4. Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower-case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
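The letter-case conventions above lend themselves to a simple automated check, sketched here in Python. The function names are illustrative, and the check covers only capitalisation and camel case; whether a class name is singular or a property name is a verb or a noun still needs human review.

```python
import re


def local_name(term: str) -> str:
    """Return the part of a prefixed name after the colon, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[1]


def looks_like_class(term: str) -> bool:
    """Upper camel case, e.g. skos:Concept."""
    return re.fullmatch(r"[A-Z][A-Za-z0-9]*", local_name(term)) is not None


def looks_like_property(term: str) -> bool:
    """Lower camel case, e.g. foaf:isPrimaryTopicOf."""
    return re.fullmatch(r"[a-z][A-Za-z0-9]*", local_name(term)) is not None


for candidate in ("skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:has_site"):
    print(candidate, looks_like_class(candidate), looks_like_property(candidate))
```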


4. Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

4. Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

4. Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
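In RDFS, domain and range declarations are used for inference rather than validation: from a single statement, a reasoner can conclude the types of both the subject and the object. The Python sketch below illustrates this with a hypothetical `ex:hasCeiling` property; its direction (from a political category to an amount) and all `ex:` names are assumptions made for the example.

```python
# Hypothetical schema: ex:hasCeiling links a political category to an amount.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))
    return types


data = [("ex:category1", "ex:hasCeiling", "ex:amount2015")]
for triple in sorted(infer_types(data)):
    print(triple)
```

This is also why an overly narrow domain or range is risky: using the property on an unexpected resource does not raise an error, it silently assigns that resource a type.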


5. Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

6. Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

1. Start with a robust Domain Model, developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Phases: Analyse, Model, Publish

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83


What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another…” - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

- CSV

- Excel (.xls and .xlsx)

- JSON

- XML, and

- RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

Installation:

- Install Open Refine from https://github.com/OpenRefine

- Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

1. Describe your data in a spreadsheet

Download the tabular data

Slide 88

2. Create a project and upload it in Open Refine

Slide 89

- Upload the spreadsheet

- Select relevant tabs

- Create the project

3. Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

3. Clean up the data – prepare RDF

Slide 91

• Create URI representations for the involved object values:

  - via formula

  - via reconciliation
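Creating a URI "via formula" means deriving it mechanically from a cell value. In Open Refine this is written as an expression on a column; the Python sketch below shows the same idea outside the tool. The base URI `http://example.org/id/`, the function name and the sample values are hypothetical.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/id/"  # hypothetical base URI for minted identifiers


def mint_uri(kind: str, value: str) -> str:
    """Build a URI from a cell value: trim, lower-case, hyphenate spaces, percent-encode the rest."""
    slug = "-".join(value.strip().lower().split())
    return f"{BASE_URI}{kind}/{quote(slug, safe='-')}"


print(mint_uri("country", " United Kingdom "))
print(mint_uri("indicator", "Households with broadband (%)"))
```

Reconciliation, the alternative listed above, replaces the minted URI with an existing one from an authoritative source (for example a country URI from a Named Authority List), which is what turns the data into linked data.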


4. Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

4. Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

4. Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

Slide 94

[Screenshot: graphical interface to copy/paste an existing RDF skeleton]

[Screenshot: graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]

Slide 95

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools placed along two axes, flexibility and volume, from desk to automated pipeline: OpenRefine, UnifiedViews, Cellar]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5★ Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query

Slide 98

DATASUPPORTOPEN

Further reading

Slide 99

• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1

• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2

• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com

• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/

• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf

• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

• Open Data Support: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• OpenDataSupport
• contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


DATASUPPORTOPEN

Defining linked data: providing data as a service

“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”

EC ISA Case Study: How Linked Data is transforming eGovernment

The four design principles of Linked Data (by Tim Berners-Lee):

1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.

Slide 8

See also:
http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA

DATASUPPORTOPEN

The value proposition of linked (open) government data

• Flexible data integration: facilitates data integration and enables the interconnection of previously disparate government datasets.
• Efficiency gains in data integration (the network effect): the addition of each new dataset increases the value of those datasets that are already published.
• Ease of navigation: makes browsing through complex data easier via URIs.
• Increase in data quality:
  - The use of URIs leads to improved data management and quality.
  - The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.

Slide 9

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

The value proposition of linked (open) government data

• Increase in data usability by providing data as a service:
  - Resolvable URIs.
  - Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:
  - existing relational/spatial database systems, by applying database-to-RDF conversions; or
  - existing XML/file-based data.

Slide 10

DATASUPPORTOPEN

The value proposition of linked (open) government data

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases).
• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.
• New services: the availability of LOGD gives rise to new, integrated services offered by the public and/or private sector.

Slide 11

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.

E.g. for an organisation: UNICEF in EuroVoc
- http://eurovoc.europa.eu/1022

Slide 12

DATASUPPORTOPEN

The four principles in practice

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that people/machines can discover more things.
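Principles 2 and 3 are commonly realised through HTTP content negotiation: the client looks up the URI and states, in the Accept header, that it would like RDF back. The sketch below only prepares such a request with Python's standard library and does not send it; the URI `http://example.org/id/thing/1` is a placeholder, and whether a given server actually honours the header is an assumption to verify per publisher.

```python
from urllib.request import Request


def rdf_lookup_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation."""
    # Prefer Turtle, fall back to RDF/XML.
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(uri, headers={"Accept": accept})


request = rdf_lookup_request("http://example.org/id/thing/1")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup.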

Slide 13

DATASUPPORTOPEN

Linked data vs open data

Open data: data can be published and be publicly available under an open licence without linking to other data sources.

Linked data: data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

Slide 14

“Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

DATASUPPORTOPEN

Linked data foundations

URIs for naming things, RDF for describing data, and SPARQL for querying linked data

Slide 15

DATASUPPORTOPEN

Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”
- ISA's 10 Rules for Persistent URIs

• A country, e.g. Belgium: http://publications.europa.eu/resource/authority/country/BEL
• An organisation, e.g. the Publications Office: http://publications.europa.eu/resource/authority/corporate-body/PUBL
• A dataset, e.g. the Countries Named Authority List: http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

RDF & SPARQL

The Resource Description Framework (RDF) is a framework for representing data and resources on the Web.

RDF breaks every piece of information down into triples:

• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related

Example (subject, predicate, object):

<http://example.org/place/Brussels> is the capital of "Belgium"
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>

SPARQL is a standardised language for querying RDF data.

Slide 17

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
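The difference between the two Brussels statements (a literal object versus a resource object) can be made concrete in a few lines of Python. This is a sketch for illustration only: the `URI`, `Literal` and `Triple` types are ad hoc, and the predicate URI `http://example.org/vocab/isCapitalOf` is hypothetical, since the slide only says "is the capital of".

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identified by a URI."""


class Literal(str):
    """A plain value, such as a name written as text."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


EX = "http://example.org/"

# Hypothetical predicate URI standing in for "is the capital of".
IS_CAPITAL_OF = URI(EX + "vocab/isCapitalOf")

with_literal = Triple(URI(EX + "place/Brussels"), IS_CAPITAL_OF, Literal("Belgium"))
with_resource = Triple(URI(EX + "place/Brussels"), IS_CAPITAL_OF, URI(EX + "place/Belgium"))


def object_is_resource(triple: Triple) -> bool:
    """True when the object is itself a resource that can be described further."""
    return isinstance(triple.obj, URI)


print(object_is_resource(with_literal), object_is_resource(with_resource))
```

Only the second form lets other triples say more about Belgium, which is what makes linking possible.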

DATASUPPORTOPEN

How to publish linked data

Paving the way towards 5-star linked data

Slide 18

DATASUPPORTOPEN

5-star schema of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
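The five levels build on one another, so a rating can be read as "how many consecutive criteria are met, starting from the first". A small Python sketch of that reading follows; the function and parameter names are invented for illustration.

```python
def star_rating(on_web_open_licence: bool, structured: bool,
                non_proprietary: bool, uses_uris: bool, linked: bool) -> int:
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    stars = 0
    for criterion in (on_web_open_licence, structured, non_proprietary,
                      uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars


# A CSV file published under an open licence, without URIs or links.
print(star_rating(True, True, True, False, False))
```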

Slide 19

DATASUPPORTOPEN

Make your stuff available on the Web under an open licence

Slide 20

[Example shown: Trends, risks and vulnerabilities in securities markets]

DATASUPPORTOPEN

Make it available as structured data

Slide 21

[Example shown: Waterbase - Emissions to water, tabular data with a CountryCode column]

DATASUPPORTOPEN

Use non-proprietary formats

• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF

[Example shown: DG Enlargement - Regional programmes]

Slide 22

DATASUPPORTOPEN

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive, or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo: change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples of supranational, national, regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions' initiatives: some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

[Figure: logos of funded initiatives; recoverable labels: ADMS.SW, Core Vocabulary, Public Service]

DATASUPPORTOPEN

Member State initiatives: some examples

• DE - Bibliotheksverbund Bayern: linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
• IT - Agenzia per l'Italia Digitale: three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.
• NL - Building and address register: the Dutch Address and Buildings base register published as linked data.
• UK - Ordnance Survey: three OS OpenData products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary-Line.
• UK - Companies House: publishing basic company details as linked data, using a simple URI for each company in their database.

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member State initiatives: UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

DATASUPPORTOPEN

Member State initiatives: UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
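The identifier and representation patterns from the previous slide can be combined mechanically. The Python sketch below assumes the pattern reads `/id/{type}/{year}/{number}/section/{number}` followed by `/data.{format}`, which is a reconstruction from the slide text; the function names are invented, and the behaviour of the live service is not verified here.

```python
BASE = "http://www.legislation.gov.uk"


def section_identifier(doc_type: str, year: int, number: int, section: int) -> str:
    """Identifier URI for one section of a piece of legislation."""
    return f"{BASE}/id/{doc_type}/{year}/{number}/section/{section}"


def representation(identifier: str, fmt: str) -> str:
    """URI of one representation (xml, xht, pdf, rdf or feed) of the identified thing."""
    allowed = {"xml", "xht", "pdf", "rdf", "feed"}
    if fmt not in allowed:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier}/data.{fmt}"


uri = section_identifier("ukpga", 2010, 32, 124)
print(representation(uri, "rdf"))
```

Keeping the identifier separate from its representations means the same section can be cited once and retrieved in whichever format a consumer needs.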

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

DATASUPPORTOPEN Slide 33

Open amp linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

1. Link databases

“I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee.”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

“I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes.”

• Need for adding semantics in order to allow machine reasoning

For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge-creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF amp SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

bull The Resource Description Framework (RDF)

bull How to writeread RDF

bull How you can describe your data with RDF

bull What SPARQL is

bull How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>
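To see which triples the RDF/XML description of the Publications Office encodes, the document can be walked with Python's standard XML parser. This is a deliberately narrow sketch: it handles only the flat "typed node with property children" layout used in this example, not RDF/XML in general, and it does not distinguish URI objects from literal objects. A dedicated RDF library would be the usual choice in practice.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>"""


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def triples(rdf_xml: str):
    """Yield (subject, predicate, object) for this simple node/property layout only."""
    for node in ET.fromstring(rdf_xml):
        subject = node.get(f"{{{RDF}}}about")
        # The element name of a typed node gives its class.
        yield (subject, RDF + "type", expand(node.tag))
        for prop in node:
            target = prop.get(f"{{{RDF}}}resource")
            yield (subject, expand(prop.tag), target if target is not None else prop.text)


for triple in triples(DOC):
    print(triple)
```

The five printed tuples correspond to two type statements, one label, one site link and one full address.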

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related

Example: name of a dataset

<http://publications.europa.eu/resource/authority/file-type> (subject) has a title (predicate) "File types Named Authority List" (object)

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Annotations on the slide: the xmlns attributes are the definition of prefixes; the element content is the description of data as triples (subject, predicate, object), which together form the graph.

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with the subject, predicate and object of each triple labelled]

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Annotations on the slide: the @prefix lines are the definition of prefixes; the remaining statements are the description of data as triples (subject, predicate, object), which together form the graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Annotations on the slide: the resource attribute gives the subject, the property attribute the predicate, and the element content the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: the Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[Figure: UML class diagram "class Healthcare Domain"; the recoverable content is listed below]

Class: Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class: Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class: Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class: Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class: Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships (labels on the associations between classes): address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions...
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s)... (identified by name or description).
• INSERT: add triples to the RDF graph.
• DELETE: delete triples from the RDF graph.
• ASK: are there any X, Y, etc. satisfying the following conditions...?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

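Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Annotations on the slide:
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE block: RDF triple patterns, i.e. the conditions that have to be met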

DATASUPPORTOPEN

SELECT: return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Named Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}

Result:

?dataset
"File types Named Authority List"

DATASUPPORTOPEN

SELECT: return the name and publisher of a dataset

Slide 58

Sample data:

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Named Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

?dataset | ?publisher
"File types Named Authority List" | "Publications Office"
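How a SELECT query arrives at its result table can be imitated in plain Python: each triple pattern is matched against every triple, and variables shared between patterns force the matches to join. The sketch below is a toy evaluator and not a SPARQL engine; the two `http://example.org/...` URIs are stand-ins for the abbreviated URIs on the slide, and prefixed names are kept as plain strings.

```python
FILE_TYPE = "http://example.org/authority/file-type"  # stand-in for the abbreviated URI
PUBL = "http://example.org/publisher/publ"            # stand-in for the abbreviated URI

DATA = {
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Named Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
}


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    binding = dict(binding)
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            # A variable: bind it, or check that it agrees with an earlier binding.
            if binding.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return binding


def select(variables, patterns, data):
    """Evaluate the patterns one after another, joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in data
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(solution[v] for v in variables) for solution in solutions]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The variable `?publisherURI` never appears in the output; it exists only to connect the dataset to the title of its publisher, exactly as in the query on the slide.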

DATASUPPORTOPEN

SPARQL example: EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL example: EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL example: EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Step 1: Start with a robust Domain Model

Slide 68

[Figure: example domain model. Recoverable class labels: Amount, Nomenclature, EU Programme, Political category, Corporate body. Recoverable attribute labels: Currency, Figure, Type, Year, Heading, Code, Description, Location, Introduction, Remark, Conditions, Acronym, Legal base period, Legal base type, Legal base status. Recoverable relationship labels: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 71

2. Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data.

Take dcterms:created as an example. Its value should be a data-typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
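The interoperability point can be illustrated with a minimal Python sketch. It assumes the two date notations mentioned on the slide; the function names are illustrative, and the free-text parser relies on English month names being recognised by the default locale.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """Parse an xsd:date lexical form such as '2013-02-21' (no time zone)."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Parse a custom format such as '21 February 2013'.

    Every consumer has to know (or guess) this format, and the month
    name parsing depends on the locale of the machine.
    """
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")           # works with any ISO 8601 aware tool
custom = parse_free_text_date("21 February 2013")  # needs format-specific code
print(standard == custom)
```

Both calls produce the same date, but only the first one works without knowledge that is specific to one publisher's schema.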

Slide 72

2. Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• http://joinup.ec.europa.eu

• http://lov.okfn.org

Slide 73

3. Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 74

3. Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 75

3. Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
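The effect of declaring sub-properties can be sketched in a few lines of Python. The prefix "bud:" and the exact term names bud:introduction and bud:hasNomenclature are assumptions made for illustration, as is the sample triple; only the two sub-property relationships come from the slides.

```python
# rdfs:subPropertyOf declarations (term names under "bud:" are hypothetical)
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples):
    """Return the input triples plus those entailed by rdfs:subPropertyOf."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        # Walk up the sub-property chain and restate the triple at each level.
        while predicate in SUB_PROPERTY_OF:
            predicate = SUB_PROPERTY_OF[predicate]
            inferred.add((subject, predicate, obj))
    return inferred


data = {("ex:line1", "bud:introduction", "Sample introduction text")}
result = infer_super_properties(data)
for triple in sorted(result):
    print(triple)
```

A consumer that only knows dct:description can still read the inferred triple, which is the benefit the slides describe.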

Slide 76

4. Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.

• Properties begin with a lower case letter, e.g. rdfs:label.

• Object properties should be verbs, e.g. org:hasSite.

• Data type properties should be nouns, e.g. dcterms:description.

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
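The purely syntactic parts of these conventions can be checked automatically. The following Python sketch is one possible check, with illustrative function names; it does not verify that a class name is singular or that a property is a verb or a noun, which would need human review.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # e.g. Concept
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'skos:Concept' -> 'Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    """Classes start with a capital letter and use camel case."""
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    """Properties start with a lower case letter and use camel case."""
    return bool(LOWER_CAMEL.match(local_name(term)))


print(is_valid_class_name("skos:Concept"))
print(is_valid_property_name("foaf:isPrimaryTopicOf"))
print(is_valid_class_name("ex:political_category"))
```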

Slide 77

4. Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

4. Where new terms are required, create them following commonly agreed best practices

Example: defining the "amount type" property

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79

4. Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.

• A domain states on which classes a given property can be used.

[Figure: hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
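In RDFS, domain and range declarations let a consumer infer the types of the resources a property connects. The Python sketch below illustrates this under stated assumptions: the "bud:" term names, the direction of hasCeiling (from Political category to Amount) and the sample resources are all hypothetical.

```python
# Hypothetical schema: the direction of hasCeiling is an assumption.
SCHEMA = {
    "bud:hasCeiling": {
        "domain": "bud:PoliticalCategory",  # class of the subject
        "range": "bud:Amount",              # class of the object
    },
}


def infer_types(triples, schema):
    """Derive rdf:type statements from rdfs:domain and rdfs:range."""
    types = set()
    for subject, predicate, obj in triples:
        declaration = schema.get(predicate)
        if declaration is None:
            continue  # nothing is known about this property
        types.add((subject, "rdf:type", declaration["domain"]))
        types.add((obj, "rdf:type", declaration["range"]))
    return types


data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
inferred = infer_types(data, SCHEMA)
for triple in sorted(inferred):
    print(triple)
```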

Slide 80

5. Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 81

6. Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 82

Conclusions

The six steps span three phases: Analyse, Model and Publish.

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 83

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 84

What is Open Refine?

"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org

Slide 85

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (xls and xlsx)

• JSON

• XML and

• RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 86

Using Open Refine to model and publish open data: getting started

First:

• Install Open Refine from https://github.com/OpenRefine

• Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 87

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Dataset: Digital Agenda Scoreboard

Slide 88

1. Describe your data in a spreadsheet

• Download the tabular data

Slide 89

2. Create a project and upload it in Open Refine

• Upload the spreadsheet

• Select relevant tabs

• Create the project

Slide 90

3. Clean up the data – table harmonisation

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Slide 91

3. Clean up the data – prepare RDF

• Create a URI representation for the involved object values:

  • via formula

  • via reconciliation
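Creating a URI "via formula" means deriving it deterministically from a cell value. In Open Refine this is done with an expression inside the tool; the Python sketch below only illustrates the idea. The base URI and the helper names are assumptions, not part of the training material.

```python
import re
import unicodedata

BASE_URI = "http://example.org/id/country/"  # hypothetical base URI


def slugify(value: str) -> str:
    """Turn a cell value into a URI-safe, lower-case token."""
    # Decompose accented characters, then drop anything outside ASCII.
    normalized = unicodedata.normalize("NFKD", value)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_only).strip("-")
    return slug.lower()


def mint_uri(value: str, base: str = BASE_URI) -> str:
    """Build the URI for a cell value by appending its slug to the base."""
    return base + slugify(value)


print(mint_uri("Czech Republic"))
print(mint_uri(" España "))
```

Reconciliation, the second option on the slide, instead looks the value up in an existing dataset and reuses the URI found there, which is preferable when an authoritative URI already exists.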

Slide 92

4. Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.

Slide 93

4. Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF.

Slide 94

4. Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 95

5. Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]
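What an RDF skeleton does can be approximated in plain Python: each spreadsheet row becomes one qb:Observation with a fixed set of properties. This is a sketch under stated assumptions, not what the RDF extension generates. The base URI, the dimension and measure property URIs, the column names and the sample rows are all made up; only qb:Observation and qb:dataSet come from the RDF Data Cube Vocabulary.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Made-up rows standing in for the cleaned spreadsheet.
SAMPLE_CSV = """country,year,indicator,value
BE,2013,i_x,1.5
LU,2013,i_x,2.5
"""


def row_to_ntriples(row):
    """Apply the 'skeleton' to one row and return N-Triples lines."""
    obs = f'<{BASE}obs/{row["country"]}-{row["year"]}-{row["indicator"]}>'
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{BASE}dataset> .",
        f'{obs} <{BASE}dim/country> <{BASE}country/{row["country"]}> .',
        f'{obs} <{BASE}dim/indicator> <{BASE}indicator/{row["indicator"]}> .',
        f'{obs} <{BASE}dim/year> "{row["year"]}"^^<{XSD}gYear> .',
        f'{obs} <{BASE}measure/value> "{row["value"]}"^^<{XSD}decimal> .',
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
lines = [line for row in rows for line in row_to_ntriples(row)]
print("\n".join(lines))
```

The output is N-Triples, the simplest RDF serialisation, which is also valid Turtle.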

Slide 96

Production pipelines

From desk to automated pipeline

[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]


Thank you for your attention

and now YOUR questions

Slide 97

Slide 98

References

• 5 ★ Open Data. http://5stardata.info

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query

Slides 99-101

Further reading

• EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

• EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• EUCLID, Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2

• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com

• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0

• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf

• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org

Slide 102

Be part of our team

• Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu

• Join us on: Open Data Support, http://goo.gl/y9ZZI

• Follow us: @OpenDataSupport

• Contact us: contact@opendatasupport.eu

Slide 103

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 9: Llinked open data training for EU institutions

Slides 9-11

The value proposition of linked (open) government data

• Flexible data integration: facilitates data integration and enables the interconnection of previously disparate government datasets.

• Efficiency gains in data integration – the network effect: the addition of each new dataset increases the value of those datasets that are already published.

• Ease of navigation: makes browsing through complex data easier via URIs.

• Increase in data quality:

  - The use of URIs leads to improved data management and quality.

  - The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.

• Increase in data usability by providing data as a service:

  - Resolvable URIs.

  - Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…

• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:

  - Existing relational/spatial database systems, by applying database-to-RDF conversions, or

  - Existing XML/file-based data.

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases).

• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.

• New services: The availability of LOGD gives rise to new integrated services offered by the public and/or private sector.

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slides 12-13

The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things.

2. Use HTTP URIs so that people can look up those names.

E.g. for an organisation, UNICEF in EuroVoc:

- http://eurovoc.europa.eu/1022

3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).

4. Include links to other URIs so that people/machines can discover more things.
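Principles 2 and 3 are commonly implemented with HTTP content negotiation: the client asks for an RDF representation of the URI through the Accept header. The Python sketch below only builds such a request, using the EuroVoc URI from the slide; whether a given server honours the header is an assumption that has to be checked per service, and the function name is illustrative.

```python
from urllib.request import Request


def build_lookup_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Prepare an HTTP GET that asks for an RDF representation of a URI."""
    return Request(uri, headers={"Accept": media_type})


req = build_lookup_request("http://eurovoc.europa.eu/1022")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Sending the request is left out so that the sketch runs offline:
#     from urllib.request import urlopen
#     with urlopen(req) as response:
#         rdf_xml = response.read()
```

Passing "text/turtle" as the media type would request Turtle instead, for servers that offer it.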

Slide 14

Linked data vs open data

Open data: Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data: Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

"Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Slide 15

Linked data foundations

URIs for naming things, RDF for describing data and SPARQL for querying linked data

Slide 16

Uniform Resource Identifier (URI)

"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource." – ISA's 10 Rules for Persistent URIs

A country, e.g. Belgium:

- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office:

- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List:

- http://publications.europa.eu/resource/authority/country

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 17

RDF & SPARQL

The Resource Description Framework (RDF) is a syntax for representing data and resources on the Web.

RDF breaks every piece of information down into triples:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Examples (subject, predicate, object):

<http://example.org/place/Brussels> is the capital of "Belgium"

OR

<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>

SPARQL is a standardised language for querying RDF data.

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql

Slide 18

How to publish linked data

Paving the way towards 5-star linked data

Slide 19

5-star scheme of Linked Open Data

1 star: Make your stuff available on the Web (whatever format) under an open licence.

2 stars: Make it available as structured data (e.g. Excel instead of an image scan of a table).

3 stars: Use non-proprietary formats (e.g. CSV instead of Excel).

4 stars: Use URIs to denote things, so that people can point at your stuff.

5 stars: Link your data to other data to provide context.
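The five levels are cumulative: a dataset earns a star only if it also meets every level below it. A minimal Python sketch of that rule follows; the class and field names are illustrative and not taken from any specification.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_licence: bool     # 1 star
    structured: bool              # 2 stars
    non_proprietary_format: bool  # 3 stars
    uses_uris: bool               # 4 stars
    links_to_other_data: bool     # 5 stars


def star_rating(dataset: Dataset) -> int:
    """Count stars from the bottom up, stopping at the first unmet level."""
    criteria = [
        dataset.on_web_open_licence,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


csv_on_portal = Dataset(True, True, True, False, False)
print(star_rating(csv_on_portal))
```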

Slide 20

Make your stuff available on the Web under an open licence

Example: Trends, risks and vulnerabilities in securities markets

Slide 21

Make it available as structured data

Example: Waterbase - Emissions to water (table with columns such as CountryCode)

Slide 22

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

Example: DG Enlargement - Regional programmes

Slide 23

Use URIs to denote things

Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 24

Link your data to other data to provide context

Example: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

Slide 25

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo – change is accomplished slowly

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slide 26

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 27

EU institutions' initiatives – some examples

• European Environment Agency SPARQL endpoint: Tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: Allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: Allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate

• Europeana SPARQL endpoint: Tool for querying a multi-lingual online collection of millions of digitised items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 28

Initiatives funded by the European Commission

[Logos, including: ADMS.SW, Core Public Service Vocabulary]

Slide 29

Member State initiatives – some examples

• DE – Bibliotheksverbund Bayern: Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

• IT – Agenzia per l'Italia Digitale: Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

• NL – Building and address register: The Dutch Address and Buildings base register published as linked data.

• UK – Ordnance Survey: Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.

• UK – Companies House: Publishing basic company details as linked data, using a simple URI for each company in their database.

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slide 30

Member States initiatives – UK National Archives

Three considerations for legislation as data:

• Typographic layout

• Versioning: changes over time

• Semantics

Semantic representation using RDF and Linked Data:

• URIs for things & RDF data model

Requires granular URIs to name things:

• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}

• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Slide 31

Member States initiatives – UK National Archives

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
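A granular URI scheme of this kind can be expressed as a small template function. The Python sketch below assumes the identifier pattern and the representation suffixes shown on the slide; the function name and the validation rules are illustrative only.

```python
ALLOWED_FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}
BASE = "http://www.legislation.gov.uk/id"


def legislation_uri(doc_type, year, number, section=None, fmt=None):
    """Build an identifier URI, optionally for a section and a representation."""
    uri = f"{BASE}/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    if fmt is not None:
        if fmt not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported representation: {fmt}")
        uri += f"/data.{fmt}"
    return uri


print(legislation_uri("ukpga", 2010, 32))
print(legislation_uri("ukpga", 2010, 32, section=124, fmt="rdf"))
```

Keeping the identifier and the representation apart means the same section URI can be cited regardless of the format a client later retrieves.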

Slide 32

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

Slide 33

Open & linked data at BBC

Slide 34

Data Value Chains using Linked Data at Volkswagen

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Slide 35

Linked Data in the oil and gas industry

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes."

• Need for adding semantics in order to allow machine reasoning. For example:

  • Kristin is a field

  • Åsgard is an oil platform

  • Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Slides 36-37

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.

• URIs, RDF and SPARQL form the foundational layer for linked data.

• Linked data offers a number of advantages, such as:

  o Data integration with small impact on legacy systems

  o Semantic interoperability

  o Easier browsing through complex data

  o Increased data quality

  o Easy updates, adaptations and extensions of data models

  o Cost reduction from the reuse of LOGD in e-Government applications

  o Creativity and innovation through context and knowledge-creation

Slide 38

Learning Module 2

Introduction to RDF & SPARQL

Slide 39

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Find more on training.opendatasupport.eu

Slide 40

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 41

Resource Description Framework

An introduction to RDF

Slide 42

RDF in the stack of Semantic Web technologies

• Resource: Everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

• Description: attributes, features and relations of the resources

• Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 43

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

Slide 44

RDF structure

Triples, graphs and syntaxes

Slide 45

What is a triple?

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset (subject, predicate, object)

<http://publications.europa.eu/resource/authority/file-type> has a title "File types Named Authority List"
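The triple structure can be modelled directly in code. The Python sketch below is a minimal illustration, not a replacement for an RDF library: the class names are made up, the predicate "has a title" is assumed to be dct:title, and literals get only basic escaping (no datatypes, language tags or newline handling). It serialises the example as one N-Triples line.

```python
from typing import NamedTuple, Union


class URIRef(str):
    """A resource identified by a URI."""


class Literal(str):
    """A plain string value."""


class Triple(NamedTuple):
    subject: URIRef
    predicate: URIRef
    obj: Union[URIRef, Literal]  # the object is either a resource or a literal


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as an N-Triples line."""
    def term(value):
        if isinstance(value, URIRef):
            return f"<{value}>"
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


example = Triple(
    URIRef("http://publications.europa.eu/resource/authority/file-type"),
    URIRef("http://purl.org/dc/terms/title"),
    Literal("File types Named Authority List"),
)
print(to_ntriples(example))
```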

Slide 46

RDF syntax: RDF/XML

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data – triples (graph):

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

In each description, the rdf:about value is the subject, the property element (e.g. dct:title) is the predicate and its content or rdf:resource value is the object.

Slide 47

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

[Figure: RDF graph with nodes and arrows labelled Subject, Predicate, Object]

Slide 48

RDF syntax: Turtle

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data – triples (graph):

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Each statement lists the subject first, followed by predicate and object pairs.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
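Prefix definitions are shorthand: a prefixed name such as dct:title stands for the namespace URI with the local name appended. The Python sketch below shows that expansion. It assumes the namespace http://www.w3.org/ns/dcat# for DCAT and http://purl.org/dc/terms/ for Dublin Core terms; the function name is illustrative.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(prefixed_name: str, prefixes=PREFIXES) -> str:
    """Expand 'prefix:local' into a full URI using the declared prefixes."""
    prefix, separator, local = prefixed_name.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("dct:title"))
print(expand("dcat:Dataset"))
```

An undeclared prefix raises an error, which mirrors how an RDF parser rejects a document that uses a prefix without defining it.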

Slide 49

RDF syntax: RDFa (embedding RDF data in HTML)

<html>
  <head> </head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

The resource attribute gives the subject, each property attribute gives a predicate and the element content gives the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607

Slide 50

How to represent data in RDF

Classes, properties and vocabularies

Slide 51

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 52

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

[UML class diagram "Healthcare Domain". Classes and their properties:]

Class Core Vocabularies::Identifier
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Class Core Vocabularies::Location
- geographicIdentifier: URI
- geographicName: string

Class Core Vocabularies::Address
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Class Core Vocabularies::Geometry
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.


Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT (SPARQL 1.1 Update): Add triples to the RDF graph.
• DELETE (SPARQL 1.1 Update): Delete triples from the RDF graph.
• ASK: Are there any X, Y, etc. satisfying the following conditions?

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Slide 55
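The difference between these forms is easier to see on a toy graph. The sketch below is not SPARQL and not an RDF library: it assumes a graph held as a Python set of (subject, predicate, object) strings, treats terms starting with "?" as variables, and uses made-up helper names (match, select, ask, construct, describe) that only mimic what each form returns for a single triple pattern.

```python
def match(pattern, triple):
    """Return variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if bindings.get(term, value) != value:
                return None  # same variable bound to two different values
            bindings[term] = value
        elif term != value:
            return None
    return bindings


def select(graph, pattern, variables):
    """SELECT: a table (list of rows) with one column per requested variable."""
    rows = []
    for triple in sorted(graph):
        bindings = match(pattern, triple)
        if bindings is not None:
            rows.append(tuple(bindings[v] for v in variables))
    return rows


def ask(graph, pattern):
    """ASK: True if at least one triple satisfies the pattern."""
    return any(match(pattern, triple) is not None for triple in graph)


def construct(graph, pattern, template):
    """CONSTRUCT: a new graph built by filling the template with each match."""
    result = set()
    for triple in graph:
        bindings = match(pattern, triple)
        if bindings is not None:
            result.add(tuple(bindings.get(term, term) for term in template))
    return result


def describe(graph, resource):
    """DESCRIBE: every statement in which the resource is subject or object."""
    return {t for t in graph if resource in (t[0], t[2])}


graph = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
}

titles = select(graph, ("?dataset", "dct:title", "?title"), ["?title"])
has_dataset = ask(graph, ("?dataset", "rdf:type", "dcat:Dataset"))
labels = construct(graph, ("?d", "dct:title", "?t"), ("?d", "rdfs:label", "?t"))

graph.add(("ex:file-type", "dct:publisher", "ex:publ"))      # INSERT
about_publisher = describe(graph, "ex:publ")
graph.discard(("ex:file-type", "dct:publisher", "ex:publ"))  # DELETE
```

SELECT yields rows, ASK a boolean, CONSTRUCT and DESCRIBE yield triples, and INSERT and DELETE change the graph itself.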

Structure of a SPARQL Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Callouts on the query:
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE block: RDF triple patterns, i.e. the conditions that have to be met

Slide 56

SELECT - return the name of a dataset with a particular URI

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result

dataset
"File types Name Authority List"

Slide 57

SELECT - return the name and publisher of a dataset

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

dataset | publisher
"File types Name Authority List" | "Publications Office"

Slide 58
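The publisher query works because the variable ?publisherURI appears in two triple patterns, which forces a join. A rough Python sketch of that evaluation follows. It assumes the sample data as a list of string tuples with the shortened names ex:file-type and ex:publ standing in for the abbreviated URIs on the slide, and the function solve is an illustrative helper, not how a SPARQL engine is actually implemented.

```python
def solve(patterns, graph, bindings=None):
    """Yield every set of variable bindings that satisfies all triple patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(bindings)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from solve(rest, graph, candidate)


DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

WHERE = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = [(b["?dataset"], b["?publisher"]) for b in solve(WHERE, DATA)]
print(rows)
```

The third pattern only matches triples whose subject equals the value already bound to ?publisherURI, which is why the dataset's own title is not returned as a publisher name.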

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Domain model diagram. Classes with their attributes, as far as the transcript allows:]

Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type
Political category: Code, Description
Corporate body: Code, Type, Location
Further attributes in the diagram: Acronym, Legal base period, Legal base type, Legal base status

Relationships (connector labels): hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Slide 68

Step 2: Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organisations, typically in a national or regional register
Organization Ontology: for describing the structure of organisations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
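The date example can be illustrated with a few lines of Python. This is a sketch under assumptions: the two function names are made up, the xsd:date value is assumed to carry no timezone, and the free-text format is assumed to use English month names (the %B directive depends on the process locale).

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    """Parse the lexical form of an xsd:date without a timezone, e.g. '2013-02-21'."""
    return date.fromisoformat(lexical)


def parse_custom_date(text):
    """Parse a free-text date such as '21 February 2013'.

    The consumer has to know this format in advance, and %B only
    recognises month names of the current locale (English assumed here).
    """
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_custom_date("21 February 2013")
print(standard == custom)
```

Both calls end up with the same date, but the second needs a format string that every consumer would have to discover and agree on, which is the extra processing the slide warns about.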

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

http://joinup.ec.europa.eu/
http://lov.okfn.org/

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Diagram excerpt] Nomenclature: Type, Heading, Introduction, Remark, Conditions

Slide 74

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Diagram excerpt] Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)

Slide 75
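Why a sub-property helps consumers can be sketched in Python. Everything below is illustrative: the term URI for the budget vocabulary's "introduction" property is a hypothetical placeholder, the resource name ex:nomenclature-1 and its text are invented, and the helper functions only imitate what an RDFS-aware system would infer.

```python
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"
BUD_INTRODUCTION = "http://data.europa.eu/bud#introduction"  # hypothetical term URI

# rdfs:subPropertyOf declarations: specific property -> more generic property
SUB_PROPERTY_OF = {BUD_INTRODUCTION: DCT_DESCRIPTION}


def super_properties(prop):
    """Return prop followed by all of its declared super-properties."""
    chain = [prop]
    while chain[-1] in SUB_PROPERTY_OF and SUB_PROPERTY_OF[chain[-1]] not in chain:
        chain.append(SUB_PROPERTY_OF[chain[-1]])
    return chain


def values(graph, subject, prop):
    """Values of `prop` for `subject`, including those stated via a sub-property."""
    return [o for s, p, o in graph if s == subject and prop in super_properties(p)]


graph = [
    ("ex:nomenclature-1", BUD_INTRODUCTION, "Introductory text of a budget line"),
]

descriptions = values(graph, "ex:nomenclature-1", DCT_DESCRIPTION)
print(descriptions)
```

A client that only knows dct:description still finds the text, although the data was published with the more specific property.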

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
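The purely lexical part of these conventions can be checked automatically. The Python sketch below uses made-up function names and covers only capitalisation and camel case; whether a class name is singular, or a property name is a verb or a noun, still needs a human reviewer.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # starts with a capital letter
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # starts with a lower case letter


def local_name(term):
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term):
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term):
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "ex:political_category"]:
    print(term, is_valid_class_name(term))
for term in ["org:hasSite", "foaf:isPrimaryTopicOf", "ex:Has-Site"]:
    print(term, is_valid_property_name(term))
```

Underscores and hyphens are rejected because the slide asks for camel case in multi-word terms.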

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Diagram excerpt] Amount: Currency, Figure, Type, Year

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Diagram excerpt] Amount: Currency, Figure, Type, Year

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.

[Diagram excerpt] The hasCeiling relationship connects Political category (Code, Description) and Amount (Currency, Figure, Type, Year).

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
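In RDFS, domain and range are used for inference rather than validation: stating a property on a resource lets a reasoner conclude the classes of the subject and the object. The Python sketch below imitates that. The prefixed names bud:hasCeiling, bud:PoliticalCategory and bud:Amount are hypothetical, and the direction of the relationship (a political category has an amount as ceiling) is an assumption, since the transcript does not show it.

```python
RDF_TYPE = "rdf:type"

# Hypothetical declarations, in the spirit of rdfs:domain and rdfs:range
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(graph):
    """Derive rdf:type statements from the domain and range of the properties used."""
    inferred = set()
    for subject, predicate, obj in graph:
        if predicate in DOMAIN:
            inferred.add((subject, RDF_TYPE, DOMAIN[predicate]))
        if predicate in RANGE:
            inferred.add((obj, RDF_TYPE, RANGE[predicate]))
    return inferred


graph = {("ex:category-1", "bud:hasCeiling", "ex:amount-1")}
inferred = infer_types(graph)
for triple in sorted(inferred):
    print(triple)
```

Because of this behaviour, a domain or range that is too narrow silently assigns wrong classes to resources, which is one reason to define them with care.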

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

• Start with a robust Domain Model developed following a structured process and methodology.
• Research existing terms and their usage and maximise reuse of those terms.
• Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
• Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
• Publish within a highly stable environment designed to be persistent.
• Publicise the RDF vocabulary by registering it with relevant services.

Phases shown alongside the steps: Analyse, Model, Publish

Slide 82


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org

Slide 84

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Slide 89

Step 3: Clean up the data - table harmonisation

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Slide 90

Step 3: Clean up the data - prepare RDF

• Create URI representation for the involved object values
  o via formula
  o via reconciliation

Slide 91
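Creating a URI "via formula" means deriving it from the cell value by a fixed rule. In Open Refine this would be a cell transformation; the Python sketch below only illustrates the idea. The base URI http://example.org/id/ is a placeholder and the functions slug and mint_uri are made up for this example.

```python
import re
import unicodedata

BASE_URI = "http://example.org/id/"  # placeholder namespace, not a real one


def slug(value):
    """Lower-case the value, drop accents, and join the words with hyphens."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, value):
    """Build a URI for a cell value, e.g. a country name in a spreadsheet column."""
    return f"{BASE_URI}{kind}/{slug(value)}"


print(mint_uri("country", "Czech Republic"))
print(mint_uri("indicator", " Households with broadband (%) "))
```

A formula only guarantees consistent URIs within one dataset. Reconciliation goes further by matching each value against identifiers that already exist elsewhere, such as the Named Authority Lists.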

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 94
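An RDF skeleton is essentially a template that is applied to every row of the table. The Python sketch below shows that idea outside Open Refine. It is heavily simplified: the CSV values are invented, the base URI and the country, year and value properties are placeholders, and a real RDF Data Cube dataset would also declare its structure (dimensions, measures, the dataset itself).

```python
import csv
import io

CSV_TEXT = """country,year,value
BE,2013,78.9
NL,2013,87.1
"""  # invented figures, for illustration only

BASE = "http://example.org/scoreboard/"  # placeholder base URI
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The skeleton: one subject template plus (predicate, object template) pairs.
SUBJECT_TEMPLATE = BASE + "observation/{country}-{year}"
SKELETON = [
    (RDF_TYPE, QB + "Observation"),
    (BASE + "country", BASE + "country/{country}"),
    (BASE + "year", "{year}"),
    (BASE + "value", "{value}"),
]


def rows_to_triples(csv_text):
    """Apply the skeleton to every row and return the resulting triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = SUBJECT_TEMPLATE.format(**row)
        for predicate, object_template in SKELETON:
            triples.append((subject, predicate, object_template.format(**row)))
    return triples


triples = rows_to_triples(CSV_TEXT)
for triple in triples:
    print(triple)
```

Changing the base URI or a predicate in the skeleton changes every generated triple at once, which is what makes the template approach convenient for tabular data.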

Step 5: Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]

Slide 95
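What an export produces can be sketched with N-Triples, the simplest line-based RDF syntax (every N-Triples document is also valid Turtle). The Python below is an illustration, not the exporter of Open Refine: the helper names are made up, and deciding that an object is a URI because it starts with http:// or https:// is a shortcut assumed for this example.

```python
def term(value):
    """Serialise a URI as <...> and anything else as a plain string literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """One line per triple: subject, predicate, object, terminated by ' .'."""
    return "".join(f"<{s}> <{p}> {term(o)} .\n" for s, p, o in triples)


triples = [
    ("http://example.org/dataset/1", "http://purl.org/dc/terms/title", "File types"),
    ("http://example.org/dataset/1", "http://purl.org/dc/terms/publisher",
     "http://example.org/publisher/publ"),
]
output = to_ntriples(triples)
print(output, end="")
```

Turtle adds prefixes and abbreviations on top of this, which makes the same triples shorter and easier to read.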

Production pipelines

From desk to automated pipeline

[Diagram: OpenRefine, UnifiedViews and Cellar placed along two axes, flexibility and volume]

Slide 96

Thank you for your attention

... and now YOUR questions?

Slide 97


Further reading

• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

Further reading

• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org

Slide 101

Be part of our team

Find us on, join us on, follow us, contact us:

Website: http://www.opendatasupport.eu
Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
Open Data Support: http://goo.gl/y9ZZI
OpenDataSupport
contact@opendatasupport.eu

Slide 102

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Slide 103

Page 10: Llinked open data training for EU institutions

The value proposition of linked (open) government data

• Increase in data usability by providing data as a service:
  o Resolvable URIs
  o Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…

• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:
  o Existing relational/spatial database systems, by applying database-to-RDF conversions, or
  o Existing XML/file-based data

Slide 10

The value proposition of linked (open) government data

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases).

• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.

• New services: The availability of LOGD gives rise to new, integrated services offered by the public and/or private sector.

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slide 11

The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
   E.g. for an organisation, UNICEF in EuroVoc: http://eurovoc.europa.eu/1022

Slide 12

The four principles in practice

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that people/machines can discover more things.

Slide 13
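Looking up a URI and asking for RDF is commonly done with HTTP content negotiation. The Python sketch below only builds such a request and does not send it. The helper name rdf_request is made up, text/turtle is one of several possible RDF media types, and whether a given server honours the Accept header is an assumption that has to be checked per service.

```python
from urllib.request import Request


def rdf_request(uri, media_type="text/turtle"):
    """Build (but do not send) an HTTP GET request that asks for an RDF serialisation."""
    return Request(uri, headers={"Accept": media_type})


request = rdf_request("http://eurovoc.europa.eu/1022")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual lookup; a browser asking for HTML and a program asking for RDF can then be served from the same URI.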

Linked data vs. open data

Open data: Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data: Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

"Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Slide 14

Linked data foundations

URIs for naming things, RDF for describing data, and SPARQL for querying linked data

Slide 15

Uniform Resource Identifier (URI)

"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource." - ISA's 10 Rules for Persistent URIs

A country, e.g. Belgium:
- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office:
- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List:
- http://publications.europa.eu/resource/authority/country

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 16

RDF & SPARQL

The Resource Description Framework (RDF) is a syntax for representing data and resources on the Web.

RDF breaks every piece of information down into triples:

• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related

Subject | Predicate | Object
http://example.org/place/Brussels | is the capital of | "Belgium"
OR
http://example.org/place/Brussels | is the capital of | http://example.org/place/Belgium

SPARQL is a standardised language for querying RDF data.

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql

Slide 17
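The two variants of the Brussels example differ only in the object: a literal or another resource. A small Python sketch makes the difference explicit. The class names and the predicate URI http://example.org/property/isCapitalOf are assumptions made for this illustration, since the slide gives the predicate only in words.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    object: Union[URI, Literal]


IS_CAPITAL_OF = URI("http://example.org/property/isCapitalOf")  # hypothetical predicate
brussels = URI("http://example.org/place/Brussels")

with_literal = Triple(brussels, IS_CAPITAL_OF, Literal("Belgium"))
with_resource = Triple(brussels, IS_CAPITAL_OF, URI("http://example.org/place/Belgium"))


def can_follow(triple):
    """Only a URI object can be looked up to discover more data."""
    return isinstance(triple.object, URI)


print(can_follow(with_literal), can_follow(with_resource))
```

The literal "Belgium" is a dead end for a machine, whereas the URI for Belgium can itself be the subject of further triples, which is what makes the data linked.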

How to publish linked data

Paving the way towards 5-star linked data

Slide 18

5-star scheme of Linked Open Data

1 star: Make your stuff available on the Web (whatever format) under an open licence
2 stars: Make it available as structured data (e.g. Excel instead of an image scan of a table)
3 stars: Use non-proprietary formats (e.g. CSV instead of Excel)
4 stars: Use URIs to denote things, so that people can point at your stuff
5 stars: Link your data to other data to provide context

Slide 19

Make your stuff available on the Web under an open licence

[Screenshot: Trends, risks and vulnerabilities in securities markets]

Slide 20

Make it available as structured data

[Screenshot: Waterbase - Emissions to water, with a CountryCode column]

Slide 21

Use non-proprietary formats

• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF

[Screenshot: DG Enlargement - Regional programmes]

Slide 22

Use URIs to denote things

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 23

Link your data to other data to provide context

Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

Slide 24

LOGD roadblocks

• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive, or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo - change is accomplished slowly

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slide 25

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions' initiatives - some examples

• European Environment Agency SPARQL endpoint
  Tool for searching linked data published by the European Environment Agency: http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint
  Allows searching the linked metadata of datasets published on the EU Open Data Portal: https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint
  Allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc.: http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint
  Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc.: http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint
  Tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections: http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

Initiatives funded by the European Commission

[Logos, including ADMS.SW and the Core Public Service Vocabulary]

Slide 28

Member State initiatives - some examples

DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL - Building and address register
The Dutch Address and Buildings base register published as linked data.

UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary Line.

UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slide 29

Member States initiatives - UK National Archives

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Slide 30

Member States initiatives - UK National Archives

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

Slide 31
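The identifier and representation patterns can be expressed as two small functions. This Python sketch follows the pattern as it appears on the slides, with slashes and placeholders reconstructed from the transcript; the function names are made up, and the URI scheme actually served by legislation.gov.uk may differ in detail.

```python
BASE = "http://www.legislation.gov.uk"
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}


def identifier_uri(doc_type, year, number, section=None):
    """URI naming a piece of legislation, optionally down to a single section."""
    uri = f"{BASE}/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    return uri


def representation_uri(doc_type, year, number, section, fmt):
    """URI of one representation (file format) of the identified thing."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier_uri(doc_type, year, number, section)}/data.{fmt}"


print(identifier_uri("ukpga", 2010, 32, 124))
print(representation_uri("ukpga", 2010, 32, 124, "rdf"))
```

Keeping the identifier separate from its representations means the name of a section stays stable while formats are added or retired.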

Open & linked data at the BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.

Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

Slide 32

Open & linked data at the BBC

Slide 33

Data Value Chains using Linked Data at Volkswagen

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Slide 34

Linked Data in the oil and gas industry

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes."

• Need for adding semantics in order to allow machine reasoning. For example:
  o Kristin is a field
  o Åsgard is an oil platform
  o Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Slide 35

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

  o Data integration with small impact on legacy systems

  o Semantic interoperability

  o Easier browsing through complex data

  o Increased data quality

Slide 36

Conclusions (cont’d)

• Linked data offers a number of advantages, such as:

  o Easy updates, adaptations and extensions of data models

  o Cost reduction from the reuse of LOGD in e-Government applications

  o Creativity and innovation through context and knowledge-creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:org="http://www.w3.org/ns/org#"
        xmlns:locn="http://www.w3.org/ns/locn#">

      <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
        <rdfs:label>Publications Office</rdfs:label>
        <org:hasSite rdf:resource="http://example.com/site/1234"/>
      </org:Organization>

      <locn:Address rdf:about="http://example.com/site/1234">
        <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
      </locn:Address>

    </rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject - a resource, which is identified with a URI

• Predicate - a URI-identified, reused specification of the relationship

• Object - a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type

Predicate: has a title

Object: “File types Name Authority List”
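As a rough illustration of the subject, predicate, object structure described on this slide, the sketch below models a triple as a plain Python tuple. The `Triple` type is hypothetical, and the choice of the Dublin Core Terms title property for "has a title" is an assumption; the slide itself does not name the predicate URI.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement (hypothetical helper type, not part of any RDF library)."""
    subject: str    # URI of the resource being described
    predicate: str  # URI of the relationship
    obj: str        # URI of another resource, or a literal value


# Assumption: "has a title" is expressed with the Dublin Core Terms title property.
DCT_TITLE = "http://purl.org/dc/terms/title"

name_of_dataset = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj="File types Name Authority List",
)

# An RDF graph can then be thought of as a set of such triples.
graph = {name_of_dataset}
```

Keeping the three positions explicit makes it easier to see, in the syntaxes on the following slides, which part of each statement plays which role.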

RDF syntax: RDF/XML

Slide 46

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcat="http://www.w3.org/ns/dcat#"
        xmlns:dct="http://purl.org/dc/terms/">

      <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
        <dct:title>File types Named Authority List</dct:title>
        <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
      </dcat:Dataset>

      <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
        <dct:title>Publications Office</dct:title>
      </dct:Agent>

    </rdf:RDF>

Callouts on the slide: the xmlns attributes are the definition of prefixes; the elements inside rdf:RDF are the description of data (triples), which together form the graph. In each description, the rdf:about resource is the subject, the child element is the predicate and its value is the object.

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

RDF syntax: Turtle

Slide 48

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct: <http://purl.org/dc/terms/> .

    <http://publications.europa.eu/resource/authority/file-type>
        a dcat:Dataset ;
        dct:title "File types Name Authority List" ;
        dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://open-data.europa.eu/en/data/publisher/publ>
        a dct:Agent ;
        dct:title "Publications Office" .

Callouts on the slide: the @prefix lines are the definition of prefixes; the remaining statements are the description of data (triples), each made of a subject, a predicate and an object, which together form the graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

    <html>
      <head></head>
      <body>
        <div resource="http://publications.europa.eu/resource/authority/file-type"
             typeof="http://www.w3.org/ns/dcat#Dataset">
          <p>
            <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
            Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
          </p>
        </div>
      </body>
    </html>

Callouts on the slide: the resource attribute gives the subject, each property attribute gives a predicate and the element content gives the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: the Unified Modelling Language. Class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[UML class diagram "Healthcare Domain", showing classes with their properties and the relationships between them]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

• INSERT: Add triples to the RDF graph

• DELETE: Delete triples from the RDF graph

• ASK: Are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

Slide 56

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>

    SELECT ?title
    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }

• PREFIX lines: definition of prefixes

• SELECT: type of query

• ?title: variables, i.e. what to search for

• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met

SELECT - return the name of a dataset with particular URI

Slide 57

Sample data

    <http://…/authority/file-type> rdf:type dcat:Dataset .
    <http://…/authority/file-type> dct:title "File types Name Authority List" .
    <http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://…/publisher/publ> rdf:type dct:Agent .
    <http://…/publisher/publ> dct:title "Publications Office" .

Query

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset
    WHERE {
      <http://…/authority/file-type> dct:title ?dataset .
    }

Result

    dataset
    "File types Name Authority List"

SELECT - return the name and publisher of a dataset

Slide 58

Sample data

    <http://…/authority/file-type> rdf:type dcat:Dataset .
    <http://…/authority/file-type> dct:title "File types Name Authority List" .
    <http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://…/publisher/publ> rdf:type dct:Agent .
    <http://…/publisher/publ> dct:title "Publications Office" .

Query

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset ?publisher
    WHERE {
      <http://…/authority/file-type> dct:publisher ?publisherURI .
      <http://…/authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }

Result

    dataset                             publisher
    "File types Name Authority List"    "Publications Office"
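To make the idea of matching triple patterns more concrete, here is a minimal sketch in plain Python of how a SELECT query can be evaluated against a small set of triples. It is not how a real SPARQL engine is implemented, and the `select` and `match_pattern` helpers, the prefixed names used as plain strings and the `example.org` URIs (standing in for the shortened URIs on the slide) are all assumptions made for illustration.

```python
def is_var(term):
    """A term starting with '?' is treated as a query variable."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None on failure."""
    new_binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value.
            if new_binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new_binding


def select(variables, patterns, graph):
    """Return one row per combination of triples satisfying all patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


# Placeholder URIs (assumption): the slide abbreviates the real ones.
FILE_TYPE = "http://example.org/authority/file-type"
PUBL = "http://example.org/publisher/publ"

graph = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]

rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    graph,
)
print(rows)
```

The shared variable `?publisherURI` is what joins the dataset to its publisher: the third pattern only matches triples whose subject equals the value bound by the first pattern.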

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

  - How to reuse existing vocabularies to model your data

  - How to create new classes and properties in RDF

  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Start with a robust Domain Model (step 1)

Slide 68

[Diagram: domain model]

Classes and their attributes:

• Amount: Currency, Figure, Type, Year

• Nomenclature: Type, Heading, Introduction, Remark, Conditions

• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

• Political category: Code, Description

• Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Reuse existing terms and vocabularies (step 2)

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies

Reuse existing terms and vocabularies (step 2)

Slide 70

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Advantages of reuse

Reuse existing terms and vocabularies (step 2)

• Reuse greatly aids interoperability of your data

Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
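The interoperability point about dates can be illustrated with a small sketch. It compares reading an xsd:date lexical value with reading the free-text form "21 February 2013". The two function names and the month table are hypothetical helpers written for this illustration; only `date.fromisoformat` is a standard library call.

```python
from datetime import date

# Hypothetical lookup table needed only for the non-standard, free-text format.
MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}


def parse_xsd_date(lexical: str) -> date:
    """Read an xsd:date value such as 2013-02-21 with no custom logic."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Read a value such as '21 February 2013'; this needs format-specific code."""
    day, month_name, year = text.split()
    return date(int(year), MONTHS[month_name], int(day))


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
```

Both calls yield the same date, but the second only works for English month names in exactly that order, which is the kind of extra processing the slide warns about.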

You can find reusable RDF vocabularies on:

http://joinup.ec.europa.eu/

http://lov.okfn.org/

Slide 72

Reuse existing terms and vocabularies (step 2)

Creation of sub-classes and sub-properties (step 3)

• RDF schemas and vocabularies often include terms that are very generic

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73
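A minimal sketch of why sub-properties help: a system that only knows the generic property can still read the data once the sub-property relationship is applied. The `bud:` and `ex:` prefixes and the `with_inferred` helper are assumptions for illustration; the relationship itself (the EU Budget "introduction" property being a sub-property of dct:description) is the one used as an example in this training.

```python
def with_inferred(triples, sub_property_of):
    """Add, for every triple, the same statement restated with each super-property."""
    result = set(triples)
    for subject, predicate, obj in triples:
        seen = set()
        # Walk up the sub-property chain, guarding against cycles.
        while predicate in sub_property_of and predicate not in seen:
            seen.add(predicate)
            predicate = sub_property_of[predicate]
            result.add((subject, predicate, obj))
    return result


# Assumed prefixed names: "bud:" for the EU Budget vocabulary, "ex:" for sample data.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}

data = {("ex:heading1", "bud:introduction", "Introductory text of the heading")}
expanded = with_inferred(data, SUB_PROPERTY_OF)
```

After expansion, a consumer that has never heard of `bud:introduction` can still find the text by asking for `dct:description`.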

Creation of sub-classes and sub-properties (step 3)

Slide 74

The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description

[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Creation of sub-classes and sub-properties (step 3)

Slide 75

The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject

[Diagram: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]

Where new terms are required, create them following commonly agreed best practices (step 4)

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
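Some of these naming conventions can be checked mechanically. The sketch below is an assumption-laden illustration, not an official validator: the two regular expressions and the helper names are hypothetical, and they only test the letter-case rules (capitalised classes, lower-case camel-case properties), not whether a term is singular, a verb or a noun.

```python
import re

# Assumed patterns: letters and digits only, no underscores or hyphens.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[1]


def looks_like_class(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    kind = "class" if looks_like_class(term) else "property"
    print(term, "->", kind)
```

A check like this can be run over a draft vocabulary before publication to catch terms such as `ex:has_site` that break the camel-case rule.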

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

Slide 77

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

Slide 78

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Where new terms are required, create them following commonly agreed best practices (step 4)

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
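The effect of a domain and a range can be sketched as a simple inference: from one statement using the property, the classes of its subject and of its value follow. Everything named below is hypothetical, including the `bud:` and `ex:` prefixes, the `infer_types` helper and, in particular, the direction chosen for hasCeiling (Political category as domain, Amount as range), which the transcript does not state.

```python
def infer_types(triples, domains, ranges):
    """Derive the rdf:type statements implied by domain and range declarations."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in domains:
            # The domain tells us the class of the subject.
            inferred.add((subject, "rdf:type", domains[predicate]))
        if predicate in ranges:
            # The range tells us the class of the value.
            inferred.add((obj, "rdf:type", ranges[predicate]))
    return inferred


# Assumed declarations for illustration only.
domains = {"bud:hasCeiling": "bud:PoliticalCategory"}
ranges = {"bud:hasCeiling": "bud:Amount"}

data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
types = infer_types(data, domains, ranges)
```

Because domain and range drive inference in this way, overly narrow declarations can make consumers draw wrong conclusions about data that reuses the property, which is one reason to define them with care.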

Publish within a highly stable environment designed to be persistent (step 5)

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud/

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Publicise the RDF vocabulary by registering it with relevant services (step 6)

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Phases shown alongside the steps: Analyse, Model, Publish

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another” - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (xls and xlsx)

• JSON

• XML

• RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

• Install Open Refine from https://github.com/OpenRefine

• Install the RDF extension from http://refine.deri.ie/

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Describe your data in a spreadsheet (step 1)

Download the tabular data

Slide 88

Create a project and upload it in Open Refine (step 2)

Slide 89

• Upload the spreadsheet

• Select relevant tabs

• Create the project

Clean up the data: table harmonisation (step 3)

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Clean up the data: prepare RDF (step 3)

Slide 91

• Create URI representation for the involved object values

  - via formula

  - via reconciliation
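Creating a URI "via formula" amounts to deriving a stable identifier from a cell value. In Open Refine this is done with an expression inside the tool; the plain Python below is only an analogous sketch, and the `mint_uri` helper, the slug rules and the `example.org` base URI are assumptions.

```python
import re
import unicodedata


def mint_uri(base: str, label: str) -> str:
    """Build a URI from a base and a human-readable cell value."""
    # Strip accents so that the identifier stays plain ASCII.
    ascii_label = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lower-case, and collapse anything that is not a letter or digit into '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")
    return base + slug


BASE = "http://example.org/country/"  # placeholder namespace (assumption)
uri = mint_uri(BASE, "United Kingdom")
```

The formula route works when values are clean and unique. Reconciliation, the other option on the slide, instead looks each value up in an existing authority list so that the data links to URIs that others already use.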

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Slide 92

Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Map your data to appropriate RDF classes & properties (model your data) (step 4)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

Slide 94

[Screenshot: graphical interface to copy/paste an existing RDF skeleton]

[Screenshot: graphical interface to edit an RDF skeleton]

Export your data to RDF/XML or Turtle (step 5)

Slide 95

[Screenshot: export of the data in Turtle]
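What the mapping and export steps do can be approximated outside the tool. The sketch below turns CSV rows into N-Triples lines, a line-based format that is also valid Turtle. It does not reproduce the RDF extension or the Data Cube mapping from the example; the `rows_to_ntriples` helper, the column names, the `example.org` base URI and the use of rdfs:label are assumptions chosen to keep the illustration short.

```python
import csv
import io


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, base_uri, id_column, column_to_property):
    """Emit one triple per mapped column of every row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base_uri}{row[id_column]}>"
        for column, prop in column_to_property.items():
            literal = escape_literal(row[column])
            lines.append(f'{subject} <{prop}> "{literal}" .')
    return "\n".join(lines)


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

sample_csv = "code,label\nBE,Belgium\nLU,Luxembourg\n"
output = rows_to_ntriples(
    sample_csv,
    base_uri="http://example.org/country/",  # placeholder base URI (assumption)
    id_column="code",
    column_to_property={"label": RDFS_LABEL},
)
print(output)
```

The `column_to_property` dictionary plays the role of the RDF skeleton: it records, once, which column becomes which property, and is then applied to every row.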

Production pipelines

From desk to automated pipeline

Slide 96

[Diagram: tools positioned along the axes flexibility and volume: OpenRefine, UnifiedViews, Cellar]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

References

• 5★ Open Data. http://5stardata.info/

• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie/

• Resource Description Framework, W3C. http://www.w3.org/RDF/

• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com/

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org/

Slide 101

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: @OpenDataSupport

Contact us: http://www.opendatasupport.eu, contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 11: Llinked open data training for EU institutions

The value proposition of linked (open) government data

• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases).

• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.

• New services: The availability of LOGD gives rise to new integrated services offered by the public and/or private sector.

Slide 11

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things

2. Use HTTP URIs so that people can look up those names

E.g. for an organisation: UNICEF in EuroVoc

- http://eurovoc.europa.eu/1022

Slide 12

The four principles in practice

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

4. Include links to other URIs so that people/machines can discover more things

Slide 13

Linked data vs open data

Open data

Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data

Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

Slide 14

“Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data

http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Linked data foundations

URIs for naming things, RDF for describing data and SPARQL for querying linked data

Slide 15

Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”

- ISA’s 10 Rules for Persistent URIs

A country, e.g. Belgium

- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office

- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List

- http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

RDF & SPARQL

The Resource Description Framework (RDF) is a framework for representing data and resources on the Web.

Slide 17

RDF breaks every piece of information down into triples:

• Subject - a resource, which is identified with a URI

• Predicate - a URI-identified, reused specification of the relationship

• Object - a resource or literal to which the subject is related

Example (subject, predicate, object):

http://example.org/place/Brussels is the capital of “Belgium”

OR

http://example.org/place/Brussels is the capital of http://example.org/place/Belgium

SPARQL is a standardised language for querying RDF data.

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql

DATASUPPORTOPEN

How to publish linked data

Paving the way towards 5-star linked data

Slide 18

5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence

★★ Make it available as structured data (e.g. Excel instead of image scan of a table)

★★★ Use non-proprietary formats (e.g. CSV instead of Excel)

★★★★ Use URIs to denote things, so that people can point at your stuff

★★★★★ Link your data to other data to provide context

Slide 19

★ Make your stuff available on the Web under an open licence

Slide 20

[Screenshot: Trends, risks and vulnerabilities in securities markets]

★★ Make it available as structured data

Slide 21

[Screenshot: Waterbase - Emissions to water, with a CountryCode column]

DATASUPPORTOPEN

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22


Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg


Link your data to other data to provide context

Slide 24

Example: Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body


LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo: change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd


Linked data initiatives in Europe

Examples of supranational, national, regional and private initiatives in the area of linked data

Slide 26


EU institutions' initiatives: some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU, CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27


Initiatives funded by the European Commission

Slide 28

[Figure: logos of Commission-funded initiatives, including ADMS, ADMS.SW, the Core Vocabularies and the Core Public Service Vocabulary]


Member State initiatives: some examples

DE: Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT: Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL: Building and address register

The Dutch Address and Buildings base register, published as linked data.

UK: Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.

UK: Companies House

Publishes basic company details as linked data, using a simple URI for each company in its database.

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd


Member States' initiatives: UK National Archives

Slide 30

Three considerations for legislation as data:

• Typographic layout

• Versioning: changes over time

• Semantics

Semantic representation using RDF and Linked Data:

• URIs for things & the RDF data model

Requires granular URIs to name things:

• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}

• Representation: .../data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068


Member States' initiatives: UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf


Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

Slide 33

Open & linked data at BBC


Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf


Linked Data in the oil and gas industry

Slide 35

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes."

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf


Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36


Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37


Learning Module 2

Introduction to RDF & SPARQL

Slide 38


Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu


Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write and read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40


Resource Description Framework

An introduction to RDF

Slide 41


RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42


Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>


RDF structure

Triples graphs and syntaxes

Slide 44


What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject: a resource, which is identified with a URI

• Predicate: a URI-identified, reused specification of the relationship

• Object: a resource or literal to which the subject is related

Example: name of a dataset

Subject: <http://publications.europa.eu/resource/authority/file-type>

Predicate: has a title

Object: "File types Named Authority List"
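To make the subject, predicate, object structure concrete, the triple above can be sketched in a few lines of Python. This is only an illustration using the standard library: the `Triple` type and the `to_ntriples` helper are hypothetical names, the predicate "has a title" is assumed to be dct:title, and literals are assumed to be written with surrounding double quotes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property (the relationship)
    obj: str        # URI of another resource, or a quoted literal


dataset_title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",  # assumed meaning of "has a title"
    obj='"File types Named Authority List"',
)


def to_ntriples(triple: Triple) -> str:
    """Write one triple as an N-Triples line; quoted objects are kept as literals."""
    obj = triple.obj if triple.obj.startswith('"') else f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


print(to_ntriples(dataset_title))
```

A real project would normally rely on an RDF library rather than on hand-written serialisation; the sketch only shows that a triple is three values in a fixed order.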


RDF syntax: RDF/XML

Slide 46

The attributes of the rdf:RDF element define the prefixes. The elements inside it describe the data as triples: the rdf:about resource is the subject, each property element is a predicate and its value is the object. Together the triples form a graph.

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>


Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with nodes for subjects and objects, connected by arrows labelled with the predicates]


RDF syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data (triples forming a graph); in each statement the subject comes first, followed by predicate and object pairs:

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
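Prefixes are only a shorthand: a prefixed name stands for the full URI obtained by appending the local name to the namespace. The following Python sketch illustrates that expansion. The `expand` function is a hypothetical helper, and the DCAT namespace is assumed to be http://www.w3.org/ns/dcat#.

```python
# Namespace table, as it would be declared with @prefix in Turtle.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",  # assumed DCAT namespace
    "dct": "http://purl.org/dc/terms/",
}


def expand(prefixed_name: str, prefixes: dict = PREFIXES) -> str:
    """Turn a prefixed name such as 'dct:title' into a full URI."""
    prefix, local_name = prefixed_name.split(":", 1)
    return prefixes[prefix] + local_name


print(expand("dct:title"))
print(expand("dcat:Dataset"))
```

An unknown prefix raises a `KeyError`, which mirrors the fact that a Turtle parser rejects prefixes that were never declared.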


RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

The resource attribute gives the subject, each property attribute gives a predicate and the element content is the object.

<html>
<head></head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type"
     typeof="http://www.w3.org/ns/dcat#Dataset">
<p>
<span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
</p>
</div>
</body>
</html>

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/


How to represent data in RDF

Classes properties and vocabularies

Slide 50


RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51


Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[Figure: UML class diagram of the Core Vocabularies]

Class Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]

Class Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string

Class Location: geographicIdentifier: URI; geographicName: string

Class Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string

Class Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.


Introduction to SPARQL

The RDF Query Language

Slide 53


About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C Recommendation in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54


Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions ...

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)

• INSERT: add triples to the RDF graph

• DELETE: delete triples from the RDF graph

• ASK: are there any X, Y, etc. satisfying the following conditions ...?

Slide 55

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en


Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

• PREFIX lines: definition of prefixes

• SELECT: type of query

• ?title: variables, i.e. what to search for

• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met


SELECT: return the name of a dataset with a particular URI

Slide 57

Sample data (URIs abbreviated):

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset

"File types Named Authority List"


SELECT: return the name and publisher of a dataset

Slide 58

Sample data (URIs abbreviated):

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher

"File types Named Authority List" | "Publications Office"
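How a SELECT query arrives at its result can be sketched with a toy pattern matcher in Python. This is not how a SPARQL engine is implemented; it is a minimal illustration, assuming prefixed names as plain strings, a hypothetical `ex:` prefix standing in for the abbreviated URIs of the sample data, and hypothetical helper names `match` and `select`.

```python
# Sample data as (subject, predicate, object) tuples; "ex:" is a made-up prefix.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", '"File types Named Authority List"'),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", '"Publications Office"'),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None if impossible."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match this triple.
            return None
    return result


def select(variables, patterns, data):
    """Return one row per way of satisfying all triple patterns together."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(binding[v] for v in variables) for binding in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The shared variable `?publisherURI` is what joins the dataset to its publisher: the third pattern can only match triples whose subject equals the value bound by the first pattern.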


SPARQL Example: EU ODP (1)

Slide 59


SPARQL Example: EU ODP (2)

Slide 60


SPARQL Example: EU ODP (2)

Slide 61


Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data


Workshop for publishing open linked EU data

This module is about

• Creating an RDF vocabulary for modelling your data:

- How to reuse existing vocabularies to model your data

- How to create new classes and properties in RDF

- How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64


Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65


Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66


6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas


Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model of the EU Budget]

Amount: Currency, Figure, Type, Year

Nomenclature: Type, Heading, Introduction, Remark, Conditions

EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

Political category: Code, Description

Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme


Step 2: Reuse existing terms and vocabularies

Slide 69

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID


Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

DCAT-AP: vocabulary for describing datasets in Europe

Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: vocabulary for describing projects

ADMS: vocabulary for describing interoperability assets

Dublin Core: defines general metadata attributes

Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register

Organization Ontology: for describing the structure of organisations

Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies


Step 2: Reuse existing terms and vocabularies

Advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data.

Take dcterms:created as an example: its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
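The extra processing caused by a non-standard date format can be illustrated with a short Python sketch. Both function names are hypothetical; the second parser assumes English month names and the default C locale, which is exactly the kind of hidden assumption a consumer of such data has to discover.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """An xsd:date lexical form such as 2013-02-21 follows ISO 8601."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """A free-text date needs a format guessed by the consumer (here: day, English month name, year)."""
    return datetime.strptime(text, "%d %B %Y").date()


print(parse_xsd_date("2013-02-21"))
print(parse_free_text_date("21 February 2013"))
```

The first function works for every publisher that follows the standard, while the second has to be rewritten for every new local convention.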


Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

Slide 72

http://joinup.ec.europa.eu

http://lov.okfn.org


Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73


Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]
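What a sub-property declaration buys a consumer can be sketched in Python: a system that only knows dct:description can still use data published with the more specific property. The `bud:` and `ex:` prefixes and the function name are hypothetical, and the inference rule shown is the standard RDFS one (if p is a sub-property of q, then every statement using p also holds for q).

```python
# rdfs:subPropertyOf declarations: specific property -> more general property.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}  # "bud:" is a made-up prefix


def infer_super_properties(triples, sub_property_of):
    """Return the asserted triples plus those implied by rdfs:subPropertyOf."""
    asserted = set(triples)
    inferred = set(asserted)
    for subject, predicate, obj in asserted:
        seen = {predicate}
        while predicate in sub_property_of:
            predicate = sub_property_of[predicate]
            if predicate in seen:  # guard against cyclic declarations
                break
            seen.add(predicate)
            inferred.add((subject, predicate, obj))
    return inferred


data = {("ex:heading1", "bud:introduction", '"Covers administrative expenditure"')}
for triple in sorted(infer_super_properties(data, SUB_PROPERTY_OF)):
    print(triple)
```

The literal text in the sample triple is invented for the illustration and is not taken from the EU Budget dataset.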


Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]


Where new terms are required create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept

• Properties begin with a lower-case letter, e.g. rdfs:label

• Object properties should be verbs, e.g. org:hasSite

• Data type properties should be nouns, e.g. dcterms:description

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
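The purely syntactic parts of these conventions (initial capital for classes, initial lower case for properties, camel case without separators) can be checked automatically. The Python sketch below uses hypothetical function names and deliberately does not try to check the linguistic rules (singular nouns, verbs), which need human judgement.

```python
import re

# Letters and digits only, so underscores or hyphens between words are rejected.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def looks_like_class(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "org:has_site"]:
    print(term, looks_like_class(term), looks_like_property(term))
```

The last term, `org:has_site`, is an invented counter-example that violates the camel-case rule; the real property is org:hasSite.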

Slide 76

4


Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: class Amount with attributes Currency, Figure, Type, Year]


Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: class Amount with attributes Currency, Figure, Type, Year]


Where new terms are required create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.
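In RDFS, domain and range declarations let a consumer infer the class of the resources at either end of a property. The Python sketch below illustrates that inference. The `bud:` and `ex:` names are hypothetical, and the direction of the hasCeiling property (from a political category to an amount) is an assumption made for the example.

```python
# Assumed declarations: bud:hasCeiling links a political category to an amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples, domain, range_):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    types = set()
    for subject, predicate, obj in triples:
        if predicate in domain:
            types.add((subject, "rdf:type", domain[predicate]))
        if predicate in range_:
            types.add((obj, "rdf:type", range_[predicate]))
    return types


data = [("ex:category1", "bud:hasCeiling", "ex:amount42")]
for triple in sorted(infer_types(data, DOMAIN, RANGE)):
    print(triple)
```

Because domain and range drive inference rather than validation, declaring them too narrowly leads to wrong conclusions about data that uses the property in an unexpected place, which is why they deserve careful thought.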

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: property hasCeiling between class Political category (Code, Description) and class Amount (Currency, Figure, Type, Year)]


Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1

Slide 80

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris


Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6


Conclusions

Slide 82

1. Start with a robust Domain Model, developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Phases: Analyse, Model, Publish


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83


What is Open Refine

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org


What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar on Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI


Using Open Refine to model and publish open data: getting started

First:

- Install Open Refine from https://github.com/OpenRefine

- Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86
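The overall idea behind steps 1 to 5 (tabular rows in, RDF triples out) can be sketched outside Open Refine with a few lines of Python. This is not what the RDF extension does internally. The column names, the http://example.org/ base URI and the property names are all hypothetical, and cell values are assumed to contain no characters that need escaping.

```python
import csv
import io

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def rows_to_ntriples(csv_text, base="http://example.org/"):
    """Turn each CSV row (country, year, value) into three N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Mint one subject URI per observation from the identifying columns.
        subject = f"<{base}obs/{row['country']}-{row['year']}>"
        lines.append(f"{subject} <{base}def/country> <{base}country/{row['country']}> .")
        lines.append(f'{subject} <{base}def/year> "{row["year"]}" .')
        lines.append(f'{subject} <{base}def/value> "{row["value"]}"^^<{XSD_DECIMAL}> .')
    return lines


SAMPLE = "country,year,value\nBE,2014,0.83\nLU,2014,0.94\n"
for line in rows_to_ntriples(SAMPLE):
    print(line)
```

The sample figures are invented. In the workshop example the mapping would target the terms of the RDF Data Cube Vocabulary rather than the made-up properties used here.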


Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87


Describe your data in a spreadsheet

Download the tabular data

Slide 88

1


Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project


Clean up the data ndash table harmonisation

Slide 90

3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published


Clean up the data ndash prepare RDF

Slide 91

3

• Create a URI representation for the object values involved:

- via a formula

- via reconciliation
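Creating URIs "via a formula" means deriving them deterministically from cell values. In Open Refine this is done with an expression on the column; the Python sketch below only illustrates the same idea, with a hypothetical `mint_uri` helper and a made-up http://example.org/id/country/ base URI.

```python
from urllib.parse import quote

BASE = "http://example.org/id/country/"  # hypothetical base URI


def mint_uri(label: str, base: str = BASE) -> str:
    """Derive a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(label.strip().lower().split())
    return base + quote(slug, safe="-")


print(mint_uri("  Czech Republic "))
print(mint_uri("Belgium"))
```

Reconciliation takes the other route: instead of minting a new URI, the cell value is matched against an existing reference dataset (for example a Named Authority List) and replaced by the URI already published there, which is what creates links to other data.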


Map your data to appropriate RDF classes amp properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary


Map your data to appropriate RDF classes amp properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF


Map your data to appropriate RDF classes amp properties (model your data)

You can map the data to the ontology using a simple graphical interface, either to create a new RDF skeleton or to edit an existing one.

You can set the base URI for the data.

Slide 94

[Figure: graphical interface to copy/paste an existing RDF skeleton]

[Figure: graphical interface to edit an RDF skeleton]

4


Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle


Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]


Thank you for your attention

and now YOUR questions

Slide 97




Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas


Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100


Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101




This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


The four principles of linked data in practice

1. Use Uniform Resource Identifiers (URIs) as names for things

2. Use HTTP URIs so that people can look up those names

E.g. for an organisation: UNICEF in EuroVoc

- http://eurovoc.europa.eu/1022

Slide 12

The four principles in practice

3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL)

4. Include links to other URIs so that people/machines can discover more things

Slide 13
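A minimal Python sketch of principles 2 and 3: looking up an HTTP URI while asking for an RDF representation. It only prepares the request and does not send it. That the server behind the EuroVoc URI answers to the `text/turtle` media type is an assumption, and the helper name `build_lookup_request` is hypothetical.

```python
import urllib.request


def build_lookup_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET request that asks for RDF."""
    # The Accept header is how a client tells the server which format it wants.
    return urllib.request.Request(uri, headers={"Accept": media_type})


request = build_lookup_request("http://eurovoc.europa.eu/1022")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup, given network access.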

Linked data vs open data

Slide 14

Open data

Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data

Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Linked data foundations

URIs for naming things, RDF for describing data, and SPARQL for querying linked data

Slide 15

Uniform Resource Identifier (URI)

Slide 16

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”
– ISA’s 10 Rules for Persistent URIs

A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

RDF & SPARQL

Slide 17

The Resource Description Framework (RDF) is a syntax for representing data and resources on the Web.

RDF breaks every piece of information down into triples:

• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related

Subject | Predicate | Object
<http://example.org/place/Brussels> | is the capital of | “Belgium”
OR
<http://example.org/place/Brussels> | is the capital of | <http://example.org/place/Belgium>

SPARQL is a standardised language for querying RDF data.

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
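The two variants of the Brussels triple can be sketched in Python as plain tuples and written out in the line-based N-Triples syntax. The predicate URI `http://example.org/isCapitalOf` is an assumption (the slide only says "is the capital of"), and the `Triple` type and `to_ntriples` helper are hypothetical names for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str             # always a URI
    predicate: str           # always a URI
    obj: str                 # a URI or a literal value
    object_is_literal: bool


EX = "http://example.org/"
IS_CAPITAL_OF = EX + "isCapitalOf"  # assumed URI for "is the capital of"

literal_variant = Triple(EX + "place/Brussels", IS_CAPITAL_OF, "Belgium", True)
resource_variant = Triple(EX + "place/Brussels", IS_CAPITAL_OF, EX + "place/Belgium", False)


def to_ntriples(triple):
    """Serialise one triple as an N-Triples line (minimal literal escaping only)."""
    if triple.object_is_literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = '"%s"' % escaped
    else:
        obj = "<%s>" % triple.obj
    return "<%s> <%s> %s ." % (triple.subject, triple.predicate, obj)


for triple in (literal_variant, resource_variant):
    print(to_ntriples(triple))
```

The second variant is the "linked" one: because the object is itself a URI, it can be the subject of further triples.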


How to publish linked data

Paving the way towards 5-star linked data

Slide 18

5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence

★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)

★★★ Use non-proprietary formats (e.g. CSV instead of Excel)

★★★★ Use URIs to denote things, so that people can point at your stuff

★★★★★ Link your data to other data to provide context

Slide 19

Make your stuff available on the Web under an open licence

Slide 20

[Screenshot: “Trends, risks and vulnerabilities in securities markets”]

Make it available as structured data

Slide 21

[Screenshot: “Waterbase - Emissions to water”, a table with a CountryCode column]

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

Example: DG Enlargement - Regional programmes

Slide 22

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

LOGD roadblocks

Slide 25

• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions’ initiatives – some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

Initiatives funded by the European Commission

Slide 28

[Logos of funded initiatives; recoverable text: ADMS.SW, Core Vocabulary, Public Service]

Member State initiatives – some examples

Slide 29

DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL – Building and address register
The Dutch Address and Buildings base register published as linked data

UK – Ordnance Survey
Three OS OpenData products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary-Line

UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member States Initiatives – UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member States Initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
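The granular URI pattern for legislation can be sketched in Python as follows. The slash-separated template (identifier path followed by `/data.<format>`) is reconstructed from the example URI above and is an assumption about the exact scheme; the function names are hypothetical.

```python
BASE = "http://www.legislation.gov.uk"
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}  # representations listed on the slide


def identifier_uri(doc_type, year, number, section=None):
    """Build the identifier URI of an act or of one of its sections."""
    uri = "%s/id/%s/%s/%s" % (BASE, doc_type, year, number)
    if section is not None:
        uri += "/section/%s" % section
    return uri


def representation_uri(identifier, fmt):
    """Build the URI of one concrete representation of an identified thing."""
    if fmt not in FORMATS:
        raise ValueError("unknown representation format: %s" % fmt)
    return "%s/data.%s" % (identifier, fmt)


section_124 = identifier_uri("ukpga", 2010, 32, section=124)
print(representation_uri(section_124, "rdf"))
```

Keeping the identifier separate from its representations means the same section can be served as XML, PDF or RDF without minting a new name for it.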

Open & linked data at BBC

Slide 32

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.

Further reading
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

Slide 33

Open & linked data at BBC

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

Slide 35

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”

• Need for adding semantics in order to allow machine reasoning. For example:
  - Kristin is a field
  - Åsgard is an oil platform
  - Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  - Data integration with small impact on legacy systems
  - Semantic interoperability
  - Easier browsing through complex data
  - Increased data quality

Slide 36

Conclusions (cont’d)

• Linked data offers a number of advantages, such as:
  - Enables easy updates, adaptations and extensions of data models
  - Cost reduction from the reuse of LOGD in e-Government applications
  - Enables creativity and innovation through context and knowledge-creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

Slide 39

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can query and manipulate data in RDF

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40


Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Slide 43

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:org="http://www.w3.org/ns/org#"
        xmlns:locn="http://www.w3.org/ns/locn#">

      <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
        <rdfs:label>Publications Office</rdfs:label>
        <org:hasSite rdf:resource="http://example.com/site/1234"/>
      </org:Organization>

      <locn:Address rdf:about="http://example.com/site/1234">
        <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
      </locn:Address>

    </rdf:RDF>
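To see which triples such an RDF/XML description contains, a rough Python sketch using only the standard library is given below. It is not a general RDF/XML parser: it assumes the simple shape used in this example (typed nodes with `rdf:about`, and properties that carry either `rdf:resource` or text). The function names are hypothetical, and the slashes in the `example.com` URI are an assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF_NS + "type"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' tag into a full URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def extract_triples(rdf_xml):
    """Collect (subject, predicate, object) tuples from simple RDF/XML."""
    triples = []
    for node in ET.fromstring(rdf_xml):
        subject = node.get("{%s}about" % RDF_NS)
        # The element name of a typed node states its class.
        triples.append((subject, RDF_TYPE, expand(node.tag)))
        for prop in node:
            resource = prop.get("{%s}resource" % RDF_NS)
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, expand(prop.tag), value))
    return triples


triples = extract_triples(RDF_XML)
for triple in triples:
    print(triple)
```

For real data, a dedicated RDF library is the safer choice, since RDF/XML allows many more abbreviations than this sketch handles.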


RDF structure

Triples graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related

Example: name of a dataset

Subject | Predicate | Object
<http://publications.europa.eu/resource/authority/file-type> | has a title | “File types Named Authority List”

RDF Syntax: RDF/XML

Slide 46

Definition of prefixes:

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcat="http://www.w3.org/ns/dcat#"
        xmlns:dct="http://purl.org/dc/terms/">

Description of data – triples (subject, predicate, object), which together form the graph:

      <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
        <dct:title>File types Named Authority List</dct:title>
        <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
      </dcat:Dataset>

      <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
        <dct:title>Publications Office</dct:title>
      </dct:Agent>

    </rdf:RDF>

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with subject, predicate and object labelled]

RDF Syntax: Turtle

Slide 48

Definition of prefixes:

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct: <http://purl.org/dc/terms/> .

Description of data – triples (subject, predicate, object), which together form the graph:

    <http://publications.europa.eu/resource/authority/file-type>
        a dcat:Dataset ;
        dct:title "File types Named Authority List" ;
        dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://open-data.europa.eu/en/data/publisher/publ>
        a dct:Agent ;
        dct:title "Publications Office" .

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF Syntax: RDFa (embedding RDF data in HTML)

Slide 49

    <html>
      <head> </head>
      <body>
        <div resource="http://publications.europa.eu/resource/authority/file-type"
             typeof="http://www.w3.org/ns/dcat#Dataset">
          <p>
            <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
            Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
          </p>
        </div>
      </body>
    </html>

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/


How to represent data in RDF

Classes properties and vocabularies

Slide 50

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram “Healthcare Domain”. Classes and properties recovered from the diagram:]

Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.


Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

Slide 55

• SELECT: return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: add triples to the RDF graph (part of SPARQL 1.1 Update)
• DELETE: delete triples from the RDF graph (part of SPARQL 1.1 Update)
• ASK: are there any X, Y, etc. satisfying the following conditions ...

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

    SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Named Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset
    WHERE {
      <http://.../authority/file-type> dct:title ?dataset .
    }

Result

    dataset
    "File types Named Authority List"
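What this SELECT query does can be imitated in a few lines of Python: keep the triples whose subject and predicate match, and return their objects. This is only a sketch of the idea, not how a SPARQL engine is implemented. The dataset URI under `example.org` is a stand-in, because the slide abbreviates the real one, and the helper name `objects` is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Stand-in URI: the slide shortens the real dataset URI.
DATASET = "http://example.org/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (DATASET, RDF_TYPE, DCAT + "Dataset"),
    (DATASET, DCT + "title", "File types Named Authority List"),
    (DATASET, DCT + "publisher", PUBLISHER),
    (PUBLISHER, RDF_TYPE, DCT + "Agent"),
    (PUBLISHER, DCT + "title", "Publications Office"),
]


def objects(triples, subject, predicate):
    """Return every object whose triple has the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


titles = objects(TRIPLES, DATASET, DCT + "title")
print(titles)
```

The fixed subject and predicate play the role of the constant parts of the triple pattern, and the returned objects are the bindings of the `?dataset` variable.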

SELECT – return the name and publisher of a dataset

Slide 58

Sample data

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Named Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }

Result

    dataset | publisher
    "File types Named Authority List" | "Publications Office"
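The query above joins three triple patterns through the shared variable `?publisherURI`. The Python sketch below shows that joining step with a naive pattern matcher; it illustrates the principle only and is not how production SPARQL engines work. The `example.org` dataset URI is a stand-in for the abbreviated URI on the slide, and all function names are hypothetical.

```python
DCT = "http://purl.org/dc/terms/"
DATASET = "http://example.org/authority/file-type"  # stand-in for the abbreviated URI
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (DATASET, DCT + "title", "File types Named Authority List"),
    (DATASET, DCT + "publisher", PUBLISHER),
    (PUBLISHER, DCT + "title", "Publications Office"),
]

# Terms starting with "?" are variables, everything else must match exactly.
QUERY = [
    (DATASET, DCT + "publisher", "?publisherURI"),
    (DATASET, DCT + "title", "?dataset"),
    ("?publisherURI", DCT + "title", "?publisher"),
]


def match_pattern(pattern, triple, bindings):
    """Return the extended bindings if the triple fits the pattern, else None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns, triples):
    """Join all patterns: each solution binds every variable consistently."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


rows = [(s["?dataset"], s["?publisher"]) for s in evaluate(QUERY, TRIPLES)]
print(rows)
```

The third pattern only matches the publisher's own title because `?publisherURI` was already bound by the first pattern; that is the join.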

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61
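The screenshots of these slides are not part of the transcript. As a rough illustration of how a query reaches a SPARQL endpoint, the Python sketch below builds a request URL following the SPARQL Protocol convention of passing the query text in a `query` parameter. The endpoint URL `http://example.org/sparql` is a placeholder, not the address of the EU ODP endpoint, and the query itself is an illustrative example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # placeholder endpoint, replace with a real one

QUERY = """PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}
LIMIT 10"""


def build_query_url(endpoint, query):
    """Encode the query text into the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Fetching this URL with an `Accept` header such as `application/sparql-results+json` would return the result table in JSON, provided the endpoint supports that format.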

Summary

• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

Slide 67

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Start with a robust Domain Model

Slide 68

Step 1

[Domain model diagram. Classes and attributes recovered from the diagram:]
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
Political category: Code, Description
Corporate body: Code, Type, Location
Relationships: hasCeiling, has Political category, has Corporate body, has Nomenclature, has EU Programme

Reuse existing terms and vocabularies

Slide 69

Step 2

General-purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID

Well-known vocabularies

Slide 70

Step 2: Reuse existing terms and vocabularies

DCAT-AP: vocabulary for describing datasets in Europe
Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: vocabulary for describing projects
ADMS: vocabulary for describing interoperability assets
Dublin Core: defines general metadata attributes
Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Advantages of reuse

Slide 71

Step 2: Reuse existing terms and vocabularies

• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, whose value should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.
  It shows that it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
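The "further processing" mentioned for non-standard dates can be sketched in Python: a free-text date such as "21 February 2013" has to be parsed and rewritten before it becomes a typed xsd:date literal. The function name `to_xsd_date` is hypothetical, and the sketch assumes English month names (the `%B` directive depends on the active locale).

```python
from datetime import datetime

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date and return it as a typed literal in N-Triples style."""
    parsed = datetime.strptime(text, fmt).date()
    # isoformat() yields YYYY-MM-DD, the lexical form xsd:date expects.
    return '"%s"^^<%s>' % (parsed.isoformat(), XSD_DATE)


print(to_xsd_date("21 February 2013"))
```

Every consumer of the non-standard format would need a conversion like this one, which is the cost that reusing dcterms:created with xsd:date avoids.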

You can find reusable RDF vocabularies on

Slide 72

Step 2: Reuse existing terms and vocabularies

http://joinup.ec.europa.eu
http://lov.okfn.org

Creation of sub-classes and sub-properties

Slide 73

Step 3

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Creation of sub-classes and sub-properties

Slide 74

Step 3

The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description.

[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
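Why a sub-property helps systems that only know the generic term can be sketched in Python: every statement made with the specific property is also inferred for its super-property. The URIs under `example.org` are placeholders (the real EU Budget vocabulary namespace is not given here), and the function name is hypothetical.

```python
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"
BUD_INTRODUCTION = "http://example.org/bud#introduction"  # placeholder URI

# rdfs:subPropertyOf statements: specific property -> more generic property
SUB_PROPERTY_OF = {BUD_INTRODUCTION: DCT_DESCRIPTION}

DATA = [
    ("http://example.org/nomenclature/1", BUD_INTRODUCTION, "Introductory text"),
]


def with_super_properties(triples, sub_property_of):
    """Add, for each triple, the same statement with every super-property."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        seen = {predicate}
        while predicate in sub_property_of and sub_property_of[predicate] not in seen:
            predicate = sub_property_of[predicate]
            seen.add(predicate)
            inferred.add((subject, predicate, obj))
    return inferred


inferred = with_super_properties(DATA, SUB_PROPERTY_OF)
descriptions = [o for s, p, o in inferred if p == DCT_DESCRIPTION]
print(descriptions)
```

An application that has never heard of the budget-specific property can still find the text by asking for dct:description.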

Creation of sub-classes and sub-properties

Slide 75

Step 3

The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject.

[Diagram: Amount class (Currency, Figure, Type, Year) linked by “has Nomenclature” to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

Where new terms are required, create them following commonly agreed best practices

Slide 76

Step 4

Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower-case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Datatype properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
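The capitalisation and camel-case conventions can be sketched as two small Python helpers that turn a human-readable label into a term name. The helper names are hypothetical, and the sketch does not check the other conventions (singular class names, verbs versus nouns), which need human judgement.

```python
import re


def _words(label):
    """Split a label into its alphanumeric words."""
    return re.findall(r"[A-Za-z0-9]+", label)


def class_name(label):
    """Classes: upper camel case, e.g. 'political category' -> 'PoliticalCategory'."""
    return "".join(word[:1].upper() + word[1:] for word in _words(label))


def property_name(label):
    """Properties: lower camel case, e.g. 'has ceiling' -> 'hasCeiling'."""
    name = class_name(label)
    return name[:1].lower() + name[1:]


print(class_name("political category"))
print(property_name("has ceiling"))
print(property_name("is primary topic of"))
```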

Where new terms are required, create them following commonly agreed best practices

Slide 77

Step 4

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Where new terms are required, create them following commonly agreed best practices

Slide 78

Step 4

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Where new terms are required, create them following commonly agreed best practices

Slide 79

Step 4

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

[Diagram: hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
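In RDFS, domain and range are used for inference: from a statement that uses a property, a reasoner concludes the class of its subject (domain) and of its object (range). The Python sketch below shows this for the hasCeiling example. All URIs are placeholders under `example.org`, and the direction "Political category hasCeiling Amount" is an assumption, since the diagram's arrow is not recoverable from the transcript.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/bud#"  # placeholder namespace

HAS_CEILING = EX + "hasCeiling"
DOMAIN = {HAS_CEILING: EX + "PoliticalCategory"}  # assumed direction
RANGE = {HAS_CEILING: EX + "Amount"}

DATA = [
    ("http://example.org/category/1", HAS_CEILING, "http://example.org/amount/1"),
]


def infer_types(triples, domain, value_range):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    types = set()
    for subject, predicate, obj in triples:
        if predicate in domain:
            types.add((subject, RDF_TYPE, domain[predicate]))
        if predicate in value_range:
            types.add((obj, RDF_TYPE, value_range[predicate]))
    return types


for statement in sorted(infer_types(DATA, DOMAIN, RANGE)):
    print(statement)
```

Because domain and range trigger inference rather than validation, declaring them too narrowly can lead to unintended class memberships when others reuse the property.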

Publish within a highly stable environment designed to be persistent

Slide 80

Step 5

• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
  Examples:
  - http://www.w3.org/ns/adms
  - http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Publicise the RDF vocabulary by registering it with relevant services

Slide 81

Step 6

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

[Phases shown alongside the steps: Analyse, Model, Publish]


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another ...” - openrefine.org

See also: Open Refine website, http://openrefine.org

What is Open Refine RDF extension?

Slide 85

Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (xls and xlsx)
• JSON
• XML and
• RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

Slide 86

1. Install Open Refine from https://github.com/OpenRefine/
2. Install the RDF extension from http://refine.deri.ie/

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Slide 88

Download the tabular data


Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project


Clean up the data - table harmonisation

Slide 90

3

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Clean up the data - prepare RDF

Slide 91

3

• Create a URI representation for the object values involved
  • via formula
  • via reconciliation
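The "via formula" option can be pictured outside OpenRefine as a small function that turns a cell value into a URI. The sketch below is plain Python, not OpenRefine's own expression language; the base URI `http://example.org/id/` and the helper name `mint_uri` are assumptions made for illustration only.

```python
import re
import unicodedata

BASE_URI = "http://example.org/id/"  # hypothetical base URI, not taken from the slides


def mint_uri(kind: str, value: str) -> str:
    """Build a URI for a cell value by normalising it into a URL-safe slug."""
    # Drop accents, lower-case the text, and collapse every run of
    # non-alphanumeric characters into a single hyphen.
    ascii_value = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_value.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot mint a URI from {value!r}")
    return f"{BASE_URI}{kind}/{slug}"


print(mint_uri("country", "Czech Republic"))
```

A formula like this is deterministic, so the same cell value always yields the same URI. Reconciliation, by contrast, looks the value up in an existing authority and reuses the URI found there.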


Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

Slide 94

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle
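To make the skeleton-then-export idea concrete, here is a minimal sketch in plain Python of what such a mapping does: every row of a table becomes one `qb:Observation` written out in Turtle. The column names, the `ex:` properties and the base URI are hypothetical, and the real Digital Agenda Scoreboard layout may differ. Only the `qb:` namespace is the actual RDF Data Cube namespace.

```python
import csv
import io

# Hypothetical sample rows; the real scoreboard columns may differ.
SAMPLE_CSV = """country,year,value
BE,2013,0.78
LU,2013,0.91
"""

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"  # hypothetical vocabulary
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def rows_to_turtle(csv_text: str, base_uri: str = "http://example.org/obs/") -> str:
    """Apply a fixed skeleton (one qb:Observation per row) to tabular data."""
    blocks = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The subject URI is minted from the cells that identify the row.
        subject = f"<{base_uri}{row['country']}-{row['year']}>"
        blocks.append(
            f"{subject} a qb:Observation ;\n"
            f"    ex:country \"{row['country']}\" ;\n"
            f"    ex:year \"{row['year']}\"^^xsd:gYear ;\n"
            f"    ex:value \"{row['value']}\"^^xsd:decimal .\n"
        )
    return "\n".join(blocks)


print(rows_to_turtle(SAMPLE_CSV))
```

In a real Data Cube the dimensions and measures would be declared in a data structure definition and the country would normally be a URI rather than a string. The sketch leaves both out to keep the row-to-triples step visible.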


Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar


Thank you for your attention

and now YOUR questions

Slide 97



Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme
http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org

Slide 101


This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


The four principles in practice

3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).

4. Include links to other URIs so that people/machines can discover more things.

Slide 13

Linked data vs open data

Open data

Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data

Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

Slide 14

"Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Linked data foundations

URIs for naming things, RDF for describing data, and SPARQL for querying linked data

Slide 15

Uniform Resource Identifier (URI)

"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."
- ISA's 10 Rules for Persistent URIs

A country, e.g. Belgium:
- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office:
- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. the Countries Named Authority List:
- http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

RDF & SPARQL

The Resource Description Framework (RDF) is a syntax for representing data and resources on the Web.

Slide 17

RDF breaks every piece of information down into triples:

• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related

SPARQL is a standardised language for querying RDF data.

Example (subject, predicate, object):

http://example.org/place/Brussels is the capital of "Belgium"

OR

http://example.org/place/Brussels is the capital of http://example.org/place/Belgium
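The two variants of the Brussels example differ only in the object: a literal in the first, a resource in the second. A minimal Python sketch of that distinction follows; the `URI` and `Triple` types and the predicate URI `http://example.org/def/isCapitalOf` are illustrative assumptions, since the slide only says "is the capital of".

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identifier, as opposed to a plain literal value."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, str]  # a resource (URI) or a literal (plain str)


# Hypothetical predicate URI; the slide does not name one.
CAPITAL_OF = URI("http://example.org/def/isCapitalOf")
BRUSSELS = URI("http://example.org/place/Brussels")

literal_version = Triple(BRUSSELS, CAPITAL_OF, "Belgium")
resource_version = Triple(
    BRUSSELS, CAPITAL_OF, URI("http://example.org/place/Belgium")
)


def object_is_resource(triple: Triple) -> bool:
    """True when the object is itself a resource that can be described further."""
    return isinstance(triple.obj, URI)


print(object_is_resource(literal_version), object_is_resource(resource_version))
```

Only the resource version lets other triples say more about Belgium, which is what makes linking possible.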

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql

How to publish linked data

Paving the way towards 5-star linked data

Slide 18


5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
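The five levels build on each other: a dataset only earns a star if it also meets every lower level. The Python sketch below expresses that cumulative rule; the `Publication` record and its field names are hypothetical and not part of the 5-star scheme itself.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    open_licence: bool      # available on the Web under an open licence
    structured: bool        # e.g. Excel rather than a scanned image
    non_proprietary: bool   # e.g. CSV rather than Excel
    uses_uris: bool         # things are denoted by URIs
    linked: bool            # links to other data for context


def star_rating(pub: Publication) -> int:
    """Count stars cumulatively: each star requires all the previous ones."""
    criteria = [
        pub.open_licence,
        pub.structured,
        pub.non_proprietary,
        pub.uses_uris,
        pub.linked,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file under an open licence, without URIs: three stars.
print(star_rating(Publication(True, True, True, False, False)))
```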

Slide 19


Make your stuff available on the Web under an open licence

Slide 20

Example: Trends, risks and vulnerabilities in securities markets

Make it available as structured data

Slide 21

Waterbase - Emissions to water

CountryCode


Use non-proprietary formats

• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body

LOGD roadblocks

• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo - change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Linked data initiatives in Europe

Examples of supranational, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions' initiatives - some examples

• European Environment Agency SPARQL endpoint: a tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: a tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: a tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

Initiatives funded by the European Commission

Slide 28

[Figure: logos of initiatives funded by the European Commission, including ADMS, ADMS.SW and the Core Public Service Vocabulary]

Member State initiatives - some examples

DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL - Building and address register
The Dutch Address and Buildings base register published as linked data.

UK - Ordnance Survey
Three OS OpenData products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line.

UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.

Slide 29

See also: ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member States initiatives - UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legislation Identifier (ELI)
http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member States initiatives - UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
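The granular URI scheme can be read as a simple template: an identifier for the section, plus a suffix that selects one representation of it. The sketch below rebuilds the example URI above in Python. The path separators are reconstructed from this transcript, and the function names are illustrative rather than part of any legislation.gov.uk API.

```python
LEGISLATION_BASE = "http://www.legislation.gov.uk"
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}  # representations listed on the slide


def section_identifier(doc_type: str, year: int, number: int, section: int) -> str:
    """Identifier URI following /id/{type}/{year}/{number}/section/{number}."""
    return f"{LEGISLATION_BASE}/id/{doc_type}/{year}/{number}/section/{section}"


def representation(identifier: str, fmt: str) -> str:
    """URL of one representation (data.xml, data.rdf, ...) of the identified thing."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier}/data.{fmt}"


print(representation(section_identifier("ukpga", 2010, 32, 124), "rdf"))
```

Keeping the identifier separate from its representations means the same section can be cited once and retrieved in whichever format a client needs.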


Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

Slide 33

Open & linked data at BBC

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga
https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes."

• Need for adding semantics in order to allow machine reasoning

For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality

Slide 36

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains

bull An introduction to the Resource Description Framework (RDF) for describing your data

bull An introduction to SPARQL on how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module you should have a clear understanding of

bull The Resource Description Framework (RDF)

• How to write/read RDF

bull How you can describe your data with RDF

bull What SPARQL is

bull How to understand and write a SPARQL SELECT query

Slide 40


Resource Description Framework

An introduction to RDF

Slide 41


RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Named Authority List"

RDF syntax: RDF/XML

Slide 46

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data - triples (together they form the graph):

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

In each description the subject is the rdf:about URI, the predicate is the property element (e.g. dct:title) and the object is its value.

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with nodes and arcs labelled subject, predicate and object]

RDF syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data - triples (together they form the graph):

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Each statement reads subject, predicate, object; a semicolon repeats the subject for the next predicate.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF syntax: RDFa

Slide 49

<html>
<head></head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type"
     typeof="http://www.w3.org/ns/dcat#Dataset">
  <p>
    <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
    Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
  </p>
</div>
</body>
</html>

The resource attribute gives the subject, each property attribute a predicate, and the element content the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

Embedding RDF data in HTML

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram "Healthcare Domain"; the classes, properties and relationships it shows are listed below]

Class Core Vocabularies::Identifier
    dateOfIssue: dateTime [0..1]
    identifier: string [1..1]
    identifierType: string [0..1]
    issuingAuthority: string [0..1]
    issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
    alternativeName: string
    birthName: string
    dateOfBirth: dateTime
    dateOfDeath: dateTime
    familyName: string
    fullName: string
    gender: code
    givenName: string
    patronymicName: string

Class Core Vocabularies::Location
    geographicIdentifier: URI
    geographicName: string

Class Core Vocabularies::Address
    addressArea: string
    addressID: string
    adminUnitL1: string
    adminUnitL2: string
    fullAddress: string
    locatorDesignator: string
    locatorName: string
    poBox: string
    postCode: string
    postName: string
    thoroughfare: string

Class Core Vocabularies::Geometry
    lat: string
    long: string
    wkt: string
    xmlGeometry: XML

Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Introduction to SPARQL

The RDF Query Language

Slide 53


About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions ...

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description).

• INSERT: add triples to the RDF graph (an update operation, part of SPARQL 1.1 Update).

• DELETE: delete triples from the RDF graph (an update operation, part of SPARQL 1.1 Update).

• ASK: are there any X, Y, etc. satisfying the following conditions ...?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

SELECT - return the name of a dataset with a particular URI

Slide 57

Sample data (URIs abbreviated with "..."):

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://.../publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Named Authority List"
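What the SELECT query does can be imitated in a few lines of plain Python: the triple pattern is compared with every triple in the sample data, and each variable (a term starting with "?") is bound to whatever value sits in its position. This is only a sketch of the matching idea, not a SPARQL engine; the short names `<file-type>` and `<publ>` stand in for the abbreviated URIs of the sample data.

```python
# The sample data as (subject, predicate, object) tuples.
TRIPLES = [
    ("<file-type>", "rdf:type", "dcat:Dataset"),
    ("<file-type>", "dct:title", "File types Named Authority List"),
    ("<file-type>", "dct:publisher", "<publ>"),
    ("<publ>", "rdf:type", "dct:Agent"),
    ("<publ>", "dct:title", "Publications Office"),
]


def match(pattern, triples):
    """Return one binding dict per triple that fits the pattern.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must take the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


print(match(("<file-type>", "dct:title", "?dataset"), TRIPLES))
```

The single binding returned corresponds to the one-row result table of the query.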


SELECT - return the name and publisher of a dataset

Slide 58

Sample data (URIs abbreviated with "..."):

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://.../publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset                              publisher
"File types Named Authority List"    "Publications Office"
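With several triple patterns, the shared variable (?publisherURI) is what ties the patterns together: a value bound by one pattern must also satisfy the others. The Python sketch below shows that join by solving the patterns one after another and carrying the bindings along. It illustrates the principle only and is not how a real SPARQL engine is implemented; `<file-type>` and `<publ>` again stand in for the abbreviated URIs.

```python
TRIPLES = [
    ("<file-type>", "rdf:type", "dcat:Dataset"),
    ("<file-type>", "dct:title", "File types Named Authority List"),
    ("<file-type>", "dct:publisher", "<publ>"),
    ("<publ>", "rdf:type", "dct:Agent"),
    ("<publ>", "dct:title", "Publications Office"),
]

QUERY = [
    ("<file-type>", "dct:publisher", "?publisherURI"),
    ("<file-type>", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]


def solve(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns (a join)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)  # bindings made by earlier patterns
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # An already bound variable must agree with this triple.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from solve(rest, triples, candidate)


rows = [(b["?dataset"], b["?publisher"]) for b in solve(QUERY, TRIPLES)]
print(rows)
```

Only ?dataset and ?publisher are projected into the rows, just as the SELECT clause lists only those two variables.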

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

bull RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

bull Different syntaxes exist for expressing data in RDF

bull SPARQL is a standardised language to query graph data expressed as RDF

bull SPARQL can be used to query and update RDF data

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about

bull Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

bull An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64


Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65


Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Start with a robust Domain Model

Slide 68

1

[Figure: EU Budget domain model. Classes and their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attributes shown in the diagram: Acronym, Legal base period, Legal base type, Legal base status. Relationships: has ceiling, has political category, has corporate body, has nomenclature, has EU programme.]

Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

2

Well-known vocabularies

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies

2

• Reuse greatly aids interoperability of your data.

Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
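The "further processing" mentioned for non-standard dates can be as small as the following Python sketch, which turns a free-text date into the typed literal expected for dcterms:created. The function name `to_xsd_date` is illustrative, and the parsing assumes English month names and the day-month-year order used in the example.

```python
from datetime import datetime


def to_xsd_date(text: str) -> str:
    """Convert a date such as '21 February 2013' into an xsd:date literal."""
    # "%d %B %Y" expects day, full month name, four-digit year
    # (English month names under Python's default locale).
    parsed = datetime.strptime(text.strip(), "%d %B %Y")
    return f'"{parsed.date().isoformat()}"^^xsd:date'


print("dcterms:created", to_xsd_date("21 February 2013"))
```

Every consumer of the non-standard format would need a step like this, and would have to guess the date order and language, which is the interoperability cost the slide describes.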

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies (step 2)

You can find reusable RDF vocabularies on

Slide 72

http://joinup.ec.europa.eu and http://lov.okfn.org

Reuse existing terms and vocabularies (step 2)

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

3

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: the Nomenclature class with its attributes Type, Heading, Introduction, Remark and Conditions]

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Diagram: the Amount class (Currency, Figure, Type, Year) and the Nomenclature class (Type, Heading, Introduction, Remark, Conditions), connected by a "has Nomenclature" relationship]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower-case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
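The capitalisation and camel-case rules above lend themselves to an automated check. The following Python sketch uses hypothetical helper names and only tests the shape of a term's local name; whether a term is singular, a verb or a noun still needs human review.

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties.
UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    return bool(LOWER_CAMEL.match(local_name(term)))


print(is_valid_class_name("skos:Concept"))
print(is_valid_property_name("foaf:isPrimaryTopicOf"))
print(is_valid_property_name("ex:political_category"))  # underscore breaks camel case
```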

Slide 76

4

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

Slide 77

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

Slide 78

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used
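In RDFS, domain and range declarations act as inference rules rather than validation constraints: using a property lets a system conclude the types of the subject and the object. The Python sketch below illustrates this under stated assumptions; the property `ex:hasCeiling`, its classes and the direction of the relationship are hypothetical choices made for illustration.

```python
RDF_TYPE = "rdf:type"

# Assumed declarations: hasCeiling links a political category to an amount.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triples):
    """Derive rdf:type statements implied by domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))  # subject is in the domain class
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))   # object is in the range class
    return inferred


data = [("ex:category1", "ex:hasCeiling", "ex:amount42")]
types = infer_types(data)
```

Because of this behaviour, an overly narrow domain or range can lead to unintended type conclusions when others reuse the property.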

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description), connected by the hasCeiling property]

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

5

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

Where new terms are required, create them following commonly agreed best practice, for example in terms of naming conventions

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; …” - openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is Open Refine RDF extension

The Open Refine RDF extension allows you to easily import data in different formats, such as:

CSV

Excel (.xls and .xlsx)

JSON

XML and

RDF/XML

You then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data Getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data – table harmonisation

Slide 90

3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data – prepare RDF

Slide 91

3

• Create a URI representation for the object values involved

• via formula

• via reconciliation
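Creating a URI "via formula" means deriving it from a cell value with a deterministic rule. In Open Refine this would be written as an expression inside the tool; the Python sketch below only mirrors the idea. The base URI and function names are hypothetical.

```python
import re
import unicodedata

BASE_URI = "http://example.org/id/country/"  # hypothetical namespace


def slugify(value: str) -> str:
    """Turn a cell value into a lower-case, hyphen-separated ASCII token."""
    # Decompose accented characters, then drop everything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(value: str, base: str = BASE_URI) -> str:
    """Build a URI for a cell value by appending its slug to the base URI."""
    return base + slugify(value)


print(mint_uri("Czech Republic"))
print(mint_uri("Côte d'Ivoire"))
```

Reconciliation takes the other route: instead of minting a new URI, the value is matched against an existing authority list and the URI found there is reused.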

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle
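What the export step produces can be approximated outside the tool. The following Python sketch converts tabular rows to N-Triples, a line-based format that is also valid Turtle. The sample values, the base URI and the `value` property are invented for illustration, and a complete RDF Data Cube observation would need further properties such as a link to its dataset.

```python
import csv
import io

SAMPLE_CSV = "country,year,value\nBE,2013,78.5\nLU,2013,93.1\n"  # invented sample rows

BASE = "http://example.org/observation/"        # hypothetical base URI
EX_VALUE = "http://example.org/def/value"       # hypothetical measure property
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def rows_to_ntriples(csv_text: str) -> str:
    """Emit two triples per row: the observation's type and its measured value."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['country']}-{row['year']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{QB_OBSERVATION}> .")
        # No literal escaping is done here; the sample values do not need it.
        lines.append(f'{subject} <{EX_VALUE}> "{row["value"]}"^^<{XSD_DECIMAL}> .')
    return "\n".join(lines)


output = rows_to_ntriples(SAMPLE_CSV)
print(output)
```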

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5★ Open Data. http://5stardata.info

• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net

• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework, W3C. http://www.w3.org/RDF/

• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query/

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA: Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA: Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support: http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

@OpenDataSupport, contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No 30-CE-053096500-17)

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


DATASUPPORTOPEN

Linked data vs open data

Open data

Data can be published and be publicly available under an open licence without linking to other data sources

Linked data

Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence

Slide 14

“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org

See also: Cobden et al., A research agenda for Linked Closed Data

http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

DATASUPPORTOPEN

Linked data foundations

URIs for naming things, RDF for describing data and SPARQL for querying linked data

Slide 15

DATASUPPORTOPEN

Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”

– ISA's 10 Rules for Persistent URIs

A country, e.g. Belgium

- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office

- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. the Countries Named Authority List

- http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

RDF & SPARQL

The Resource Description Framework (RDF) is a syntax for representing data and resources on the Web

Slide 17

RDF breaks every piece of information down into triples:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

SPARQL is a standardised language for querying RDF data

Example (subject, predicate, object):

http://example.org/place/Brussels is the capital of “Belgium”

OR

http://example.org/place/Brussels is the capital of http://example.org/place/Belgium

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
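The two variants of the Brussels example differ only in the kind of object: a literal or another resource. A minimal Python sketch makes that distinction explicit. The class names and the predicate URI are illustrative assumptions, since the slide gives the predicate only in words.

```python
from typing import NamedTuple, Union


class URIRef(str):
    """A resource identified by a URI."""


class Literal(str):
    """A plain literal value such as a name."""


class Triple(NamedTuple):
    subject: URIRef
    predicate: URIRef
    object: Union[URIRef, Literal]


EX = "http://example.org/"
brussels = URIRef(EX + "place/Brussels")
is_capital_of = URIRef(EX + "def/isCapitalOf")  # hypothetical predicate URI

with_literal = Triple(brussels, is_capital_of, Literal("Belgium"))
with_resource = Triple(brussels, is_capital_of, URIRef(EX + "place/Belgium"))


def object_is_resource(triple: Triple) -> bool:
    """True when the object can itself be the subject of further triples."""
    return isinstance(triple.object, URIRef)
```

Only the second form allows further statements about Belgium to be attached, which is what makes the data linkable.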

DATASUPPORTOPEN

How to publish linked data

Paving the way towards 5-star linked data

Slide 18

DATASUPPORTOPEN

5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence

★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)

★★★ Use non-proprietary formats (e.g. CSV instead of Excel)

★★★★ Use URIs to denote things, so that people can point at your stuff

★★★★★ Link your data to other data to provide context

Slide 19

DATASUPPORTOPEN

Make your stuff available on the Web under an open licence

Slide 20

Example: Trends, risks and vulnerabilities in securities markets

DATASUPPORTOPEN

Make it available as structured data

Slide 21

Example: Waterbase - Emissions to water (structured table with columns such as CountryCode)

DATASUPPORTOPEN

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

DATASUPPORTOPEN

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples of supranational, national, regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions' initiatives – some examples

• European Environment Agency SPARQL endpoint: a tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU, CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data and publications data. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: a tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: a tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
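Endpoints such as these are normally queried over HTTP following the SPARQL Protocol: the query travels in a `query` parameter and the `Accept` header selects the result format. The Python sketch below only builds such a request and does not send it. The helper name is hypothetical, and the endpoint address is the one listed above, which may have changed since the slides were written.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(
    "http://semantic.eea.europa.eu/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
)
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute the query, assuming the endpoint is reachable.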

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

[Figure: logos of initiatives funded by the European Commission; legible text: ADMS, SW, CORE VOCABULARY, PUBLIC SERVICE]

DATASUPPORTOPEN

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT – Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL – Building and address register

The Dutch Address and Buildings base register, published as linked data

UK – Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line

UK – Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member States initiatives – UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

DATASUPPORTOPEN

Member States initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

DATASUPPORTOPEN

Open & linked data at the BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

DATASUPPORTOPEN Slide 33

Open & linked data at the BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

1. Link databases

“I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee”

• Data is spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Resources need to be uniquely identified

2. Add meaning

“I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes”

• Semantics need to be added to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Linked Data in the oil and gas industry

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions cont'd

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was later generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset (subject, predicate, object)

http://publications.europa.eu/resource/authority/file-type has a title “File types Named Authority List”
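The dataset-name triple can be written down in N-Triples, the simplest line-based RDF syntax. The Python sketch below is a minimal serialiser under stated assumptions: the function name is hypothetical, the predicate "has a title" is assumed to be `dct:title` (as in the syntax examples of this module), and only basic string escaping is handled.

```python
def to_ntriples(subject: str, predicate: str, obj: str, obj_is_literal: bool) -> str:
    """Serialise one triple as a single N-Triples line."""
    if obj_is_literal:
        # Escape backslashes, double quotes and newlines inside the literal.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


line = to_ntriples(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",  # assumed URI for "has a title"
    "File types Named Authority List",
    obj_is_literal=True,
)
print(line)
```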

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

[Slide annotations: the xmlns attributes are the definition of prefixes; the elements below them are the description of the data as triples (subject, predicate, object), which together form the graph]

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph whose nodes and arcs are labelled subject, predicate and object]

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

[Slide annotations: the @prefix lines are the definition of prefixes; the statements below them are the description of the data as triples (subject, predicate, object), which together form the graph]

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

[Slide annotations: subject (resource attribute), predicate (property attribute), object (element content)]

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram "Healthcare Domain"; classes, properties and relationships as recoverable from the slide]

Class Core Vocabularies::Identifier
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Class Core Vocabularies::Location
- geographicIdentifier: URI
- geographicName: string

Class Core Vocabularies::Address
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Class Core Vocabularies::Geometry
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, which makes the meaning of the data model easier to understand.

Legend: classes, properties, relationships

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

• INSERT: add triples to the RDF graph (SPARQL 1.1 Update)

• DELETE: delete triples from the RDF graph (SPARQL 1.1 Update)

• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

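Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

[Slide annotations: the PREFIX lines are the definition of prefixes; SELECT is the type of query; ?title is the variable, i.e. what to search for; the lines inside WHERE are the RDF triple patterns, i.e. the conditions that have to be met]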

DATASUPPORTOPEN

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result

dataset
"File types Named Authority List"

DATASUPPORTOPEN

SELECT – return the name and publisher of a dataset

Slide 58

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

dataset | publisher
"File types Named Authority List" | "Publications Office"
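How the three triple patterns are joined on the shared variable can be illustrated with a toy matcher in Python. This is a sketch of basic graph pattern matching, not a SPARQL engine: terms are plain strings, variables start with `?`, and the dataset URI is a placeholder because the slide elides the full address.

```python
DATASET = "http://example.org/authority/file-type"  # placeholder for the elided URI
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

DATA = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Named Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(variables, patterns, data):
    """Evaluate the patterns one by one, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in data:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The variable `?publisherURI` never appears in the result, yet it is what connects the dataset's triples to the publisher's title.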

DATASUPPORTOPEN

SPARQL Example – EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

• Creating an RDF vocabulary for modelling your data:

o How to reuse existing vocabularies to model your data

o How to create new classes and properties in RDF

o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

Step 1

[Figure: domain model class diagram]

Classes and attributes:

Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
Political category: Code, Description
Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

DATASUPPORTOPEN

Reuse existing terms and vocabularies

Slide 69

Step 2

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

Step 2: Reuse existing terms and vocabularies

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organisations, typically in a national or regional register

Organization Ontology: Ontology for describing the structure of organisations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by a public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

DATASUPPORTOPEN

Advantages of reuse

Slide 71

Step 2: Reuse existing terms and vocabularies

• Reuse greatly aids interoperability of your data.

Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
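The "further processing" mentioned for non-standard date formats can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format "%d %B %Y" and an English locale for month names are assumed, and the function name is hypothetical.

```python
from datetime import datetime


def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date and return it as a typed xsd:date literal.

    Assumes an English locale, because %B matches locale-specific month names.
    """
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^xsd:date'


# The free-text value from the slide, converted to the form dcterms:created expects.
print(to_xsd_date("21 February 2013"))
```

Every consumer of the non-standard property would need a step like this, which is the cost that reusing dcterms:created with xsd:date avoids.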

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

http://joinup.ec.europa.eu/

http://lov.okfn.org/

Step 2: Reuse existing terms and vocabularies

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

Step 3

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
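The benefit of declaring a sub-property can be sketched as a small inference step: a system that only knows dct:description can still use data published with the more specific term. The sketch below is not a reasoner; the bud: and ex: prefixes and the sample resource are hypothetical names used only for illustration.

```python
# Sub-property declarations: specific term -> more generic term.
# "bud:" is a hypothetical prefix standing in for the EU Budget vocabulary.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}


def infer_super_properties(triples, sub_property_of):
    """Add (s, super, o) for every (s, sub, o), following chains of sub-properties."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            parent = sub_property_of.get(p)
            if parent is not None and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


data = [("ex:nomenclature1", "bud:introduction", "Introductory text")]
closure = infer_super_properties(data, SUB_PROPERTY_OF)
for triple in sorted(closure):
    print(triple)
```

After inference the resource carries both the specific statement and the generic dct:description statement, so generic tools can display the text without knowing the budget vocabulary.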

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

Step 3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: Amount class (Currency, Figure, Type, Year) and Nomenclature class (Type, Heading, Introduction, Remark, Conditions), connected by the "has Nomenclature" relationship]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept.

Properties begin with a lower-case letter, e.g. rdfs:label.

Object properties should be verbs, e.g. org:hasSite.

Data type properties should be nouns, e.g. dcterms:description.

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76

Step 4
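The capitalisation and camel-case conventions above lend themselves to a simple automated check. The following sketch covers only those two rules; whether a term is singular, a verb or a noun cannot be decided by a regular expression and is left to human review. The function names and the ex: terms are hypothetical.

```python
import re

# Local names: classes start with an upper-case letter, properties with a
# lower-case letter; camel case means no underscores, hyphens or spaces.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term, kind):
    """Check a prefixed term against the naming rule for 'class' or 'property'."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))


for term, kind in [
    ("skos:Concept", "class"),
    ("foaf:isPrimaryTopicOf", "property"),
    ("ex:has_site", "property"),  # underscore instead of camel case
    ("ex:amount", "class"),       # class name not capitalised
]:
    print(term, kind, follows_convention(term, kind))
```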

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

Step 4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

Step 4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

Slide 79

Step 4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
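In RDFS, domain and range declarations are used to infer the types of resources rather than to reject data. The sketch below illustrates that reading. The bud: and ex: names are hypothetical, and the direction of hasCeiling (from a political category to an amount) is an assumption read from the diagram.

```python
RDF_TYPE = "rdf:type"

# Hypothetical declaration: hasCeiling links a political category to an amount.
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}


def infer_types(triples, schema):
    """Derive rdf:type statements from domain and range declarations."""
    types = set()
    for s, p, o in triples:
        declaration = schema.get(p, {})
        if "domain" in declaration:
            types.add((s, RDF_TYPE, declaration["domain"]))  # subject is in the domain
        if "range" in declaration:
            types.add((o, RDF_TYPE, declaration["range"]))   # object is in the range
    return types


data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
for triple in sorted(infer_types(data, SCHEMA)):
    print(triple)
```

A consequence worth keeping in mind when choosing a domain: using the property on any resource makes that resource an instance of the domain class, so overly narrow domains can produce surprising inferences.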

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

Step 5

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another …" - openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

CSV

Excel (.xls and .xlsx)

JSON

XML and

RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension: http://refine.deri.ie/

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data – table harmonisation

Slide 90

Step 3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data – prepare RDF

Slide 91

Step 3

• Create URI representations for the object values involved

  o via formula

  o via reconciliation
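Creating a URI "via formula" means deriving it deterministically from a cell value. The idea can be sketched outside Open Refine as follows; this is not the tool's own expression language, and the base URI, function name and slug rule are assumptions made for illustration. Non-ASCII characters are simply replaced, which a real pipeline would handle more carefully.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind, label, base=BASE):
    """Build a URI from a cell value: trim, lower-case, replace other characters by '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")
    return f"{base}{quote(kind)}/{quote(slug)}"


# The same input always yields the same URI, so repeated exports stay stable.
print(mint_uri("country", " Czech Republic "))
```

Reconciliation takes the other route: instead of minting a new URI, the cell value is matched against an existing authority list and that list's URI is reused.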

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Step 4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

Slide 94

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

4

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

Step 5

[Screenshot: export of the data in Turtle]
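What a Turtle export of tabular data looks like can be sketched with the standard library alone. The sketch below is a simplification, not the output of the RDF extension: the CSV values are made up, the ex: properties and base URI are hypothetical, and a complete Data Cube would also declare qb:dataSet and dimension and measure properties.

```python
import csv
import io

# Made-up sample rows standing in for a cleaned spreadsheet.
CSV_TEXT = "country,year,value\nBE,2013,10.5\nLU,2013,12.0\n"

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"
)


def rows_to_turtle(csv_text, base="http://example.org/obs/"):
    """Serialise each CSV row as one qb:Observation in Turtle."""
    blocks = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        country, year, value = row["country"], row["year"], row["value"]
        blocks.append(
            f"<{base}{country}-{year}> a qb:Observation ;\n"
            f'    ex:country "{country}" ;\n'
            f'    ex:year "{year}"^^xsd:gYear ;\n'
            f"    ex:value {value} .\n"
        )
    return "\n".join(blocks)


turtle = rows_to_turtle(CSV_TEXT)
print(turtle)
```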

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along the axes "flexibility" and "volume": OpenRefine, UnifiedViews, Cellar]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5 ★ Open Data. http://5stardata.info/

• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie/

• Resource Description Framework, W3C. http://www.w3.org/RDF/

• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query/

Slide 98

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding, Qualcomm; Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org/

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support: http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

@OpenDataSupport

contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 15: Llinked open data training for EU institutions

DATASUPPORTOPEN

Linked data foundations

URIs for naming things, RDF for describing data and SPARQL for querying linked data

Slide 15

DATASUPPORTOPEN

Uniform Resource Identifier (URI)

"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."

– ISA's 10 Rules for Persistent URIs

A country, e.g. Belgium

- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office

- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. Countries Named Authority List

- http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

RDF & SPARQL

The Resource Description Framework (RDF) is a data model for representing data and resources on the Web.

Slide 17

RDF breaks every piece of information down into triples:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example (subject, predicate, object):

<http://example.org/place/Brussels> is the capital of "Belgium"

OR

<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>

SPARQL is a standardised language for querying RDF data.

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
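The two variants of the Brussels example differ only in the object: a literal in the first case, a resource in the second. A minimal sketch in Python makes the distinction explicit and prints both triples in N-Triples notation. The predicate URI is hypothetical, since the slide only gives the predicate in words.

```python
from typing import NamedTuple, Union


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


def to_ntriples(term):
    """Write a resource as <...> and a literal as a quoted, escaped string."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def serialise(triple):
    """Render one triple as a line of N-Triples."""
    return " ".join(to_ntriples(term) for term in triple) + " ."


BRUSSELS = IRI("http://example.org/place/Brussels")
IS_CAPITAL_OF = IRI("http://example.org/def/isCapitalOf")  # hypothetical predicate URI

with_literal = Triple(BRUSSELS, IS_CAPITAL_OF, Literal("Belgium"))
with_resource = Triple(BRUSSELS, IS_CAPITAL_OF, IRI("http://example.org/place/Belgium"))

print(serialise(with_literal))
print(serialise(with_resource))
```

Only the second form lets other triples say something about Belgium itself, which is why linking to a resource is preferred over a plain string wherever a URI exists.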

DATASUPPORTOPEN

How to publish linked data

Paving the way towards 5-star linked data

Slide 18

DATASUPPORTOPEN

5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence

★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)

★★★ Use non-proprietary formats (e.g. CSV instead of Excel)

★★★★ Use URIs to denote things, so that people can point at your stuff

★★★★★ Link your data to other data to provide context

Slide 19

DATASUPPORTOPEN

Make your stuff available on the Web under an open licence

Slide 20

[Example: Trends, risks and vulnerabilities in securities markets]

DATASUPPORTOPEN

Make it available as structured data

Slide 21

Waterbase - Emissions to water

CountryCode

DATASUPPORTOPEN

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

DATASUPPORTOPEN

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions' initiatives – some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
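SPARQL endpoints such as those listed above are queried over HTTP: under the SPARQL Protocol, a query can be sent as a GET request with the query text in the "query" parameter. The sketch below only builds such a request URL and does not send it. The endpoint address is taken from the list above and may have changed since; the sample query and function name are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://semantic.eea.europa.eu/sparql"  # from the list above; may have moved
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint, query):
    """Return the GET URL for a SPARQL query (SPARQL Protocol, 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again shows that the query text survives the encoding.
print(parse_qs(urlsplit(url).query)["query"][0])
```

To actually run the query, the URL could be fetched with any HTTP client, typically with an Accept header such as application/sparql-results+json to request the results in JSON.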

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

ADMS

SWCORE

VOCABULARY

PUBLICSERVICE

DATASUPPORTOPEN

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT – Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL – Building and address register

The Dutch Address and Buildings base register published as linked data

UK – Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line

UK – Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member States Initiatives – UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning (changes over time)
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

DATASUPPORTOPEN

Member States Initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

httpwwwlegislationgovukidukpga201032section124datardf

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

DATASUPPORTOPEN Slide 33

Open & linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes."

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Slide 35

Further reading httpwwwtopquadrantcom

resourcessolutionsdocsSe

mantic-data-oil-and-gaspdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset (subject, predicate, object)

<http://publications.europa.eu/resource/authority/file-type> has a title "File types Name Authority List"

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

[Callouts on the slide: the xmlns attributes are the definition of prefixes; the elements below them are the description of data as triples (subject, predicate, object), which together form the graph]

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

[Callouts on the slide: the @prefix lines are the definition of prefixes; the statements below them are the description of data as triples (subject, predicate, object), which together form the graph]

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

[Callouts on the slide: the resource attribute gives the subject, the property attributes give the predicates, and the element content gives the objects]

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[Figure: UML class diagram "Healthcare Domain"; callouts on the slide mark the classes, their properties and the relationships between them]

Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: Add triples to the RDF graph (an update operation, part of SPARQL 1.1 Update).

• DELETE: Delete triples from the RDF graph (an update operation, part of SPARQL 1.1 Update).

• ASK: Are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

    SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }

DATASUPPORTOPEN

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query:

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset
    WHERE {
      <http://.../authority/file-type> dct:title ?dataset .
    }

Result:

    dataset
    "File types Name Authority List"

DATASUPPORTOPEN

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query:

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }

Result:

    dataset                             publisher
    "File types Name Authority List"    "Publications Office"
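To make the idea of triple patterns more concrete, the following is a minimal Python sketch of how a SELECT query could be evaluated against the sample data. It is not how a real SPARQL engine works internally; the function names (match_pattern, select) are invented for this illustration, prefixed names are kept as plain strings, and the full file-type and publisher URIs are assumed so that the join on ?publisherURI succeeds.

```python
# Sample data held as (subject, predicate, object) tuples.
# The full URIs are assumptions made for this illustration.
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on mismatch."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if pattern_term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            # A constant: it must be identical to the term in the data.
            return None
    return extended


def select(variables, patterns, triples):
    """Return one row per combination of triples that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(binding[v] for v in variables) for binding in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    TRIPLES,
)
print(rows)
```

The shared variable ?publisherURI is what links the dataset to the title of its publisher: the third pattern only matches triples whose subject equals the value bound by the first pattern.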

DATASUPPORTOPEN

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:

  - How to reuse existing vocabularies to model your data

  - How to create new classes and properties in RDF

  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

Slide 67

1. Start with a robust Domain Model, developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model diagram.]

Classes and their attributes:

• Amount: Currency, Figure, Type, Year

• Nomenclature: Type, Heading, Introduction, Remark, Conditions

• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

• Political category: Code, Description

• Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPolitical category, hasCorporate body, has Nomenclature, has EU Programme

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

Slide 69

• General-purpose vocabularies: DCMI, RDFS

• To name things: rdfs:label, foaf:name, skos:prefLabel

• To describe people: FOAF, vCard, Core Person Vocabulary

• To describe projects: DOAP, ADMS.SW

• To describe interoperability assets: ADMS

• To describe registered organisations: Registered Organisation Vocabulary

• To describe addresses: vCard, Core Location Vocabulary

• To describe public services: Core Public Service Vocabulary

• To describe datasets: DCAT, DCAT Application Profile, VoID

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

• DCAT-AP: Vocabulary for describing datasets in Europe

• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

• DOAP: Vocabulary for describing projects

• ADMS: Vocabulary for describing interoperability assets

• Dublin Core: Defines general metadata attributes

• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

• Organization Ontology: for describing the structure of organizations

• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data.

  Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
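The "further processing" mentioned for non-standard dates can be illustrated with a short Python sketch. The function name to_xsd_date is made up for this example, and the input format is assumed to be day, English month name, year; month-name parsing with %B depends on the active locale.

```python
from datetime import datetime


def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date and return it as a typed RDF literal in Turtle notation."""
    parsed = datetime.strptime(text, fmt).date()
    # ISO 8601 (YYYY-MM-DD) is the lexical form expected for xsd:date.
    return f'"{parsed.isoformat()}"^^xsd:date'


print(to_xsd_date("21 February 2013"))
```

Data that already uses dcterms:created with an xsd:date value needs no such conversion step, which is the interoperability gain described above.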

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu/

• Linked Open Vocabularies: http://lov.okfn.org/

Slide 72

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

3

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: the Nomenclature class with its attributes Type, Heading, Introduction, Remark and Conditions.]

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions).]
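Why sub-properties help generic consumers can be sketched in a few lines of Python. This is only an illustration of the rdfs:subPropertyOf idea, not a full RDFS reasoner; the prefix bud: and the local names bud:introduction and bud:hasNomenclature are assumed spellings for the EU Budget terms mentioned on the slides, and ex:nomenclature1 is invented sample data.

```python
# Sub-property declarations, as a vocabulary might state them with rdfs:subPropertyOf.
# The bud: names are assumptions made for this illustration.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_inferred(triples, sub_property_of):
    """Return the triples plus one extra triple for every known super-property."""
    result = list(triples)
    for subject, predicate, obj in triples:
        seen = set()
        # Walk up the sub-property chain, guarding against accidental cycles.
        while predicate in sub_property_of and predicate not in seen:
            seen.add(predicate)
            predicate = sub_property_of[predicate]
            result.append((subject, predicate, obj))
    return result


data = [("ex:nomenclature1", "bud:introduction", "Introductory text")]
expanded = with_inferred(data, SUB_PROPERTY_OF)
print(expanded)
```

A system that only knows dct:description can still find the introductory text in the expanded data, even though it has never heard of the more specific property.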

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.

• Properties begin with a lower-case letter, e.g. rdfs:label.

• Object properties should be verbs, e.g. org:hasSite.

• Data type properties should be nouns, e.g. dcterms:description.

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
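The purely syntactic part of these conventions (initial letter and camel case) can be checked automatically. The Python sketch below is one possible check; the function name follows_convention is invented here, and the rules about singular nouns and verbs are left out because they cannot be tested with a simple pattern.

```python
import re

# Classes: leading capital, then letters or digits only (no spaces, hyphens or underscores).
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
# Properties: leading lower-case letter, then letters or digits only.
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def follows_convention(local_name, kind):
    """Check the local part of a term name; `kind` is "class" or "property"."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name))


print(follows_convention("Concept", "class"))
print(follows_convention("isPrimaryTopicOf", "property"))
print(follows_convention("amount type", "property"))
```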

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Figure: the Amount class with its attributes Currency, Figure, Type and Year.]

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Figure: the Amount class with its attributes Currency, Figure, Type and Year.]

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.

• A domain states on which classes a given property can be used.

[Figure: the hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year).]

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
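As a rough illustration of what such RDFS definitions look like, the Python sketch below assembles a small Turtle document that declares a class and a property with a domain and range. Everything specific in it is an assumption made for the example: the bud: namespace URI with a trailing "#", the term names bud:Amount, bud:PoliticalCategory and bud:hasCeiling, the direction of hasCeiling (from political category to amount), and the helper names define_class and define_property.

```python
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    # Assumed namespace for the example vocabulary.
    "@prefix bud: <http://data.europa.eu/bud#> ."
)


def define_class(term, label):
    """Return a Turtle snippet declaring an RDFS class with an English label."""
    return f'{term} a rdfs:Class ;\n    rdfs:label "{label}"@en .'


def define_property(term, label, domain, range_):
    """Return a Turtle snippet declaring a property with its domain and range."""
    return (
        f"{term} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain {domain} ;\n"
        f"    rdfs:range {range_} ."
    )


vocabulary = "\n\n".join([
    PREFIXES,
    define_class("bud:Amount", "Amount"),
    define_class("bud:PoliticalCategory", "Political category"),
    define_property("bud:hasCeiling", "has ceiling",
                    domain="bud:PoliticalCategory", range_="bud:Amount"),
])
print(vocabulary)
```

With the domain and range stated, a consumer can conclude that anything used as the value of bud:hasCeiling is an Amount, without that being said explicitly in the data.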

DATASUPPORTOPEN

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

  Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

  Examples:

  o http://www.w3.org/ns/adms

  o http://purl.org/dc/elements/1.1/

Slide 80

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Analyse

1. Start with a robust Domain Model, developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

Model

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.

Publish

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another ...” - openrefine.org

See also: Open Refine website, http://openrefine.org/

DATASUPPORTOPEN

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

• Install Open Refine from https://github.com/OpenRefine

• Install the RDF extension from http://refine.deri.ie/

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Step 3: Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Step 3: Clean up the data – prepare RDF

Slide 91

• Create a URI representation for the involved object values:

  - via formula

  - via reconciliation
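Creating URIs "via formula" means deriving each URI from a cell value with a fixed rule. In Open Refine this is done with an expression on a column; the Python sketch below shows the same idea outside the tool. The base URI http://example.org/id/ and the function name mint_uri are placeholders chosen for this example.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/id/"  # placeholder namespace, not a real one


def mint_uri(kind, value, base=BASE_URI):
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(value.strip().lower().split())
    # safe="" makes quote() encode every reserved character, including "/".
    return f"{base}{quote(kind, safe='')}/{quote(slug, safe='')}"


print(mint_uri("country", " Czech Republic "))
print(mint_uri("indicator", "Broadband (fixed)"))
```

Reconciliation is the alternative route: instead of minting a new URI, the cell value is matched against an existing authority list and replaced by the URI already published there.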

DATASUPPORTOPEN

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF.

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

Slide 94

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton.]

DATASUPPORTOPEN

Step 5: Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5-star Open Data. http://5stardata.info/

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie/

• Resource Description Framework. W3C. http://www.w3.org/RDF/

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com/

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org/

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support: http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

@OpenDataSupport

contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


DATASUPPORTOPEN

Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”

– ISA’s 10 Rules for Persistent URIs

A country, e.g. Belgium:

- http://publications.europa.eu/resource/authority/country/BEL

An organisation, e.g. the Publications Office:

- http://publications.europa.eu/resource/authority/corporate-body/PUBL

A dataset, e.g. the Countries Named Authority List:

- http://publications.europa.eu/resource/authority/country

Slide 16

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

RDF amp SPARQL

The Resource Description Framework (RDF) is a syntax for representing data and resources on the Web.

Slide 17

RDF breaks every piece of information down into triples:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example (subject, predicate, object):

    <http://example.org/place/Brussels>  is the capital of  "Belgium"

OR

    <http://example.org/place/Brussels>  is the capital of  <http://example.org/place/Belgium>

SPARQL is a standardised language for querying RDF data.

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql

DATASUPPORTOPEN

How to publish linked data

Paving the way towards 5-star linked data

Slide 18

DATASUPPORTOPEN

The 5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence.

★★ Make it available as structured data (e.g. Excel instead of an image scan of a table).

★★★ Use non-proprietary formats (e.g. CSV instead of Excel).

★★★★ Use URIs to denote things, so that people can point at your stuff.

★★★★★ Link your data to other data to provide context.

Slide 19

DATASUPPORTOPEN

Make your stuff available on the Web under an open licence

Slide 20

Trends risks and

vulnerabilities in

securities markets

DATASUPPORTOPEN

Make it available as structured data

Slide 21

Waterbase - Emissions to water

CountryCode

DATASUPPORTOPEN

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

DATASUPPORTOPEN

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples on supra-national national regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions' initiatives – some examples

• European Environment Agency SPARQL endpoint: Tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: Allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: Allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: Tool for querying a multi-lingual online collection of millions of digitized items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
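A SPARQL endpoint is queried over HTTP: under the SPARQL 1.1 Protocol, the query text can be sent as the "query" parameter of a GET request. The Python sketch below only builds such a request using the standard library and does not send it. The helper name build_sparql_request and the sample query are invented for this example, and the endpoint URL is the one listed above, which may no longer be online.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://semantic.eea.europa.eu/sparql"  # taken from the list above
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint, query):
    """Build a GET request that carries the query in the 'query' parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON serialisation of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

To run the query, the request could be passed to urllib.request.urlopen and the response body parsed as JSON, assuming the endpoint is reachable and supports that result format.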

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

ADMS

SWCORE

VOCABULARY

PUBLICSERVICE

DATASUPPORTOPEN

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT – Agenzia per l’Italia digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL – Building and address register

The Dutch Address and Buildings base register, published as linked data.

UK – Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary Line.

UK – Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database.

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member States Initiatives ndash UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & the RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/{section}/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

DATASUPPORTOPEN

Member States Initiatives ndash UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
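The idea of granular URIs, with one identifier per piece of legislation and one suffix per representation, can be sketched as follows in Python. The function names legislation_id and representation are invented here, and the path layout is inferred from the example URI on this slide rather than taken from an official specification.

```python
BASE = "http://www.legislation.gov.uk"


def legislation_id(doc_type, year, number, section=None):
    """Build an identifier URI for an act, or for one section of it."""
    uri = f"{BASE}/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    return uri


def representation(identifier, fmt):
    """Append the suffix for a concrete representation, e.g. 'rdf' or 'xml'."""
    return f"{identifier}/data.{fmt}"


section_124 = legislation_id("ukpga", 2010, 32, section=124)
print(representation(section_124, "rdf"))
```

Keeping the identifier separate from its representations means the same "thing" keeps one stable name while the available formats can change over time.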

DATASUPPORTOPEN

Open amp linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

DATASUPPORTOPEN Slide 33

Open amp linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS, with their production volumes.”

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.

• URIs, RDF and SPARQL form the foundational layer for linked data.

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont’d)

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module, you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: Everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products.

Description: Attributes, features and relations of the resources.

Framework: Model, languages and syntaxes for these descriptions.

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:org="http://www.w3.org/ns/org#"
        xmlns:locn="http://www.w3.org/ns/locn#">

      <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
        <rdfs:label>Publications Office</rdfs:label>
        <org:hasSite rdf:resource="http://example.com/site/1234"/>
      </org:Organization>

      <locn:Address rdf:about="http://example.com/site/1234">
        <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
      </locn:Address>

    </rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset (subject, predicate, object)

    <http://publications.europa.eu/resource/authority/file-type>  has a title  "File types Name Authority List"
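The difference between an object that is a resource and one that is a literal shows up clearly when a triple is written down in a concrete syntax. The Python sketch below serialises a triple as one line of N-Triples, a simple line-based RDF syntax. The function name to_ntriples is invented for this example, "has a title" is assumed to correspond to the Dublin Core property http://purl.org/dc/terms/title, and only backslashes and double quotes are escaped in literals.

```python
def to_ntriples(subject, predicate, obj, literal=False):
    """Serialise one triple as an N-Triples line; `obj` is a URI unless literal=True."""
    if literal:
        # Minimal escaping only: backslash and double quote.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        object_term = f'"{escaped}"'
    else:
        object_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {object_term} ."


FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"

print(to_ntriples(FILE_TYPE, "http://purl.org/dc/terms/title",
                  "File types Name Authority List", literal=True))
print(to_ntriples(FILE_TYPE, "http://purl.org/dc/terms/publisher",
                  "http://open-data.europa.eu/en/data/publisher/publ"))
```

Subjects and predicates are always written as URIs in angle brackets; only the object position may hold a quoted literal.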

DATASUPPORTOPEN

RDF Syntax: RDF/XML

Slide 46

Definition of prefixes:

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcat="http://www.w3.org/ns/dcat#"
        xmlns:dct="http://purl.org/dc/terms/">

Description of data – triples (subject, predicate, object), which together form the graph:

      <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
        <dct:title>File types Named Authority List</dct:title>
        <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
      </dcat:Dataset>

      <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
        <dct:title>Publications Office</dct:title>
      </dct:Agent>

    </rdf:RDF>

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with subject, predicate and object highlighted]

DATASUPPORTOPEN

RDF Syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

[Slide annotations: definition of prefixes (the @prefix lines); description of data as triples (the statements); subject, predicate and object are highlighted; the whole document forms the graph]

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF Syntax: RDFa

Slide 49

RDFa: embedding RDF data in HTML

<html>
<head> </head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type"
       typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>

[Slide annotations: subject, predicate and object are highlighted]

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607

DATASUPPORTOPEN

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram "Healthcare Domain"; the classes and properties it shows are listed below]

Class: Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class: Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class: Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class: Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class: Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships (links between classes): address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: the Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[Slide annotations: classes, properties and relationships are highlighted in the diagram]

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: add triples to the RDF graph (an update operation, defined in SPARQL 1.1 Update).

• DELETE: delete triples from the RDF graph (an update operation, defined in SPARQL 1.1 Update).

• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
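
To make the difference between these query types concrete, here is a rough Python analogy that treats a graph as a set of (subject, predicate, object) tuples. It is not SPARQL and not how a triple store works internally; the match helper and the prefixed names such as ex:file-type are hypothetical. DESCRIBE is approximated as "all triples with the resource as subject", which is only one of the behaviours an endpoint may choose.

```python
RDF_TYPE = "rdf:type"

# A graph modelled as a set of (subject, predicate, object) tuples.
graph = {
    ("ex:file-type", RDF_TYPE, "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a variable."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# SELECT: a table of the values bound to a variable.
select_rows = [o for (_, _, o) in match(graph, p="dct:title")]

# ASK: is there at least one match?
ask = bool(match(graph, p=RDF_TYPE, o="dcat:Dataset"))

# CONSTRUCT: substitute the matches into a template to build a new graph.
constructed = {(s, "rdfs:label", o) for (s, _, o) in match(graph, p="dct:title")}

# DESCRIBE (approximation): everything stated about one resource.
described = set(match(graph, s="ex:file-type"))

# INSERT and DELETE: update operations that change the graph.
updated = graph | {("ex:file-type", "dct:publisher", "ex:publ")}
updated -= set(match(updated, p="dct:title"))

print(select_rows, ask)
```

SELECT and ASK return values, CONSTRUCT and DESCRIBE return a graph, and INSERT and DELETE return nothing but leave the stored graph changed.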

DATASUPPORTOPEN

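Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

[Slide annotations]

• PREFIX lines: definition of prefixes

• SELECT: type of query

• ?title: variables, i.e. what to search for

• WHERE block: RDF triple patterns, i.e. the conditions that have to be met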

DATASUPPORTOPEN

SELECT - return the name of a dataset with a particular URI

Slide 57

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result

?dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

?dataset | ?publisher
"File types Name Authority List" | "Publications Office"
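
The way the three triple patterns of this query are joined through the shared variable ?publisherURI can be sketched in Python. This is a toy matcher written for illustration, not a SPARQL engine; select and match_pattern are hypothetical helpers, and the short names ex:file-type and ex:publ stand in for the abbreviated URIs of the sample data.

```python
def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on failure."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # An already bound variable must keep its value.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def select(variables, patterns, data):
    """Evaluate the triple patterns in order, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            b2 for b in bindings for t in data
            if (b2 := match_pattern(pattern, t, b)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


data = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    data,
)
print(rows)
```

The third pattern only matches the title of ex:publ because ?publisherURI was already bound by the first pattern; that shared variable is what links the dataset to the name of its publisher.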

DATASUPPORTOPEN

SPARQL Example - EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example - EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example - EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:

  o How to reuse existing vocabularies to model your data

  o How to create new classes and properties in RDF

  o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

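Step 1

[Diagram: example domain model; the classes, attributes and relationships it shows are listed below]

• Amount: Currency, Figure, Type, Year

• Nomenclature: Type, Heading, Introduction, Remark, Conditions

• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

• Political category: Code, Description

• Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalCategory, hasCorporateBody, hasNomenclature, hasEUProgramme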

DATASUPPORTOPEN

Reuse existing terms and vocabularies

Step 2

Slide 69

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

Step 2: Reuse existing terms and vocabularies

DCAT-AP: vocabulary for describing datasets in Europe

Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: vocabulary for describing projects

ADMS: vocabulary for describing interoperability assets

Dublin Core: defines general metadata attributes

Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register

Organization Ontology: for describing the structure of organisations

Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

DATASUPPORTOPEN

Advantages of reuse

Step 2: Reuse existing terms and vocabularies

Slide 71

• Reuse greatly aids interoperability of your data

Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
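
The "further processing" mentioned for non-standard date formats can be illustrated with a short Python sketch. The helper names and the list of accepted input formats are assumptions made for this example; the "%d %B %Y" format relies on English month names, which holds for Python's default locale.

```python
from datetime import datetime

# Input spellings this sketch accepts (an assumption, extend as needed).
KNOWN_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%d/%m/%Y")


def to_xsd_date(text: str) -> str:
    """Normalise a human-readable date to the xsd:date lexical form YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def typed_literal(text: str) -> str:
    """Render the date as a typed literal in Turtle notation."""
    return f'"{to_xsd_date(text)}"^^xsd:date'


print(typed_literal("21 February 2013"))
```

Every consumer of data that uses the free-text format would need a conversion step like this one, whereas a value already published as xsd:date can be used directly.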

DATASUPPORTOPEN

You can find reusable RDF vocabularies on:

Slide 72

http://joinup.ec.europa.eu

http://lov.okfn.org

Step 2: Reuse existing terms and vocabularies

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
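
A small Python sketch shows why sub-properties help systems that only know the generic term. It applies the rdfs:subPropertyOf rule (if p is a sub-property of q, then every statement using p also holds for q) to a set of triples. The bud: prefix and the sample statement are hypothetical; the relationship between the introduction property and dct:description is the one used in the EU Budget vocabulary example of this module.

```python
# Sub-property declarations: specific property -> more generic property.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",  # "bud:" is a hypothetical prefix
}


def infer_super_properties(triples, sub_property_of):
    """Add (s, q, o) for every (s, p, o) where p is a sub-property of q."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat so that chains of sub-properties are followed
        changed = False
        for s, p, o in list(inferred):
            parent = sub_property_of.get(p)
            if parent is not None and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


data = {("bud:nomenclature1", "bud:introduction", "Introductory text")}
result = infer_super_properties(data, SUB_PROPERTY_OF)
print(sorted(result))
```

After inference, a consumer that only understands dct:description can still find the introductory text, even though it has never heard of the more specific property.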

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Diagram: Amount class (Currency, Figure, Type, Year) linked by hasNomenclature to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept.

Properties begin with a lower case letter, e.g. rdfs:label.

Object properties should be verbs, e.g. org:hasSite.

Data type properties should be nouns, e.g. dcterms:description.

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
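
The purely lexical part of these conventions (initial capitalisation and camel case) can be checked mechanically, as in the following Python sketch. The function names are illustrative, and the rules about singular nouns and verbs are not covered because they need human judgement.

```python
import re


def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    """Classes: upper camel case, e.g. skos:Concept."""
    return re.fullmatch(r"[A-Z][A-Za-z0-9]*", local_name(term)) is not None


def is_valid_property_name(term: str) -> bool:
    """Properties: lower camel case, e.g. foaf:isPrimaryTopicOf."""
    return re.fullmatch(r"[a-z][A-Za-z0-9]*", local_name(term)) is not None


for candidate in ("skos:Concept", "org:hasSite", "ex:legal_entity"):
    print(candidate, is_valid_class_name(candidate), is_valid_property_name(candidate))
```

A check like this could run before a vocabulary is published, so that terms such as the hypothetical ex:legal_entity are flagged early.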

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

Step 4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

Step 4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.
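
In RDFS, domain and range declarations let a system infer the class of a resource from the properties used with it. The Python sketch below illustrates that inference for the hasCeiling relationship between Political category and Amount. The direction of the relationship (a political category has an amount as its ceiling), the bud: prefix and the instance names are assumptions made for this example.

```python
# Assumed declarations: bud:hasCeiling links a PoliticalCategory to an Amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples, domain, range_):
    """Derive rdf:type statements from domain and range declarations."""
    types = set()
    for s, p, o in triples:
        if p in domain:
            types.add((s, "rdf:type", domain[p]))  # the subject is in the domain
        if p in range_:
            types.add((o, "rdf:type", range_[p]))  # the object is in the range
    return types


data = [("bud:category1", "bud:hasCeiling", "bud:amount1")]
print(sorted(infer_types(data, DOMAIN, RANGE)))
```

Domain and range therefore act as statements that generate new facts rather than as validation constraints, which is worth keeping in mind before declaring them too narrowly.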

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

Step 5

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

[Slide annotations: the steps are grouped into three phases: Analyse, Model, Publish]

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..." - openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

Slide 86

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data - table harmonisation

Slide 90

Step 3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data - prepare RDF

Slide 91

Step 3

• Create URI representations for the object values involved:

  o via formula

  o via reconciliation
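
Creating a URI "via formula" means deriving it from the cell value with a fixed rule. In Open Refine this would be written as an expression on a column; the following Python sketch only illustrates the idea. The base URI, the slugify rule and the function names are assumptions for this example, not part of Open Refine.

```python
import re
import unicodedata

BASE = "http://example.org/id/"  # hypothetical base URI


def slugify(value: str) -> str:
    """Turn a label into a lower-case, hyphen-separated ASCII slug."""
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind: str, label: str) -> str:
    """Build a URI of the form BASE/kind/slug from a cell value."""
    return f"{BASE}{kind}/{slugify(label)}"


print(mint_uri("country", "United Kingdom"))
```

A formula is deterministic, so the same label always yields the same URI. Reconciliation, the other option on the slide, instead looks the value up in an existing dataset and reuses the URI found there.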

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Step 4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

Slide 94

[Screenshot: graphical interface to copy/paste an existing RDF skeleton]

[Screenshot: graphical interface to edit an RDF skeleton]

Step 4

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle
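
What the mapping and export steps produce can be sketched in plain Python: each row of the table becomes one observation resource with a few properties, written out as N-Triples (which is also valid Turtle). This is a simplified illustration, not the output of the Open Refine RDF extension. The base URI, the column names and the sample values are invented for the example, and a complete RDF Data Cube dataset would also need a data structure definition and qb:dataSet links.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"           # RDF Data Cube namespace
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/scoreboard/"              # hypothetical base URI

# Made-up sample rows, standing in for the cleaned spreadsheet.
SAMPLE = """country,year,value
BE,2013,0.82
LU,2013,0.94
"""


def rows_to_ntriples(text: str) -> list:
    """Map every table row to four N-Triples lines describing one observation."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        obs = f"<{EX}obs/{row['country']}-{row['year']}>"
        lines.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        lines.append(f"{obs} <{EX}country> <{EX}country/{row['country']}> .")
        lines.append(f'{obs} <{EX}year> "{row["year"]}"^^<{XSD}gYear> .')
        lines.append(f'{obs} <{EX}value> "{row["value"]}"^^<{XSD}decimal> .')
    return lines


lines = rows_to_ntriples(SAMPLE)
print("\n".join(lines))
```

The fixed pattern of four statements per row plays the role of the RDF skeleton: it is defined once and then applied to every row of the table.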

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along the dimensions "flexibility" and "volume": OpenRefine, UnifiedViews, Cellar]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5 ★ Open Data. http://5stardata.info

• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C. http://www.w3.org/TR/vocab-org

• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net

• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework, W3C. http://www.w3.org/RDF

• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support: http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

@OpenDataSupport

contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 17: Llinked open data training for EU institutions

DATASUPPORTOPEN

RDF & SPARQL

The Resource Description Framework (RDF) is a framework for representing data and resources on the Web.

Slide 17

RDF breaks every piece of information down into triples:

• Subject: a resource, identified by a URI

• Predicate: a URI-identified, reused specification of the relationship

• Object: a resource or literal to which the subject is related

SPARQL is a standardised language for querying RDF data.

Example (subject, predicate, object):

<http://example.org/place/Brussels> is the capital of "Belgium"

OR

<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql

DATASUPPORTOPEN

How to publish linked data

Paving the way towards 5-star linked data

Slide 18

DATASUPPORTOPEN

5-star scheme of Linked Open Data

1 star: Make your stuff available on the Web (whatever format) under an open licence

2 stars: Make it available as structured data (e.g. Excel instead of an image scan of a table)

3 stars: Use non-proprietary formats (e.g. CSV instead of Excel)

4 stars: Use URIs to denote things, so that people can point at your stuff

5 stars: Link your data to other data to provide context

Slide 19
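
Because each star builds on the previous one, the scheme can be read as a checklist that stops at the first unmet criterion. The Python sketch below encodes that reading; the function and parameter names are illustrative, and the cumulative interpretation follows the way the five steps are presented here.

```python
def star_rating(open_licence: bool, structured: bool, non_proprietary: bool,
                uses_uris: bool, linked: bool) -> int:
    """Count the stars earned, stopping at the first criterion that is not met."""
    stars = 0
    for criterion in (open_licence, structured, non_proprietary, uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars


# A scanned table published as PDF under an open licence: 1 star.
print(star_rating(True, False, False, False, False))
# An Excel sheet under an open licence: structured, but a proprietary format.
print(star_rating(True, True, False, False, False))
# A CSV file under an open licence: 3 stars.
print(star_rating(True, True, True, False, False))
```

The early exit makes the dependency explicit: without an open licence, for example, data earns no stars at all, however well it is linked.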

DATASUPPORTOPEN

Make your stuff available on the Web under an open licence

Slide 20

Example: Trends, risks and vulnerabilities in securities markets

DATASUPPORTOPEN

Make it available as structured data

Slide 21

Waterbase - Emissions to water

CountryCode

DATASUPPORTOPEN

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

DATASUPPORTOPEN

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo: change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions' initiatives - some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

[Logos of funded initiatives, including ADMS.SW and the Core Public Service Vocabulary]

DATASUPPORTOPEN

Member State initiatives - some examples

DE - Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT - Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL - Building and address register

The Dutch Address and Buildings base register, published as linked data.

UK - Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.

UK - Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database.

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member States' initiatives - UK National Archives

Slide 30

Three considerations for legislation as data:

• Typographic layout

• Versioning: changes over time

• Semantics

Semantic representation using RDF and Linked Data:

• URIs for things & the RDF data model

Requires granular URIs to name things:

• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}

• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
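
The identifier and representation patterns can be turned into a small URI builder, sketched here in Python. The function names are illustrative, and the code only follows the pattern as shown on the slide; it does not check whether a given URI actually resolves on legislation.gov.uk.

```python
BASE = "http://www.legislation.gov.uk"
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}  # representations named on the slide


def identifier_uri(doc_type, year, number, section=None):
    """Build an identifier URI following /id/{type}/{year}/{number}[/section/{number}]."""
    uri = f"{BASE}/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    return uri


def representation_uri(identifier, fmt):
    """Append /data.{format} to obtain a specific representation of the resource."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{identifier}/data.{fmt}"


section = identifier_uri("ukpga", 2010, 32, section=124)
print(representation_uri(section, "rdf"))
```

Keeping the identifier separate from its representations means the same section can be named once and then retrieved as XML, XHTML, PDF or RDF.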

DATASUPPORTOPEN

Member States' initiatives - UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

DATASUPPORTOPEN Slide 33

Open & linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

Slide 35

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes."

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.

• URIs, RDF and SPARQL form the foundational layer for linked data.

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions contrsquod

bull Linked data offers a number of advantages such as

o Enables easy updates adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge-

creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data.
• An introduction to SPARQL, showing how you can query and manipulate data in RDF.

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40

Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products.

Description: attributes, features and relations of the resources.

Framework: model, languages and syntaxes for these descriptions.

• Published as a W3C Recommendation in 1999.
• RDF was originally introduced as a data model for metadata.
• RDF was generalised to cover knowledge of all kinds.

Slide 42

Example: RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI.
• Predicate – a URI-identified, reused specification of the relationship.
• Object – a resource or literal to which the subject is related.

Example: the name of a dataset

Subject: <http://publications.europa.eu/resource/authority/file-type>
Predicate: has a title
Object: "File types Name Authority List"
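The triple structure can be illustrated in a few lines of Python. This is only a sketch that uses a plain tuple type as a stand-in for an RDF library; the `Triple` type and the `describe` helper are illustrative names introduced here, and the predicate is assumed to be the Dublin Core `title` term.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property (the relationship)
    obj: str        # here a literal value; could also be a URI


def describe(t: Triple) -> str:
    """Render a triple as one N-Triples-like line, treating the object as a literal."""
    return f'<{t.subject}> <{t.predicate}> "{t.obj}" .'


dataset_title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
)
print(describe(dataset_title))
```

A real application would normally rely on an RDF library rather than hand-built strings; the tuple only makes the subject, predicate and object roles explicit.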

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

The xmlns attributes are the definition of prefixes; the rest is the description of the data as triples. Each triple has a subject, a predicate and an object, and together the triples form a graph.

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph of the example, with subject, predicate and object labelled]

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

The @prefix lines are the definition of prefixes; the rest is the description of the data as triples (subject, predicate, object), which together form a graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: the Unified Modelling Language. Class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[Figure: UML class diagram "Healthcare Domain". The classes, their properties and the relationships between them are listed below.]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language.
• It is one of the three core standards of the Semantic Web, along with RDF and OWL.
• It became a W3C standard in January 2008.
• SPARQL 1.1 has been a W3C Recommendation since March 2013.

Slide 54

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: add triples to the RDF graph (a SPARQL 1.1 Update operation).

• DELETE: delete triples from the RDF graph (a SPARQL 1.1 Update operation).

• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

Slide 56

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result

dataset
"File types Name Authority List"

SELECT – return the name and publisher of a dataset

Slide 58

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

dataset | publisher
"File types Name Authority List" | "Publications Office"
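To make the idea of triple patterns concrete, the sketch below evaluates the two-column SELECT query in plain Python. It is a toy matcher under stated assumptions, not a SPARQL engine: the short names `ex:file-type` and `ex:publ` stand in for the abbreviated URIs of the sample data, and `match` and `select` are illustrative helper names.

```python
# Sample data as (subject, predicate, object) tuples.
# The short "ex:" names are placeholders for the URIs in the sample data.
FILE_TYPE = "ex:file-type"
PUBL = "ex:publ"
DATA = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`; None on failure."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):            # variable: bind it or check consistency
            if new.setdefault(term, value) != value:
                return None
        elif term != value:                 # constant: must match exactly
            return None
    return new


def select(variables, patterns, data):
    """Evaluate the triple patterns in order and project the chosen variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in data
                    if (b2 := match(pattern, t, b)) is not None]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [(FILE_TYPE, "dct:publisher", "?publisherURI"),
     (FILE_TYPE, "dct:title", "?dataset"),
     ("?publisherURI", "dct:title", "?publisher")],
    DATA,
)
print(rows)
```

The shared variable `?publisherURI` is what joins the dataset's publisher statement to the publisher's own title, which is the same role it plays in the SPARQL query.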

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model diagram. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data.

Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that the schema has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
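The "further processing" mentioned in the date example can be sketched as follows. This is a minimal illustration, assuming dates written with English month names in the form "21 February 2013"; the function name `to_xsd_date` is introduced here for illustration only.

```python
from datetime import datetime


def to_xsd_date(text: str) -> str:
    """Convert a date written like '21 February 2013' into an xsd:date literal."""
    # %B parses the full month name; this assumes English names (default C locale).
    parsed = datetime.strptime(text, "%d %B %Y")
    return f'"{parsed.date().isoformat()}"^^xsd:date'


print(to_xsd_date("21 February 2013"))
```

Every consumer of data in a non-standard format needs a conversion step like this one, which is exactly the cost that reusing dcterms:created with xsd:date avoids.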

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• http://joinup.ec.europa.eu
• http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
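The benefit of sub-properties can be shown with a small inference sketch: a system that only knows dct:description can still read data published with the more specific property. The prefixed names `bud:introduction` and `bud:hasNomenclature`, the resource `ex:nomenclature1` and the literal text are all hypothetical placeholders, not terms copied from the actual EU Budget vocabulary.

```python
# Hypothetical sub-property declarations (specific property -> super-property).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples):
    """Return the input triples plus each statement restated with its super-properties."""
    inferred = set(triples)
    for s, p, o in triples:
        while p in SUB_PROPERTY_OF:        # walk up the sub-property chain
            p = SUB_PROPERTY_OF[p]
            inferred.add((s, p, o))
    return inferred


# Made-up example statement using the specific property.
data = {("ex:nomenclature1", "bud:introduction", "General remarks on the heading")}
result = infer_super_properties(data)
for triple in sorted(result):
    print(triple)
```

This mirrors the RDFS rule for rdfs:subPropertyOf: whatever is stated with the sub-property also holds for the super-property.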

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Figure: Amount class with attributes Currency, Figure, Type, Year]

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Figure: Amount class with attributes Currency, Figure, Type, Year]

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the hasCeiling property connecting the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
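In RDFS, domain and range declarations are used for inference: stating a property on a resource lets a reasoner conclude the types of the subject and the object. The sketch below illustrates this under an assumption that is not confirmed by the slide, namely that hasCeiling points from a political category to an amount; all `ex:` names are hypothetical.

```python
# Hypothetical schema statements; the direction of ex:hasCeiling is an assumption.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}   # rdfs:domain
RANGE = {"ex:hasCeiling": "ex:Amount"}               # rdfs:range


def infer_types(triples):
    """Derive rdf:type statements from the domain and range declarations."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))    # subject is in the domain class
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))     # object is in the range class
    return types


statements = [("ex:category1", "ex:hasCeiling", "ex:amount1")]
for triple in sorted(infer_types(statements)):
    print(triple)
```

Because domain and range drive inference rather than validation, an overly narrow declaration can lead a reasoner to assign unintended types, which is one reason to define them with care.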

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/

Slide 80

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

1. Start with a robust Domain Model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

The slide groups these steps into three phases: Analyse, Model, Publish.

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

Slide 89

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Step 3: Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Step 3: Clean up the data – prepare RDF

Slide 91

• Create a URI representation for the involved object values:
  o via formula
  o via reconciliation
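Creating URIs "via formula" means deriving an identifier from the cell value by a fixed rule. Open Refine has its own expression language for this; the Python sketch below only illustrates the idea and is not an Open Refine formula. The base URI `http://example.org/id/` and the function name `mint_uri` are placeholders introduced here.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/"  # placeholder base URI, not a real namespace


def mint_uri(kind: str, label: str) -> str:
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = re.sub(r"[^\w]+", "-", label.strip().lower()).strip("-")
    return f"{BASE}{kind}/{quote(slug)}"


print(mint_uri("country", "Czech Republic"))
```

Reconciliation takes the other route: instead of minting a new URI, the cell value is matched against an existing dataset so that an already published URI can be reused.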

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

Slide 94

[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Figure: export of the data in Turtle]
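What the mapping and export steps produce can be approximated by hand. The sketch below turns a few CSV rows into Turtle, typing each row as a qb:Observation from the RDF Data Cube Vocabulary. It is an illustration only: the rows are made-up values, the `ex:` properties and the observation URIs are placeholders, and no escaping of literal values is attempted.

```python
import csv
import io

# Made-up rows standing in for a cleaned spreadsheet export.
CSV_TEXT = """country,year,value
BE,2013,0.78
LU,2013,0.92
"""

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"
)


def rows_to_turtle(csv_text: str) -> str:
    """Serialise each CSV row as one qb:Observation in Turtle."""
    blocks = [PREFIXES]
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        blocks.append(
            f"<http://example.org/obs/{i}> a qb:Observation ;\n"
            f'    ex:country "{row["country"]}" ;\n'
            f'    ex:year "{row["year"]}" ;\n'
            f'    ex:value {row["value"]} .\n'
        )
    return "\n".join(blocks)


turtle = rows_to_turtle(CSV_TEXT)
print(turtle)
```

A full Data Cube export would also describe the dataset and its dimensions and measures; the sketch keeps only the row-to-observation idea that the RDF skeleton expresses.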

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along flexibility and volume axes: OpenRefine, UnifiedViews, Cellar]

Thank you for your attention

...and now YOUR questions?

Slide 97

References

• 5★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Slide 98

Further reading

Slide 99

• EC ISA: Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA: Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org

Slide 101

Be part of our team

Slide 102

• Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
• Join us on: Open Data Support, http://goo.gl/y9ZZI
• Follow us: @OpenDataSupport
• Contact us: contact@opendatasupport.eu

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 18: Llinked open data training for EU institutions

How to publish linked data

Paving the way towards 5-star linked data

Slide 18

5-star scheme of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence.
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table).
★★★ Use non-proprietary formats (e.g. CSV instead of Excel).
★★★★ Use URIs to denote things, so that people can point at your stuff.
★★★★★ Link your data to other data to provide context.

Slide 19
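The scheme is cumulative: each star presupposes the ones before it. A minimal sketch of that rule, assuming the five criteria are available as yes/no answers (the function and parameter names are illustrative):

```python
def star_rating(open_licence: bool, structured: bool, non_proprietary: bool,
                uses_uris: bool, linked: bool) -> int:
    """Count the criteria met in order; stop at the first one that is not met."""
    stars = 0
    for met in (open_licence, structured, non_proprietary, uses_uris, linked):
        if not met:
            break
        stars += 1
    return stars


# An Excel file published on the Web under an open licence.
print(star_rating(True, True, False, False, False))
```

Under this reading, a dataset without an open licence earns no stars, however well structured or linked it is.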

Make your stuff available on the Web under an open licence

Slide 20

Example: Trends, risks and vulnerabilities in securities markets

Make it available as structured data

Slide 21

Example: Waterbase - Emissions to water

Use non-proprietary formats

• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF

Example: DG Enlargement - Regional programmes

Slide 22

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

Link your data to other data to provide context

Slide 24

Example: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

LOGD roadblocks

• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions initiatives – some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27
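SPARQL endpoints such as these are queried over HTTP. The sketch below only builds the request URL and performs no network call; it assumes the SPARQL Protocol convention of passing the query text in a `query` parameter of a GET request, and it uses the European Environment Agency endpoint address from the list above. Whether a given endpoint is still online, and which result formats it offers, is not something this sketch checks.

```python
from urllib.parse import urlencode

ENDPOINT = "http://semantic.eea.europa.eu/sparql"  # endpoint address as listed above

# A generic query that asks for any five triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query text in the `query` parameter of a GET request URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be opened with any HTTP client; the desired result format is usually requested through an Accept header or an endpoint-specific parameter.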

Initiatives funded by the European Commission

Slide 28

[Figure: logos of funded initiatives; legible labels include ADMS.SW and the Core Public Service Vocabulary]

Member State initiatives – some examples

• DE – Bibliotheksverbund Bayern: linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

• IT – Agenzia per l'Italia Digitale: three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

• NL – Building and address register: the Dutch Address and Buildings base register published as linked data.

• UK – Ordnance Survey: three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary-Line.

• UK – Companies House: publishing basic company details as linked data, using a simple URI for each company in their database.

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member States Initiatives – UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning (changes over time)
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

DATASUPPORTOPEN

Member States Initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
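
The granular URI scheme described for legislation.gov.uk can be illustrated with a few lines of Python. This is only a sketch under assumptions: the path layout (type, year, number, section, then an optional data.<format> representation) is read off the example URI on this slide, and the function name legislation_uri and its parameters are hypothetical, not part of any official API.

```python
BASE = "http://www.legislation.gov.uk/id"
# Representations listed on the slide: data.[xml | xht | pdf | rdf | feed]
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}


def legislation_uri(doc_type, year, number, section, fmt=None):
    """Build an identifier URI; append a representation when fmt is given."""
    uri = f"{BASE}/{doc_type}/{year}/{number}/section/{section}"
    if fmt is None:
        return uri  # the identifier for the thing itself
    if fmt not in FORMATS:
        raise ValueError(f"unknown representation: {fmt}")
    return f"{uri}/data.{fmt}"  # one concrete representation of it


print(legislation_uri("ukpga", 2010, 32, 124, "rdf"))
```

Keeping the identifier separate from its representations means the same section can be served as XML, PDF or RDF without changing the name that other datasets link to.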

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content

• This data already powers large parts of the BBC website, including BBC News and Sport

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

DATASUPPORTOPEN Slide 33

Open & linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

Slide 35

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes"

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <!-- The organisation, identified by its URI -->
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <!-- The site's address, linked from the organisation via org:hasSite -->
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>
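
To show how such a description breaks down into individual statements, the sketch below reads the organisation example with Python's standard XML parser and lists its triples. It is not a general RDF/XML parser: it assumes the flat shape used on this slide (typed nodes with rdf:about, properties holding either text or rdf:resource), and the helper names uri and extract_triples are illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The organisation example from this slide, written out in full.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' tag into a plain URI."""
    return tag[1:].replace("}", "", 1)


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples for the flat shape above."""
    triples = []
    for node in ET.fromstring(xml_text):
        subject = node.get(f"{{{RDF}}}about")
        # The element name of a typed node gives the class of the subject.
        triples.append((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, uri(prop.tag), value))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

In practice a dedicated RDF library would be used for this; the point here is only that the nested XML is a serialisation of a flat list of statements.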

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
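
As a small illustration, the triple from this slide can be held in a plain data structure and printed in the line-based N-Triples notation. This is a sketch, not a full serialiser: the Triple and Literal classes are hypothetical helpers, "has a title" is assumed to be the Dublin Core property dct:title, and no escaping of special characters is attempted.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain text value, as opposed to a resource identified by a URI."""
    value: str


class Triple(NamedTuple):
    subject: str                 # URI of the resource being described
    predicate: str               # URI of the relationship
    obj: Union[str, Literal]     # another URI, or a Literal


def to_ntriples(t):
    """Render one triple as an N-Triples line (no escaping: sketch only)."""
    if isinstance(t.obj, Literal):
        obj = f'"{t.obj.value}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


dataset_title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    Literal("File types Name Authority List"),
)
print(to_ntriples(dataset_title))
```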

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

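Annotations on the example: the xmlns attributes are the definition of prefixes; the element content is the description of data, i.e. triples (subject, predicate, object) that together form a graph.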

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Annotations on the example: the @prefix lines are the definition of prefixes; the remaining statements are the description of data, i.e. triples (subject, predicate, object) that together form a graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
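
The prefix definitions in Turtle are only a shorthand for full URIs. The following sketch shows the abbreviation in both directions. The PREFIXES table and the function names shorten and expand are illustrative assumptions, and the DCAT namespace used is the W3C one (http://www.w3.org/ns/dcat#).

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def shorten(uri, prefixes=PREFIXES):
    """Abbreviate a full URI to prefix:local when a known namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"  # no prefix known: keep the full URI in angle brackets


def expand(name, prefixes=PREFIXES):
    """Expand prefix:local back to the full URI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


print(shorten("http://purl.org/dc/terms/title"))
print(expand("dcat:Dataset"))
```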

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
<head> ... </head>
<body>
  <!-- resource = subject, typeof = class of the subject -->
  <div resource="http://publications.europa.eu/resource/authority/file-type"
       typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <!-- property = predicate, element text = object -->
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML class diagram "Healthcare Domain", showing classes from the Core Vocabularies with their properties:

Class Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Location
  geographicIdentifier: URI
  geographicName: string

Class Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C Recommendation in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

• INSERT: add triples to the RDF graph

• DELETE: delete triples from the RDF graph

• ASK: are there any X, Y, etc. satisfying the following conditions?
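
What each query type returns can be mimicked on a small in-memory set of triples. The sketch below is an analogy in plain Python, not SPARQL itself: the ex: names are made-up abbreviations, the data mirrors the file-type dataset used elsewhere in this module, and the DESCRIBE line shows only one possible reading (all triples with the resource as subject), since the exact result is left to the implementation.

```python
graph = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "dct:title", "Publications Office"),
}

# SELECT: a table (here a sorted list) of the values that satisfy a condition.
titles = sorted(o for s, p, o in graph if p == "dct:title")

# ASK: is there at least one match? The answer is only true or false.
has_dataset = any(p == "rdf:type" and o == "dcat:Dataset" for s, p, o in graph)

# CONSTRUCT: substitute the matches into a template to build a new graph.
labels = {(s, "rdfs:label", o) for s, p, o in graph if p == "dct:title"}

# DESCRIBE: the statements that provide information about one resource.
about_publ = {t for t in graph if t[0] == "ex:publ"}

# INSERT / DELETE: change the graph itself.
graph.add(("ex:publ", "rdf:type", "dct:Agent"))
graph.discard(("ex:file-type", "dct:publisher", "ex:publ"))

print(titles, has_dataset, len(labels), len(about_publ), len(graph))
```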

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

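Structure of a SPARQL Query

Slide 56

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Annotations on the example:
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE block: RDF triple patterns, i.e. the conditions that have to be met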

DATASUPPORTOPEN

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
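
How a SELECT query finds its answer can be sketched as pattern matching: each triple pattern is compared with every triple in the data, and variables that occur in several patterns (here ?publisherURI) must take the same value everywhere. The code below is a deliberately naive illustration of that idea, not how a real SPARQL engine is built; the function names match and select are hypothetical, and prefixed names are kept as plain strings.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

GRAPH = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # a variable
            if new.setdefault(term, value) != value:
                return None               # already bound to something else
        elif term != value:               # a constant that does not match
            return None
    return new


def select(variables, patterns, graph):
    """Return one row per combination of triples satisfying all patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    GRAPH,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset's description to the publisher's description, just as in the query on this slide.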

DATASUPPORTOPEN

SPARQL Example – EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

• Creating an RDF vocabulary for modelling your data:

o How to reuse existing vocabularies to model your data

o How to create new classes and properties in RDF

o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

Step 1

[Figure: domain model. Classes and their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

DATASUPPORTOPEN

Reuse existing terms and vocabularies (step 2)

Slide 69

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP: vocabulary for describing datasets in Europe

Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: vocabulary for describing projects

ADMS: vocabulary for describing interoperability assets

Dublin Core: defines general metadata attributes

Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register

Organization Ontology: for describing the structure of organisations

Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies (step 2)

DATASUPPORTOPEN

• Reuse greatly aids interoperability of your data

Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
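
The "further processing" mentioned for non-standard date formats can be made concrete. The sketch below converts a free-text date such as "21 February 2013" into the ISO form expected for an xsd:date literal. The function names are illustrative, and the month name parsing (%B) assumes an English locale, which is itself a reminder of why a shared, typed format is easier for everyone.

```python
from datetime import datetime


def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date and return it in ISO 8601 form (YYYY-MM-DD)."""
    # %B matches the full month name in the current locale (English assumed).
    return datetime.strptime(text, fmt).date().isoformat()


def typed_literal(iso_date):
    """Render the date as a typed literal in Turtle-style notation."""
    return f'"{iso_date}"^^xsd:date'


print(typed_literal(to_xsd_date("21 February 2013")))
```

A consumer of dcterms:created with xsd:date values needs none of this: the value can be compared and sorted as it is.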

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

http://joinup.ec.europa.eu/ and http://lov.okfn.org/

Reuse existing terms and vocabularies (step 2)

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
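
Why a sub-property helps systems that only know the generic term can be shown with a tiny inference step: every statement made with the specific property is also read as a statement with its super-property. The sketch assumes the EU Budget example from this module (an introduction property declared as a sub-property of dct:description); the ex: prefix, the sample resource and the function name are hypothetical.

```python
# Declared in the vocabulary: ex:introduction rdfs:subPropertyOf dct:description
SUB_PROPERTY_OF = {"ex:introduction": "dct:description"}

DATA = {
    ("ex:nomenclature-1", "ex:introduction", "Introductory text of the budget line"),
}


def infer_super_properties(graph, sub_property_of):
    """Add (s, super, o) for every (s, sub, o), following chains of sub-properties."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            parent = sub_property_of.get(p)
            if parent is not None and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


for triple in sorted(infer_super_properties(DATA, SUB_PROPERTY_OF)):
    print(triple)
```

A generic application that only displays dct:description can now show the text, even though it has never heard of ex:introduction.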

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description

[Figure: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject

[Figure: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
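
The capitalisation rules above are mechanical enough to sketch in code. The helpers below turn a human-readable label into a class name (upper camel case) or a property name (lower camel case). The function names are illustrative, and the sketch does not check the other conventions (singular class names, verbs for object properties, nouns for data type properties), which need human judgement.

```python
import re


def _words(label):
    """Split a label into its alphanumeric words."""
    return re.findall(r"[A-Za-z0-9]+", label)


def class_name(label):
    """Upper camel case: every word starts with a capital letter."""
    return "".join(w[:1].upper() + w[1:] for w in _words(label))


def property_name(label):
    """Lower camel case: as class_name, but starting with a lower case letter."""
    name = class_name(label)
    return name[:1].lower() + name[1:]


print(class_name("political category"))
print(property_name("has ceiling"))
```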

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: class Amount with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: class Amount with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.
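
Under RDFS semantics, domain and range are not validation rules but a source of new information: using a property lets a reasoner conclude the class of the subject (domain) and of the value (range). The sketch below illustrates this with the hasCeiling property from the domain model. The direction chosen (a political category has an amount as ceiling), the ex: prefix and the sample resources are assumptions made for illustration.

```python
# Assumed declarations (direction of hasCeiling is an assumption):
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}   # rdfs:domain
RANGE = {"ex:hasCeiling": "ex:Amount"}               # rdfs:range

DATA = [("ex:category-1", "ex:hasCeiling", "ex:amount-42")]


def infer_types(graph, domain, range_):
    """Derive rdf:type statements from the domain and range of used properties."""
    types = set()
    for s, p, o in graph:
        if p in domain:
            types.add((s, "rdf:type", domain[p]))   # subject belongs to the domain class
        if p in range_:
            types.add((o, "rdf:type", range_[p]))   # value belongs to the range class
    return types


for triple in sorted(infer_types(DATA, DOMAIN, RANGE)):
    print(triple)
```

This is why domains and ranges should be declared with care: a domain that is too narrow makes a reasoner classify resources in ways the publisher never intended.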

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: property hasCeiling between class Political category (Code, Description) and class Amount (Currency, Figure, Type, Year)]

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

5

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas and http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; ..." - openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML and

• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension: http://refine.deri.ie/

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86
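
The five steps can also be pictured as a small script, which may help to understand what Open Refine does behind its graphical interface. The sketch below is not Open Refine and does not use the RDF Data Cube vocabulary: the CSV content is made-up sample data, the base URI and property URIs under http://example.com/ are placeholders, and every cell is exported as a plain string literal.

```python
import csv
import io

# Step 1: the data as a spreadsheet (made-up sample values).
CSV_TEXT = "country,year,value\nBE,2014,83.0\nFR,2014, 81.5\n"

BASE = "http://example.com/"  # placeholder base URI


def rows_to_ntriples(csv_text, base=BASE):
    """Steps 2 to 5 in miniature: load, clean, map and export as N-Triples lines."""
    lines = []
    reader = csv.DictReader(io.StringIO(csv_text))        # step 2: load the table
    for number, row in enumerate(reader, start=1):
        subject = f"<{base}observation/{number}>"         # one resource per row
        for column, cell in row.items():
            cell = cell.strip()                           # step 3: minimal clean-up
            if not cell:
                continue                                  # skip empty cells
            predicate = f"<{base}def/{column}>"           # step 4: column -> property
            lines.append(f'{subject} {predicate} "{cell}" .')  # step 5: export
    return lines


for line in rows_to_ntriples(CSV_TEXT):
    print(line)
```

In a real publication the placeholder properties would be replaced by terms from existing vocabularies, and values such as the country code would become URIs rather than strings.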

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data – table harmonisation

Slide 90

3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data – prepare RDF

Slide 91

3

• Create URI representation for the involved object values

• via formula

• via reconciliation
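
Creating a URI "via formula" means deriving it from the cell value with a fixed rule. In Open Refine this is written as an expression on the column; the Python sketch below shows the same idea outside the tool. The base URI is a placeholder and the rule (trim, lower case, hyphens for spaces, percent-encoding for other characters) is one possible convention, not a prescribed one.

```python
from urllib.parse import quote


def mint_uri(base, value):
    """Derive a URI from a cell value using a fixed, repeatable rule."""
    slug = "-".join(value.strip().lower().split())   # trim, lower case, hyphenate
    return base.rstrip("/") + "/" + quote(slug)       # percent-encode the rest


BASE = "http://example.com/id/country"  # placeholder namespace

print(mint_uri(BASE, " United Kingdom "))
print(mint_uri(BASE, "Österreich"))
```

Because the rule is deterministic, the same value always yields the same URI, so rows that mention the same country end up pointing at the same resource. Reconciliation goes one step further and looks the value up in an existing dataset to reuse a URI that someone else already maintains.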

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

Slide 94

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

4

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97


DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support: http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

@OpenDataSupport

contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 19: Llinked open data training for EU institutions

DATASUPPORTOPEN

5 star-schema of Linked Open Data

★ Make your stuff available on the Web (whatever format) under an open licence

★★ Make it available as structured data (e.g. Excel instead of image scan of a table)

★★★ Use non-proprietary formats (e.g. CSV instead of Excel)

★★★★ Use URIs to denote things, so that people can point at your stuff

★★★★★ Link your data to other data to provide context

Slide 19

DATASUPPORTOPEN

Make your stuff available on the Web under an open licence

Slide 20

Trends risks and

vulnerabilities in

securities markets

DATASUPPORTOPEN

Make it available as structured data

Slide 21

Waterbase - Emissions to water

CountryCode

DATASUPPORTOPEN

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

DATASUPPORTOPEN

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions initiatives – some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

Initiatives funded by the European Commission

Slide 28

[Figure: logos of the funded initiatives; legible labels: ADMS, SW, Core Vocabulary, Public Service]

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT – Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL – Building and address register
The Dutch Address and Buildings base register published as linked data

UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line

UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member States Initiatives – UK National Archives

Slide 30

Three considerations for legislation as data
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data
• URIs for things & RDF data model

Requires granular URIs to name things
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{sectionnumber}
• Representation: .../data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member States Initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
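
The granular URI pattern on these two slides (an identifier per section, plus one representation per format) can be sketched in Python as below. The function names are illustrative, and the pattern is reconstructed from the slide text only; the live service may structure its URIs differently.

```python
BASE = "http://www.legislation.gov.uk"
FORMATS = ("xml", "xht", "pdf", "rdf", "feed")  # formats listed on the slide


def identifier_uri(doc_type, year, number, section=None):
    """Build an identifier URI following the pattern shown on the slide."""
    uri = f"{BASE}/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    return uri


def representation_uri(identifier, fmt):
    """Append the data.<format> suffix that selects one representation."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier}/data.{fmt}"


section_124 = identifier_uri("ukpga", 2010, 32, section=124)
print(representation_uri(section_124, "rdf"))
```

Keeping the identifier separate from its representations means the same section can be cited with one stable URI while being served as XML, PDF or RDF.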

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content

• This data already powers large parts of the BBC website, including BBC News and Sport

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

Open & linked data at BBC

Slide 33

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

Slide 35

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"

• Need for adding semantics in order to allow machine reasoning

For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality

Slide 36

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40

Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was later generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

RDF structure

Triples graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Named Authority List"
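
To make the subject/predicate/object structure concrete, here is a small Python sketch that holds the example as a tuple and prints it as one N-Triples line. The `Triple` type and `to_ntriples` helper are illustrative names; the use of dct:title (http://purl.org/dc/terms/title) for "has a title" follows the later syntax slides, and the escaping covers only backslashes and double quotes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    if t.obj_is_literal:
        # Escape backslashes first, then double quotes.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Named Authority List",
    obj_is_literal=True,
)
print(to_ntriples(title))
```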

RDF Syntax: RDF/XML

Slide 46

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data – triples (together they form the graph):

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

In this example the resource named in rdf:about is the subject, dct:title and dct:publisher are predicates, and the title literal and the publisher URI are objects.
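
As a rough illustration of how the RDF/XML above maps to triples, the following Python sketch reads a cut-down copy of the example with the standard library XML parser. It handles only the two constructs used on the slide (typed node elements with rdf:about, and property elements holding either text or rdf:resource); it is not a general RDF/XML parser, and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def triples(rdf_xml):
    """Extract (subject, predicate, object) tuples from simple RDF/XML."""
    out = []
    for node in ET.fromstring(rdf_xml):
        subject = node.get(f"{{{RDF_NS}}}about")
        # A typed node element implies an rdf:type triple.
        out.append((subject, RDF_NS + "type", expand(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            out.append((subject, expand(prop.tag), obj))
    return out


for triple in triples(RDF_XML):
    print(triple)
```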

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with nodes and arcs labelled Subject, Predicate and Object]

RDF Syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data – triples (together they form the graph):

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Each statement reads subject, predicate, object; a semicolon repeats the subject for the next predicate.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF Syntax: RDFa

Slide 49

RDFa: embedding RDF data in HTML. The resource attribute gives the subject, property the predicate, and the element content the object.

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607

How to represent data in RDF

Classes properties and vocabularies

Slide 50

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram "Healthcare Domain", reconstructed from the slide text]

Class Core Vocabularies::Identifier
    dateOfIssue: dateTime [0..1]
    identifier: string [1..1]
    identifierType: string [0..1]
    issuingAuthority: string [0..1]
    issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
    alternativeName: string
    birthName: string
    dateOfBirth: dateTime
    dateOfDeath: dateTime
    familyName: string
    fullName: string
    gender: code
    givenName: string
    patronymicName: string

Class Core Vocabularies::Location
    geographicIdentifier: URI
    geographicName: string

Class Core Vocabularies::Address
    addressArea: string
    addressID: string
    adminUnitL1: string
    adminUnitL2: string
    fullAddress: string
    locatorDesignator: string
    locatorName: string
    poBox: string
    postCode: string
    postName: string
    thoroughfare: string

Class Core Vocabularies::Geometry
    lat: string
    long: string
    wkt: string
    xmlGeometry: XML

Relationships (associations between the classes): address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, which makes the meaning of the data model easier to understand.

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• The name stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph (part of SPARQL 1.1 Update)
• DELETE: delete triples from the RDF graph (part of SPARQL 1.1 Update)
• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
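
The "Protocol" half of SPARQL defines how such a query is sent to an endpoint over HTTP. One option in the SPARQL 1.1 Protocol is a GET request carrying the query text in a URL-encoded `query` parameter. The Python sketch below only builds that URL and does not send it; the endpoint address is a placeholder, and real endpoints may require additional parameters or headers.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, for illustration

QUERY = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}"""


def sparql_get_url(endpoint, query):
    """Build a GET request URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

URL-encoding matters here because characters such as `#`, `<`, `?` and line breaks in the query would otherwise be misread as part of the URL itself.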

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://.../publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result

dataset
"File types Named Authority List"
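
What the query engine does with a single triple pattern can be imitated in a few lines of Python: compare the pattern with every triple, treat names starting with `?` as variables, and collect the values they bind to. This is a teaching sketch, not how a real SPARQL engine is implemented. The `ex:` names stand in for the abbreviated URIs on the slide and are placeholders.

```python
# Sample data from the slide, with placeholder names for the shortened URIs.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Named Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A variable matches anything, but must stay consistent.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            results.append(binding)
    return results


print(match(("ex:file-type", "dct:title", "?dataset"), DATA))
```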

SELECT – return the name and publisher of a dataset

Slide 58

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://.../publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

dataset                              publisher
"File types Named Authority List"    "Publications Office"
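
The second query works because the variable ?publisherURI appears in two patterns, which joins the dataset to its publisher. A minimal Python sketch of that join follows: each pattern is matched in turn, and bindings that contradict an earlier pattern are dropped. The `ex:` names are placeholders for the shortened URIs on the slide, and the function names are invented for this example.

```python
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Named Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def extend(pattern, triple, binding):
    """Return binding extended to fit the triple, or None if it cannot fit."""
    new = dict(binding)
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            if new.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return new


def select(variables, patterns, triples):
    """Evaluate the patterns one after another, keeping consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := extend(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```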

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

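Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model diagram. Classes and attributes legible in the slide text:]

Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type
Political category: Code, Description
Corporate body: Code, Type, Location
Further attributes (owning class not recoverable from the transcript, likely EU Programme): Acronym, Legal base period, Legal base type, Legal base status

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme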

Step 2: Reuse existing terms and vocabularies

Slide 69

General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID

Step 2: Reuse existing terms and vocabularies – well-known vocabularies

Slide 70

DCAT-AP: vocabulary for describing datasets in Europe
Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: vocabulary for describing projects
ADMS: vocabulary for describing interoperability assets
Dublin Core: defines general metadata attributes
Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
Organization Ontology: ontology for describing the structure of organisations
Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies – advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data

Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
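
The date example can be tried out in Python. An xsd:date value uses the ISO 8601 form, which the standard library parses directly, whereas the free-text form needs a hand-written format string that also depends on the language of the month name. The function names below are illustrative only, and the second function assumes English month names under the default C/English locale.

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    """Parse the lexical form of an xsd:date such as '2013-02-21'."""
    return date.fromisoformat(lexical)


def parse_custom_date(text):
    """Parse '21 February 2013'; %B assumes English month names (locale)."""
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_custom_date("21 February 2013")
print(standard, custom, standard == custom)
```

Both calls yield the same date, but only the first works without knowing the publisher's private convention, which is the extra processing the slide warns about.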

Step 2: Reuse existing terms and vocabularies

Slide 72

You can find reusable RDF vocabularies on:

http://joinup.ec.europa.eu

http://lov.okfn.org

Step 3: Creation of sub-classes and sub-properties

Slide 73

• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
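
The second point, that a system can still interpret data through the super-property, can be sketched in Python. Given a table of sub-property declarations, every triple that uses a specific property also implies the same triple with each of its super-properties (the RDFS rule for rdfs:subPropertyOf). The `bud:` and `ex:` names are placeholders invented for this sketch; the two mappings follow the EU Budget examples on the slides that follow.

```python
# Hypothetical prefixed names; mappings follow the EU Budget vocabulary example.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples, sub_property_of):
    """Add the triples implied by rdfs:subPropertyOf declarations."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}  # guards against cyclic declarations
        while p in sub_property_of and sub_property_of[p] not in seen:
            p = sub_property_of[p]
            seen.add(p)
            inferred.add((s, p, o))
    return inferred


data = [("ex:heading1", "bud:introduction", "Covers administrative expenditure")]
for triple in sorted(with_super_properties(data, SUB_PROPERTY_OF)):
    print(triple)
```

A consumer that only knows dct:description can now find the introduction text without knowing anything about the more specific property.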

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject

[Figure: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 76

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
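
The capitalisation and camel case conventions are mechanical enough to automate. The Python helper below, a sketch with an invented name, turns a plain-language label into a term name in either class or property style. It does not check the singular, verb or noun conventions, which need human judgement, and it lowercases everything after each initial letter, so acronyms such as "URI" become "Uri".

```python
import re


def term_name(label, is_class):
    """Build an UpperCamelCase class name or lowerCamelCase property name."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError("label contains no usable words")
    first = words[0].capitalize() if is_class else words[0].lower()
    return first + "".join(word.capitalize() for word in words[1:])


print(term_name("political category", is_class=True))
print(term_name("is primary topic of", is_class=False))
```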

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 77

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 78

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 79

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

[Figure: hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
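
In RDFS, domain and range do not reject data; they let a reasoner infer the class of the subject and of the value. The Python sketch below shows that inference for the hasCeiling example. The `bud:` and `ex:` names are placeholders, and the direction of hasCeiling (from Political category to Amount) is an assumption read off the diagram.

```python
# Assumed declarations for illustration: hasCeiling links a political
# category (domain) to an amount (range).
DOMAINS = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGES = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples, domains, ranges):
    """Return the (resource, class) pairs implied by domain and range."""
    types = set()
    for s, p, o in triples:
        if p in domains:
            types.add((s, domains[p]))  # the subject is in the domain class
        if p in ranges:
            types.add((o, ranges[p]))   # the value is in the range class
    return types


data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
print(sorted(infer_types(data, DOMAINS, RANGES)))
```

This is why an overly narrow domain can be harmful: using the property on any other kind of resource silently classifies that resource as a member of the domain class.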

Step 5: Publish within a highly stable environment designed to be persistent

Slide 80

• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud

• Follow good practices for publishing sets of persistent Uniform Resource Identifiers (URIs), in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Slide 81

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

Phases shown on the slide: Analyse, Model, Publish

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another..." - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

Slide 85

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

Slide 86

First:

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Slide 88

Download the tabular data

Step 2: Create a project and upload it in Open Refine

Slide 89

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Step 3: Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Step 3: Clean up the data – prepare RDF

Slide 91

• Create URI representations for the object values involved
  o via formula
  o via reconciliation
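
Creating URIs "via formula" means deriving each URI from a cell value by a fixed rule. Outside Open Refine, the same idea can be sketched in plain Python as below. The base URI, the `ex:` property names and the column names (`country`, `year`, `value`) are all hypothetical; a real Data Cube mapping would use the vocabulary's own terms. Reconciliation, the other option on the slide, instead looks values up in an existing reference dataset and is not shown here.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/resource/"  # hypothetical base URI


def mint_uri(kind, label):
    """Derive a URI from a cell value: trim, lowercase, hyphenate, escape."""
    slug = "-".join(label.strip().lower().split())
    return f"{BASE}{kind}/{quote(slug)}"


def rows_to_triples(csv_text):
    """Turn each spreadsheet row into triples about one observation."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=1):
        observation = f"{BASE}observation/{number}"
        triples.append((observation, "ex:country", mint_uri("country", row["country"])))
        triples.append((observation, "ex:year", row["year"]))
        triples.append((observation, "ex:value", row["value"]))
    return triples


SHEET = "country,year,value\nBelgium,2014,78.5\nCzech Republic,2014,70.1\n"
for triple in rows_to_triples(SHEET):
    print(triple)
```

Because the rule is deterministic, the same country name always yields the same URI, which is what lets separate rows and separate datasets link to one another.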

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 94

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

[Figure: graphical interface to copy/paste an existing RDF skeleton]

[Figure: graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Figure: export of the data in Turtle]

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]

Thank you for your attention

...and now YOUR questions?

Slide 97

References

Slide 98

• 5★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query

Further reading

Slide 99

EC ISA: Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA: Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

Slide 100

EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme
http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Further reading

Slide 101

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org

Be part of our team...

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
Join us on: Open Data Support, http://goo.gl/y9ZZI
Website: http://www.opendatasupport.eu
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 20: Llinked open data training for EU institutions

Make your stuff available on the Web under an open licence

Slide 20

Trends, risks and vulnerabilities in securities markets

Make it available as structured data

Slide 21

Waterbase - Emissions to water

CountryCode

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo - change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions initiatives - some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

Initiatives funded by the European Commission

Slide 28

[Figure: logos of the funded initiatives; legible labels: ADMS, SW, CORE VOCABULARY, PUBLIC SERVICE]

Member State initiatives - some examples

DE - Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT - Agenzia per l’Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL - Building and address register

The Dutch Address and Buildings base register, published as linked data

UK - Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary-Line

UK - Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member States Initiatives - UK National Archives

Slide 30

Three considerations for legislation as data:

• Typographic layout

• Versioning: changes over time

• Semantics

Semantic representation using RDF and Linked Data:

• URIs for things & RDF data model

Requires granular URIs to name things:

• Identifier: http://www.legislation.gov.uk/id/type/year/number/section/number

• Representation: data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member States Initiatives - UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content

• This data already powers large parts of the BBC website, including BBC News and Sport

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

Slide 33

Open & linked data at BBC

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
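
The "link databases" step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the cited study: every URI under http://example.org/ and every production figure is invented. The point it shows is that two independently maintained sources can be joined once both identify a field by the same URI.

```python
# Hypothetical data: every URI and figure below is invented for illustration.
FIELD_A = "http://example.org/field/kristin"
FIELD_B = "http://example.org/field/field-b"
OPERATOR = "http://example.org/company/statoil-petroleum-as"

# Source 1: a government-style register stating who operates each field.
operators = {FIELD_A: OPERATOR, FIELD_B: OPERATOR}

# Source 2: a company-style database with last month's production volumes.
production = {FIELD_A: 120, FIELD_B: 300}


def production_by_operator(operator_uri):
    """Join both sources on the shared field URI and sum the volumes."""
    return sum(
        volume
        for field_uri, volume in production.items()
        if operators.get(field_uri) == operator_uri
    )


total = production_by_operator(OPERATOR)
```

Without shared identifiers the join would have to rely on matching names, which is where spelling variants and ambiguity creep in.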

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

Conclusions cont’d

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge-creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL: how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

RDF structure

Triples graphs and syntaxes

Slide 44

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject - a resource, which is identified with a URI

• Predicate - a URI-identified, reused specification of the relationship

• Object - a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type

Predicate: has a title

Object: “File types Name Authority List”
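
As a rough illustration of the triple structure described above, the same statement can be held in a three-field record. This is a sketch in plain Python rather than an RDF library; the class and field names are hypothetical, and writing "has a title" as the Dublin Core Terms property http://purl.org/dc/terms/title is an assumption based on the later syntax slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property (the relationship)
    obj: str        # URI of another resource, or a literal value


# "has a title" is expressed here with the Dublin Core Terms title property.
name_of_dataset = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
)

# A set of triples forms an RDF graph.
graph = {name_of_dataset}
```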

RDF Syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Callouts on the slide: the xmlns attributes are the definition of prefixes; the elements below them are the description of data (triples). In the first triple the subject is the rdf:about URI, the predicate is dct:title and the object is the title text. Together the triples form a graph.

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph of the triples, with subject, predicate and object labelled]

RDF Syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Callouts on the slide: the @prefix lines are the definition of prefixes; the statements below them are the description of data (triples), each made of a subject, a predicate and an object. Together the triples form a graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
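
To make explicit what the prefixed Turtle above abbreviates, the sketch below writes the same five triples out in full, one statement per line, in the style of N-Triples. It is a simplified illustration: the helper name is hypothetical, and treating any object that starts with "http://" or "https://" as a URI is an assumption of this sketch, not a rule of RDF.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # Turtle's "a"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

triples = [
    (DATASET, RDF_TYPE, DCAT + "Dataset"),
    (DATASET, DCT + "title", "File types Name Authority List"),
    (DATASET, DCT + "publisher", PUBLISHER),
    (PUBLISHER, RDF_TYPE, DCT + "Agent"),
    (PUBLISHER, DCT + "title", "Publications Office"),
]


def to_ntriples(statements):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    lines = []
    for s, p, o in statements:
        if o.startswith(("http://", "https://")):
            obj = f"<{o}>"  # assumed to be a URI
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'  # plain string literal
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


output = to_ntriples(triples)
```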

RDF Syntax: RDFa

Slide 49

RDFa: embedding RDF data in HTML

<html>
  <head> </head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Callouts on the slide: the resource attribute gives the subject, each property attribute gives a predicate, and the element text gives the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes properties and vocabularies

Slide 50

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: the Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[Figure: UML class diagram "Healthcare Domain"; callouts mark the classes, their properties and the relationships between them]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

• INSERT: add triples to the RDF graph

• DELETE: delete triples from the RDF graph

• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Callouts on the slide:

• Definition of prefixes: the PREFIX lines

• Type of query: SELECT

• Variables, i.e. what to search for: ?title

• RDF triple patterns, i.e. the conditions that have to be met: the WHERE block

SELECT - return the name of a dataset with a particular URI

Slide 57

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result

dataset
"File types Name Authority List"

SELECT - return the name and publisher of a dataset

Slide 58

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

dataset | publisher
"File types Name Authority List" | "Publications Office"
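
How a SPARQL engine arrives at this result can be approximated with a small pattern matcher. The sketch below is a toy model in plain Python, not a SPARQL implementation: the function names are hypothetical, prefixed names are kept as plain strings, and only the join of triple patterns in a WHERE block is modelled. Terms starting with "?" are variables; each pattern either extends the bindings found so far or discards them.

```python
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

# The sample data from the slide, with prefixed names kept as plain strings.
data = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable
            if result.setdefault(term, value) != value:
                return None  # conflicts with an earlier binding
        elif term != value:  # constant that does not match
            return None
    return result


def select(variables, patterns, triples):
    """Evaluate a list of triple patterns as a join, like a WHERE block."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    data,
)
```

The shared variable ?publisherURI is what ties the dataset's publisher to the publisher's own title; without it the third pattern would also match the dataset title.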

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

o How to reuse existing vocabularies to model your data

o How to create new classes and properties in RDF

o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model diagram. Classes and their attributes as far as they can be recovered from the slide:]

Amount: Currency, Figure, Type, Year

Nomenclature: Type, Heading, Introduction, Remark, Conditions

EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

Political category: Code, Description

Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Step 2: Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies

Slide 70

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data

Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71

Step 2: Reuse existing terms and vocabularies
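
The interoperability point about dates can be seen directly in code. In this small sketch, using only the Python standard library, the xsd:date style value parses with no extra knowledge, while the free-text form needs a format string agreed out of band. The format "%d %B %Y" is an assumption of the sketch, and %B only matches English month names while the default "C" locale is active.

```python
from datetime import date, datetime

# A value typed as xsd:date uses the ISO 8601 form and parses directly.
created = date.fromisoformat("2013-02-21")

# A free-text date needs a format that consumers must somehow know about.
free_text = datetime.strptime("21 February 2013", "%d %B %Y").date()

# The ISO parser rejects the free-text form outright.
try:
    date.fromisoformat("21 February 2013")
    iso_accepts_free_text = True
except ValueError:
    iso_accepts_free_text = False
```

Both values describe the same day, but only the first can be consumed by a generic tool that knows nothing about the publisher's conventions.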

You can find reusable RDF vocabularies on

Slide 72

http://joinup.ec.europa.eu/ and http://lov.okfn.org/

Step 2: Reuse existing terms and vocabularies

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description

[Figure: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject

[Figure: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
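
Why sub-properties help consumers who only know the generic term can be sketched as follows. This is an illustration in plain Python, not an RDFS reasoner: the "bud:" prefixed names and the example resource are hypothetical stand-ins for terms of the EU Budget vocabulary, and the sub-property table is assumed to contain no cycles.

```python
# Hypothetical prefixed names: "bud:" stands for the EU Budget vocabulary.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_inferred(triples):
    """Add, for every triple, the same statement using each super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        while p in SUB_PROPERTY_OF:  # walk up the property hierarchy
            p = SUB_PROPERTY_OF[p]
            inferred.add((s, p, o))
    return inferred


published = {("ex:nomenclature1", "bud:introduction", "Introductory text")}
expanded = with_inferred(published)
```

A tool that understands only dct:description can now find the introductory text, even though it has never seen the more specific property.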

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept

• Properties begin with a lower case letter, e.g. rdfs:label

• Object properties should be verbs, e.g. org:hasSite

• Data type properties should be nouns, e.g. dcterms:description

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
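
The capitalisation and camel case conventions above lend themselves to an automated check. The following sketch is one possible way to do it; the function names are hypothetical, and it deliberately ignores the conventions that need human judgement (singular class names, verbs for object properties, nouns for data type properties).

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # UpperCamelCase
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # lowerCamelCase


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term, kind):
    """Check a prefixed name against the convention for a class or property."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))


checks = {
    "skos:Concept": follows_convention("skos:Concept", "class"),
    "rdfs:label": follows_convention("rdfs:label", "property"),
    "foaf:isPrimaryTopicOf": follows_convention("foaf:isPrimaryTopicOf", "property"),
    "ex:legal_name": follows_convention("ex:legal_name", "property"),
}
```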

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: class Amount with attributes Currency, Figure, Type, Year]

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: class Amount with attributes Currency, Figure, Type, Year]

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: property hasCeiling between class Political category (Code, Description) and class Amount (Currency, Figure, Type, Year)]
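
In RDFS, domain and range declarations are used to infer the types of resources rather than to reject data that lacks them. The sketch below illustrates that behaviour in plain Python. The "bud:" names, the example resources and the direction of hasCeiling (from a political category to an amount) are assumptions made for illustration; the slide does not state them.

```python
# Hypothetical declarations for the hasCeiling property shown on the slide.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from domain and range declarations."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))  # subject is in the domain
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))   # object is in the range
    return types


data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
inferred = infer_types(data)
```

Because of this inference behaviour, a domain or range that is too narrow can lead to wrong conclusions about data that reuses the property elsewhere, which is one reason to define them with care.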

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Phases shown on the slide: Analyse, Model, Publish

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine

Slide 84

“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another” - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML and

• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

Install:

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86
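
The five steps can be mimicked in miniature without Open Refine, which may help to see what the tool does behind its graphical interface. The sketch below is a plain Python stand-in and not how Open Refine or its RDF extension is implemented: the spreadsheet content, column names and the http://example.org/ URIs are all invented for illustration.

```python
import csv
import io

# Steps 1-2: hypothetical spreadsheet content, loaded as text.
SHEET = """country,year,value
BE,2014,0.78
LU,2014,0.91
"""

BASE = "http://example.org/observation/"  # assumed base URI for the data
PROPERTY_BASE = "http://example.org/def/"  # assumed namespace for properties


def rows_to_triples(text):
    """Steps 3-4: map each row to one resource with one triple per column."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"{BASE}{row['country']}-{row['year']}"
        for column, value in row.items():
            triples.append((subject, PROPERTY_BASE + column, value.strip()))
    return triples


# Step 5: the resulting triples are ready to be serialised in an RDF syntax.
triples = rows_to_triples(SHEET)
```

In a real mapping, the generated properties would be replaced by terms from an existing vocabulary, as recommended in the vocabulary reuse slides.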

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

Step 3: Clean up the data - table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Step 3: Clean up the data - prepare RDF

Slide 91

• Create a URI representation for the involved object values:

o via formula

o via reconciliation
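
Creating a URI "via formula" amounts to deriving the URI from the cell value with a fixed rule. The sketch below shows one such rule in Python as a stand-in for the expression you would write in the tool itself. The base URI and the function name are hypothetical, and the rule (trim, lower-case, hyphenate, percent-encode) is only one reasonable choice.

```python
from urllib.parse import quote

BASE = "http://example.org/id/country/"  # assumed base URI


def mint_uri(cell_value):
    """Build a URI from a cell value: trim, lower-case, hyphenate, escape."""
    slug = "-".join(cell_value.strip().lower().split())
    return BASE + quote(slug, safe="-")


uk = mint_uri("  United Kingdom ")
```

A formula of this kind always yields the same URI for the same value, but it cannot tell that two different spellings mean the same thing; that is the case reconciliation against an existing dataset is meant to handle.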

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

Slide 94

[Figure: graphical interface to copy/paste an existing RDF skeleton]

[Figure: graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

Export of the data in Turtle

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]

Thank you for your attention

and now YOUR questions

Slide 97

References

• 5-star Open Data, http://5stardata.info

• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/

• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata/

• Open Refine, https://github.com/OpenRefine

• RDF Extension, http://refine.deri.ie

• Resource Description Framework, W3C, http://www.w3.org/RDF/

• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: @OpenDataSupport

Contact us: contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No 30-CE-053096500-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 21: Llinked open data training for EU institutions

DATASUPPORTOPEN

Make it available as structured data

Slide 21

Waterbase - Emissions to water

CountryCode

DATASUPPORTOPEN

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

DATASUPPORTOPEN

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples on supra-national national regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions' initiatives – some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data and publications data. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

ADMS

SWCORE

VOCABULARY

PUBLICSERVICE

DATASUPPORTOPEN

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT – Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL – Building and address register

The Dutch Address and Buildings base register published as linked data.

UK – Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.

UK – Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database.

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member States Initiatives ndash UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
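The granular identifier scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the URI pattern is reconstructed from the slide text and from the section 124 example on the next slide, so it should not be read as an official description of the legislation.gov.uk API.

```python
# Formats listed on the slide for the "Representation" part of the URI.
ALLOWED_FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}


def legislation_id(doc_type: str, year: int, number: int, section=None) -> str:
    """Build an identifier URI following the slide's pattern (assumed)."""
    uri = f"http://www.legislation.gov.uk/id/{doc_type}/{year}/{number}"
    if section is not None:
        # One URI per section makes individual provisions addressable.
        uri += f"/section/{section}"
    return uri


def representation(identifier: str, fmt: str) -> str:
    """Append the representation suffix, e.g. /data.rdf, to an identifier."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier}/data.{fmt}"


section_124 = legislation_id("ukpga", 2010, 32, section=124)
print(representation(section_124, "rdf"))
```

Keeping the identifier separate from its representations means the same "thing" (a section of an act) keeps one stable name while being served in several formats.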

DATASUPPORTOPEN

Member States Initiatives ndash UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the BBC's open data website, lets anyone access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

DATASUPPORTOPEN Slide 33

Open amp linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes."

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

Conclusions cont'd

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF amp SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

bull An introduction to the Resource Description Framework (RDF) for describing your data

bull An introduction to SPARQL on how you can query and manipulate data in RDF

Slide 39

Find more on trainingopendatasupporteu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

bull The Resource Description Framework (RDF)

bull How to writeread RDF

bull How you can describe your data with RDF

bull What SPARQL is

bull How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
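As a minimal sketch of the same idea in code, a triple can be modelled as a three-field record. The `Triple` type and the `describe` helper are hypothetical names introduced for illustration, and the choice of `dct:title` for "has a title" is an assumption based on the property used in the later syntax examples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property that relates subject and object
    obj: str        # another resource (URI) or a literal value


# Assumption: "has a title" corresponds to the Dublin Core term dct:title.
DCT_TITLE = "http://purl.org/dc/terms/title"

name_of_dataset = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj="File types Name Authority List",
)


def describe(t: Triple) -> str:
    """Render the triple as one statement with a literal object."""
    return f'<{t.subject}> <{t.predicate}> "{t.obj}" .'


print(describe(name_of_dataset))
```

A set of such records is all an RDF graph is; the syntaxes on the following slides are different ways of writing the same records down.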

DATASUPPORTOPEN

RDF Syntax: RDF/XML

Slide 46

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data – triples (each outer element is a subject, each nested element a predicate, and its value or rdf:resource the object; together they form the graph):

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF Syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data – triples (in each statement the first term is the subject, the second the predicate and the third the object; together they form the graph):

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
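To make the relationship between triples and the Turtle notation concrete, here is a small Python sketch that writes a list of triples in a Turtle-like layout, grouping statements by subject. It is a teaching aid under simplifying assumptions, not a complete serialiser: literals are marked by a leading double quote, no escaping is done, and the function names are hypothetical. It uses `http://www.w3.org/ns/dcat#`, the namespace defined by the DCAT Recommendation.

```python
from collections import defaultdict

PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def shorten(term: str) -> str:
    """Render one predicate or object term."""
    if term.startswith('"'):          # literal: already quoted by convention
        return term
    if term == RDF_TYPE:              # Turtle shorthand for rdf:type
        return "a"
    for prefix, namespace in PREFIXES.items():
        if term.startswith(namespace):
            return f"{prefix}:{term[len(namespace):]}"
    return f"<{term}>"                # full URI when no prefix applies


def to_turtle(triples) -> str:
    """Group triples by subject and join their predicate-object pairs with ';'."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(f"{shorten(p)} {shorten(o)}" for p, o in pairs)
        lines.append(f"\n<{s}>\n    {body} .")
    return "\n".join(lines)


FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

triples = [
    (FILE_TYPE, RDF_TYPE, "http://www.w3.org/ns/dcat#Dataset"),
    (FILE_TYPE, "http://purl.org/dc/terms/title", '"File types Name Authority List"'),
    (FILE_TYPE, "http://purl.org/dc/terms/publisher", PUBL),
    (PUBL, RDF_TYPE, "http://purl.org/dc/terms/Agent"),
    (PUBL, "http://purl.org/dc/terms/title", '"Publications Office"'),
]

turtle = to_turtle(triples)
print(turtle)
```

In practice an RDF library would handle serialisation; the point here is only that prefixes and the `;` separator are abbreviations over the same five triples.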

DATASUPPORTOPEN

RDF Syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
<head></head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type"
       typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher:
      <span property="http://purl.org/dc/terms/publisher"
            resource="http://open-data.europa.eu/en/data/publisher/publ">Publications Office</span>
    </p>
  </div>
</body>
</html>

The resource attribute gives the subject, each property attribute a predicate, and the element content or nested resource the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[Figure: UML class diagram "Healthcare Domain". Classes and their properties, as labelled in the diagram:]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C Recommendation in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: Add triples to the RDF graph (a SPARQL 1.1 Update operation).

• DELETE: Delete triples from the RDF graph (a SPARQL 1.1 Update operation).

• ASK: Are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
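The difference between the SELECT, ASK and CONSTRUCT forms can be illustrated with a toy matcher over a list of triples. This sketch is not a SPARQL engine: it handles a single triple pattern, treats any term starting with `?` as a variable, and uses prefixed names as plain strings. All function names and the `ex:` terms are hypothetical.

```python
def match(pattern, triple):
    """Return variable bindings if the triple fits the pattern, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must bind to the same value everywhere it appears.
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def select(graph, pattern, variables):
    """SELECT: a table with one row of variable values per match."""
    return [tuple(b[v] for v in variables)
            for t in graph if (b := match(pattern, t)) is not None]


def ask(graph, pattern):
    """ASK: is there at least one match?"""
    return any(match(pattern, t) is not None for t in graph)


def construct(graph, pattern, template):
    """CONSTRUCT: fill a triple template with each match, giving new triples."""
    return [tuple(b.get(term, term) for term in template)
            for t in graph if (b := match(pattern, t)) is not None]


graph = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "dct:title", "Publications Office"),
]

titles = select(graph, ("?s", "dct:title", "?title"), ["?title"])
has_dataset = ask(graph, ("?s", "rdf:type", "dcat:Dataset"))
inverse = construct(graph, ("?d", "dct:publisher", "?p"),
                    ("?p", "ex:publishes", "?d"))
print(titles, has_dataset, inverse)
```

SELECT returns rows, ASK returns a boolean, and CONSTRUCT returns triples, which is why only CONSTRUCT output can be loaded back into a graph directly.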

DATASUPPORTOPEN

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

DATASUPPORTOPEN

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://.../publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result

?dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT – return the name and publisher of a dataset

Slide 58

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://.../publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

?dataset | ?publisher
"File types Name Authority List" | "Publications Office"
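A query with several triple patterns is answered by finding variable bindings that satisfy all patterns at once. The following Python sketch shows that join over the slide's sample data; it is a simplified illustration (no OPTIONAL, FILTER or ordering), the `solve` function is a hypothetical name, and the short identifiers `ft` and `publ` stand in for the abbreviated URIs on the slide.

```python
def solve(graph, patterns, binding=None):
    """Yield every set of variable bindings that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        matched = True
        for p, t in zip(first, triple):
            if p.startswith("?"):
                # Reuse an earlier binding or create a new one.
                if candidate.setdefault(p, t) != t:
                    matched = False
                    break
            elif p != t:
                matched = False
                break
        if matched:
            # Carry the bindings into the remaining patterns (the join).
            yield from solve(graph, rest, candidate)


graph = [
    ("ft", "rdf:type", "dcat:Dataset"),
    ("ft", "dct:title", "File types Name Authority List"),
    ("ft", "dct:publisher", "publ"),
    ("publ", "rdf:type", "dct:Agent"),
    ("publ", "dct:title", "Publications Office"),
]

patterns = [
    ("ft", "dct:publisher", "?publisherURI"),
    ("ft", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = [(b["?dataset"], b["?publisher"]) for b in solve(graph, patterns)]
print(rows)
```

The shared variable `?publisherURI` is what links the dataset's triples to the publisher's triples; without it the third pattern would match every titled resource.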

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

bull Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

bull An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Start with a robust Domain Model (step 1)

Slide 68

[Figure: domain model diagram. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type; the attributes Acronym, Legal base period, Legal base type and Legal base status also appear in the diagram); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

DATASUPPORTOPEN

Reuse existing terms and vocabularies (step 2)

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organisations, typically in a national or regional register

Organization Ontology: for describing the structure of organisations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies (step 2)

DATASUPPORTOPEN

Advantages of reuse

• Reuse greatly aids interoperability of your data.

Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71

Reuse existing terms and vocabularies (step 2)
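The date example can be made tangible with a short Python sketch. It contrasts a value in the standard `xsd:date` lexical form, which a generic parser handles directly, with a free-text date, which needs publisher-specific knowledge of the format. The two function names are hypothetical, and the free-text parser assumes English month names.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """xsd:date values use the ISO 8601 form YYYY-MM-DD, so no format hint is needed."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """A custom format needs a format string the consumer must somehow know.

    Assumption: day, full English month name, four-digit year.
    """
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
print(standard, custom, standard == custom)
```

Both calls yield the same date, but only the first works without prior agreement between publisher and consumer, which is the interoperability gain the slide describes.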

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

httpjoinupeceuropaeu httplovokfnorg

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
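How a consumer benefits from sub-property declarations can be sketched as follows. The code derives the triples implied by `rdfs:subPropertyOf`, so that a system that only knows `dct:description` can still read data published with a more specific property. The `bud:` prefix, the `ex:chapter-01` resource and the literal value are made up for illustration; the function name is hypothetical, and each property is assumed to have at most one direct super-property.

```python
def add_super_properties(triples, sub_property_of):
    """Return the triples plus those implied by rdfs:subPropertyOf."""
    result = list(triples)
    for s, p, o in triples:
        seen = {p}
        parent = sub_property_of.get(p)
        # Walk up the hierarchy; 'seen' guards against accidental cycles.
        while parent is not None and parent not in seen:
            result.append((s, parent, o))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return result


# Declared in the vocabulary: introduction is a sub-property of dct:description.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}

data = [("ex:chapter-01", "bud:introduction", "Administrative expenditure")]
expanded = add_super_properties(data, SUB_PROPERTY_OF)
print(expanded)
```

A generic tool that looks only for `dct:description` now finds a value, even though the publisher used the more specific term.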

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
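The purely lexical parts of these conventions can be checked automatically. The sketch below tests capitalisation and camel case on the local name of a prefixed term; it cannot judge whether a class name is singular or whether a property is a verb or a noun, which still needs human review. The function names and the `ex:` terms are hypothetical.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as skos:Concept."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "ex:political category", "ex:amount"]:
    print(term, "class ok:", is_valid_class_name(term))
for term in ["rdfs:label", "foaf:isPrimaryTopicOf", "ex:Has_ceiling"]:
    print(term, "property ok:", is_valid_property_name(term))
```

A check like this is easy to run over a draft vocabulary before publication.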

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.
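In RDFS, domain and range declarations let a consumer infer the classes of the resources that a property connects. The Python sketch below shows that inference for a hasCeiling property. The direction chosen here (a political category has a ceiling that is an amount), the `bud:` and `ex:` names and the function name are all assumptions made for illustration.

```python
def infer_types(triples, domain, range_):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in domain:
            # The subject of the property is an instance of the domain class.
            inferred.add((s, "rdf:type", domain[p]))
        if p in range_:
            # The value of the property is an instance of the range class.
            inferred.add((o, "rdf:type", range_[p]))
    return inferred


# Assumed declarations for the hasCeiling property of the example model.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}

data = [("ex:category-1", "bud:hasCeiling", "ex:amount-2014")]
types = infer_types(data, DOMAIN, RANGE)
print(sorted(types))
```

Because these declarations drive inference, a domain or range that is too narrow produces wrong type statements for anyone who reuses the property, so they are worth defining carefully.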

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..." - openrefine.org

See also: Open Refine website

http://openrefine.org

DATASUPPORTOPEN

What is Open Refine RDF extension

The Open Refine RDF extension allows you to easily import data in different formats, such as:

CSV

Excel (xls and xlsx)

JSON

XML and

RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: Getting started

First:

• Install Open Refine from https://github.com/OpenRefine

• Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data ndash table harmonisation

Slide 90

3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data ndash prepare RDF

Slide 91

3

• Create URI representations for the involved object values

• via formula

• via reconciliation
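Creating URIs "via formula" means deriving each identifier from a cell value with a fixed rule. In Open Refine this is written in its own expression language; the Python sketch below only illustrates the kind of rule involved. The `mint_uri` function name and the `http://example.com/country/` base URI are hypothetical.

```python
import re
from urllib.parse import quote


def mint_uri(base: str, label: str) -> str:
    """Derive a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = re.sub(r"\s+", "-", label.strip().lower())
    # quote() escapes characters that are not safe in a URI path segment.
    return base.rstrip("/") + "/" + quote(slug, safe="-")


for cell in ["United Kingdom", "  Luxembourg "]:
    print(mint_uri("http://example.com/country/", cell))
```

The same cell value always yields the same URI, which is what lets rows from different tables link to one shared resource. Reconciliation, the alternative, looks values up in an existing reference dataset instead of minting new identifiers.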

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

bull 5 Open Data http5stardatainfo

bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure

bull An organization ontology W3C httpwwww3orgTRvocab-org W3C

bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment

bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies

bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf

bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1

bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml

Slide 98

• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/

• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata/

• Open Refine, https://github.com/OpenRefine

• RDF Extension, http://refine.deri.ie/

• Resource Description Framework, W3C, http://www.w3.org/RDF/

• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com/

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org/

Slide 101


Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support: http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

OpenDataSupport

contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 22: Llinked open data training for EU institutions

Use non-proprietary formats

• Proprietary: Excel, Word, PDF

• Non-proprietary: XML, CSV, RDF, JSON, ODF

DG Enlargement - Regional programmes

Slide 22

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo – change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Linked data initiatives in Europe

Examples of supranational, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions' initiatives – some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU, CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27


Initiatives funded by the European Commission

Slide 28

ADMS

SWCORE

VOCABULARY

PUBLICSERVICE

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT – Agenzia per l’Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL – Building and address register

The Dutch Address and Buildings base register published as linked data

UK – Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line

UK – Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member States Initiatives – UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member States Initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
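The granular URI scheme described for legislation.gov.uk can be sketched in a few lines of Python. This is only an illustration, not the National Archives' implementation: the function names are hypothetical, and the only details taken from the slides are the identifier layout (type, year, number, section) and the data.[xml | xht | pdf | rdf | feed] representation suffix.

```python
BASE = "http://www.legislation.gov.uk"

def section_identifier(doc_type: str, year: int, number: int, section: int) -> str:
    """Build the identifier URI for one section of a piece of legislation."""
    return f"{BASE}/id/{doc_type}/{year}/{number}/section/{section}"

def representation(identifier: str, fmt: str = "rdf") -> str:
    """Append the representation suffix listed on the slide."""
    allowed = {"xml", "xht", "pdf", "rdf", "feed"}
    if fmt not in allowed:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier}/data.{fmt}"

# Components of the example URL shown on slide 31.
uri = section_identifier("ukpga", 2010, 32, 124)
print(uri)
print(representation(uri))
```

Keeping the identifier separate from its representations means the same "thing" (a section of an act) has one stable name, while each format is just a different document about it.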

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content

• This data already powers large parts of the BBC website, including BBC News and Sport

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about

Slide 32

Further reading

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

Slide 33

Open & linked data at BBC

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Linked Data in the oil and gas industry

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

Conclusions cont’d

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40


Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type

Predicate: has a title

Object: “File types Name Authority List”
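To make the subject, predicate and object roles concrete, here is a minimal Python sketch that holds the triple from this slide as a tuple and writes it out as one line of N-Triples (a simple line-based RDF syntax that the slides do not cover). The Triple class and to_ntriples function are hypothetical helpers, and mapping "has a title" to the Dublin Core property dct:title is an assumption based on the later slides.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str                  # URI of the resource being described
    predicate: str                # URI of the property
    obj: str                      # URI or literal value
    obj_is_literal: bool = False  # True when obj is a plain string value

def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a line of N-Triples."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."

title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",   # assumed URI for "has a title"
    "File types Name Authority List",
    obj_is_literal=True,
)
print(to_ntriples(title))
```

The point of the sketch is that the subject and predicate are always URIs, while the object may be either another URI or a literal value.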

RDF Syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Callouts on the slide: definition of prefixes (the xmlns attributes); description of data – triples (the two element blocks); subject (rdf:about), predicate (child element) and object (element content or rdf:resource); together the triples form a graph

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

Figure: RDF graph with the subject, predicate and object of each triple labelled

RDF Syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Callouts on the slide: definition of prefixes (the @prefix lines); description of data – triples; subject, predicate and object; together the triples form a graph

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF Syntax: RDFa

Embedding RDF data in HTML

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Callouts on the slide: subject (resource attribute), predicate (property attribute) and object (element content)

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Class diagram: Healthcare Domain

Class Core Vocabularies::Identifier, properties: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person, properties: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string

Class Core Vocabularies::Location, properties: geographicIdentifier: URI; geographicName: string

Class Core Vocabularies::Address, properties: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string

Class Core Vocabularies::Geometry, properties: lat: string; long: string; wkt: string; xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier


Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

• INSERT: Add triples to the RDF graph

• DELETE: Delete triples from the RDF graph

• ASK: Are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Definition of prefixes: the PREFIX lines

Type of query: SELECT

Variables, i.e. what to search for: ?title

RDF triple patterns, i.e. the conditions that have to be met: the lines inside WHERE { }
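A query with this structure is normally sent to a SPARQL endpoint over HTTP. The sketch below, assuming the standard SPARQL Protocol convention of passing the query text in a `query` URL parameter, only builds the request URL and does not contact any server. The endpoint address is a placeholder and the `query_url` helper is hypothetical.

```python
from urllib.parse import urlencode

QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
"""

def query_url(endpoint: str, query: str) -> str:
    """Return the GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

# Placeholder endpoint, used only to show the shape of the request URL.
url = query_url("http://example.org/sparql", QUERY)
print(url)
```

In practice one of the endpoints listed in Module 1 would take the place of the placeholder, and the response format would be negotiated through the HTTP Accept header.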

SELECT – return the name of a dataset with particular URI

Slide 57

Sample data

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}

Result

dataset
“File types Name Authority List”

SELECT – return the name and publisher of a dataset

Slide 58

Sample data

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

dataset | publisher
“File types Name Authority List” | “Publications Office”
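How a SELECT query arrives at its result can be illustrated with a small pure-Python matcher. This is a teaching sketch, not how a real SPARQL engine is built: `match` and `select` are hypothetical helpers, the dataset URI is a placeholder because the slide abbreviates it, and prefixed names are kept as plain strings.

```python
FILE_TYPE = "http://example.org/authority/file-type"  # placeholder for the abbreviated URI
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

DATA = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]

def match(pattern, triple, binding):
    """Try to extend a binding so that the pattern equals the triple."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # variable
            if result.setdefault(term, value) != value:
                return None                           # clashes with an earlier binding
        elif term != value:                           # constant that does not match
            return None
    return result

def select(variables, patterns, data):
    """Evaluate the triple patterns in order and project the chosen variables."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in data:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return [tuple(b[v] for v in variables) for b in bindings]

rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset's triples to the publisher's triples; an ASK query would simply report whether the list of rows is non-empty.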

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Start with a robust Domain Model

Slide 68

1

Domain model diagram:

Amount: Currency, Figure, Type, Year

Nomenclature: Type, Heading, Introduction, Remark, Conditions

EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

Political category: Code, Description

Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

2

Well-known vocabularies

Slide 70

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: Ontology for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies

2

Advantages of reuse

• Reuse greatly aids interoperability of your data

Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71

Reuse existing terms and vocabularies

2
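The "further processing" mentioned above can be as small as the following Python sketch, which turns a free-text date such as "21 February 2013" into the typed literal form used with dcterms:created. The function name is hypothetical, and the sketch assumes English month names (Python's default locale) and a single known input format.

```python
from datetime import datetime

def to_xsd_date(text: str, fmt: str = "%d %B %Y") -> str:
    """Parse a free-text date and return it as an xsd:date typed literal.

    %B matches the full month name in the current locale; the default
    C locale uses English names, which is assumed here.
    """
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^xsd:date'

print(to_xsd_date("21 February 2013"))
```

Every consumer of non-standard data has to write and maintain a conversion like this, which is the cost that reusing dcterms:created with xsd:date avoids.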

You can find reusable RDF vocabularies on:

Slide 72

http://joinup.ec.europa.eu/ and http://lov.okfn.org/

Reuse existing terms and vocabularies

2

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description

Nomenclature: Type, Heading, Introduction, Remark, Conditions

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject

Amount: Currency, Figure, Type, Year

Nomenclature: Type, Heading, Introduction, Remark, Conditions

Relationship: Amount has Nomenclature
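The benefit of declaring sub-properties can be shown with a short Python sketch: a consumer that only knows dct:description or dct:subject can still read the data once the more general triples are derived. The prefixed names bud:introduction and bud:hasNomenclature are hypothetical spellings of the two EU Budget properties named on these slides, and the function is an illustration of the rdfs:subPropertyOf rule rather than a full RDFS reasoner.

```python
SUB_PROPERTY_OF = {
    # Hypothetical prefixed names for the two EU Budget properties.
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

def with_super_properties(triples, sub_property_of):
    """Add, for each triple, the triples implied by rdfs:subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}
        # Walk up the property hierarchy, guarding against cycles.
        while p in sub_property_of and sub_property_of[p] not in seen:
            p = sub_property_of[p]
            seen.add(p)
            inferred.add((s, p, o))
    return inferred

data = [("ex:nomenclature1", "bud:introduction", "Introductory text")]
all_triples = with_super_properties(data, SUB_PROPERTY_OF)
print(sorted(all_triples))
```

A generic tool that looks only for dct:description now finds the introductory text, even though it has never heard of the budget-specific property.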

Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76

4
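Part of these naming conventions can be checked mechanically. The Python sketch below tests only the capitalisation and camel-case rules; whether a class name is singular, or a property name is a verb or a noun, still needs a human reviewer. The function names are hypothetical.

```python
import re

# Local names: letters and digits only, so multi-word terms must use camel case.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")

def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]

def is_valid_class(term: str) -> bool:
    """Class names begin with a capital letter."""
    return bool(CLASS_NAME.match(local_name(term)))

def is_valid_property(term: str) -> bool:
    """Property names begin with a lower case letter."""
    return bool(PROPERTY_NAME.match(local_name(term)))

for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    print(term, is_valid_class(term), is_valid_property(term))
```

A check like this is easy to run over a draft vocabulary before publication.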

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

Slide 77

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Amount: Currency, Figure, Type, Year

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

Slide 78

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Amount: Currency, Figure, Type, Year

Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Diagram: the hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)
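As an illustration of domain and range, the Python sketch below checks a triple against a declared domain and range. Two assumptions are worth stating clearly: the direction of hasCeiling (from Political category to Amount) is a guess from the diagram, and the prefixed names are hypothetical. Note also that in RDFS itself, domain and range are used to infer the types of resources rather than to reject data; treating them as checks is a simplification for teaching purposes.

```python
SCHEMA = {
    # property: (domain class, range class); the direction is an assumption
    "bud:hasCeiling": ("bud:PoliticalCategory", "bud:Amount"),
}

TYPES = {
    # Known rdf:type of each example resource (hypothetical identifiers).
    "ex:category1": "bud:PoliticalCategory",
    "ex:amount1": "bud:Amount",
}

def check(triple, schema, types):
    """Return a list of domain/range problems for one triple."""
    s, p, o = triple
    if p not in schema:
        return []          # nothing declared for this property
    domain, range_ = schema[p]
    problems = []
    if types.get(s) != domain:
        problems.append(f"{s} is not a {domain}")
    if types.get(o) != range_:
        problems.append(f"{o} is not a {range_}")
    return problems

print(check(("ex:category1", "bud:hasCeiling", "ex:amount1"), SCHEMA, TYPES))
print(check(("ex:amount1", "bud:hasCeiling", "ex:category1"), SCHEMA, TYPES))
```

Declaring domain and range documents the intended use of a property, which helps both human readers of the vocabulary and tools that process data published with it.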

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

5

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas and http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another…” - openrefine.org

See also: Open Refine website, http://openrefine.org/

What is Open Refine RDF extension?

Open Refine RDF extension allows you to easily import data in different formats, such as CSV, Excel (.xls and .xlsx), JSON, XML and RDF/XML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie/

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87


Describe your data in a spreadsheet

Download the tabular data

Slide 88

1


Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

Clean up the data – table harmonisation

Slide 90

3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Clean up the data – prepare RDF

Slide 91

3

• Create URI representation for the involved object values

o via formula

o via reconciliation
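Creating URIs "via formula" means deriving each URI from the cell value by a fixed rule. In Open Refine this would be written as an expression on the column; the Python sketch below shows the same idea outside the tool. The base URI, the function names and the sample values are all hypothetical.

```python
import re
import unicodedata

BASE_URI = "http://example.org/id/"  # hypothetical base URI for the data

def slug(value: str) -> str:
    """Lower-case the value, strip accents, and join the words with hyphens."""
    ascii_text = (unicodedata.normalize("NFKD", value)
                  .encode("ascii", "ignore")
                  .decode("ascii"))
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

def mint_uri(kind: str, value: str) -> str:
    """Build a URI of the form <base>/<kind>/<slug of the cell value>."""
    return f"{BASE_URI}{kind}/{slug(value)}"

print(mint_uri("country", "Czech Republic"))
print(mint_uri("indicator", "Households with broadband access"))
```

A formula is quick but produces URIs local to your dataset; reconciliation instead looks each value up in an existing reference dataset, so that your data links to URIs others already use.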

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5 Star Open Data. http://5stardata.info

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query

Slide 98

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on / Contact us / Join us on / Follow us

• Open Data Support: http://www.slideshare.net/OpenDataSupport

• http://www.opendatasupport.eu

• Open Data Support: http://goo.gl/y9ZZI

• @OpenDataSupport

• contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 23: Linked open data training for EU institutions

DATASUPPORTOPEN

Use URIs to denote things

Slide 23

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo - change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples on supra-national national regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions initiatives - some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

ADMS

SWCORE

VOCABULARY

PUBLICSERVICE

DATASUPPORTOPEN

Member State initiatives - some examples

DE - Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT - Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL - Building and address register

The Dutch Address and Buildings base register, published as linked data.

UK - Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary-Line.

UK - Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database.

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member States initiatives - UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

DATASUPPORTOPEN

Member States Initiatives ndash UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
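
The granular URI scheme on these two slides can be read as a small template: an identifier built from type, year, number and section, plus a suffix that selects a representation. The following is a minimal sketch of that reading; the function names are hypothetical, and the pattern is taken from the slides rather than from any official legislation.gov.uk API documentation.

```python
BASE = "http://www.legislation.gov.uk/id"
FORMATS = ("xml", "xht", "pdf", "rdf", "feed")  # representations listed on the slide


def identifier_uri(doc_type: str, year: int, number: int, section: int) -> str:
    """Name one section of one piece of legislation."""
    return f"{BASE}/{doc_type}/{year}/{number}/section/{section}"


def representation_uri(identifier: str, fmt: str) -> str:
    """Append the data.<format> suffix that selects a representation."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier}/data.{fmt}"


section = identifier_uri("ukpga", 2010, 32, 124)
print(representation_uri(section, "rdf"))
```

Keeping the identifier separate from its representations means the same section can be referenced in RDF regardless of which format a client later retrieves.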

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

DATASUPPORTOPEN Slide 33

Open & linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data is spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes."

• Need to add semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company
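
Both steps above can be illustrated with plain tuples: once two databases use the same URI for the same field, their statements can simply be pooled and queried together. This is only a sketch under stated assumptions; the `http://example.org/` namespace, the property names and the production figure are invented for illustration and do not come from any real dataset.

```python
EX = "http://example.org/"  # hypothetical namespace

KRISTIN = EX + "field/Kristin"
STATOIL = EX + "company/StatoilPetroleumAS"

# Statements held by two separate databases about the same, uniquely identified field.
government_data = {
    (KRISTIN, EX + "type", EX + "Field"),      # "Kristin is a field"
    (KRISTIN, EX + "operator", STATOIL),
}
operator_data = {
    (KRISTIN, EX + "productionVolume", 1000),  # invented figure
}

# Step 1: linking the databases is a set union, because the URIs line up.
merged = government_data | operator_data


def production_by_operator(graph, operator):
    """Step 2: use the added meaning (operator, productionVolume) to answer the question."""
    fields = {s for s, p, o in graph if p == EX + "operator" and o == operator}
    return {s: o for s, p, o in graph
            if s in fields and p == EX + "productionVolume"}


print(production_by_operator(merged, STATOIL))
```

Neither source alone can answer the question; the answer only appears after the union, which is the point of the "link databases" step.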

Linked Data in the oil and gas industry

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

bull The Resource Description Framework (RDF)

bull How to writeread RDF

bull How you can describe your data with RDF

bull What SPARQL is

bull How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject - a resource, which is identified with a URI

• Predicate - a URI-identified, reused specification of the relationship

• Object - a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
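
As a data structure, a triple is just three fields. The sketch below models the slide's example as a Python named tuple and prints it as one N-Triples line. The class and field names are illustrative, the Dublin Core `title` URI stands in for "has a title" (an assumption consistent with the later syntax slides), and literal escaping is deliberately left out.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # always a URI
    predicate: str                # always a URI
    obj: str                      # a URI or a literal value
    obj_is_literal: bool = False  # distinguishes the two kinds of object

    def to_ntriples(self) -> str:
        """Serialise as one N-Triples line (no escaping of special characters)."""
        obj = f'"{self.obj}"' if self.obj_is_literal else f"<{self.obj}>"
        return f"<{self.subject}> <{self.predicate}> {obj} ."


title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
    obj_is_literal=True,
)
print(title.to_ntriples())
```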

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Annotations on the slide: the xmlns attributes are the definition of prefixes; the elements inside rdf:RDF are the description of data (triples), each with a subject, a predicate and an object; together they form the graph.

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Annotations on the slide: the @prefix lines are the definition of prefixes; the statements below them are the description of data (triples), each with a subject, a predicate and an object; together they form the graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Annotations on the slide: subject (the resource attribute), predicate (the property attribute), object (the element content).

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram "Healthcare Domain"; the classes and their properties are listed below]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI; geographicName: string

Class Core Vocabularies::Address
  addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string; long: string; wkt: string; xmlGeometry: XML

Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: Add triples to the RDF graph (a SPARQL 1.1 Update operation).

• DELETE: Delete triples from the RDF graph (a SPARQL 1.1 Update operation).

• ASK: Are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
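
The difference between the query types is mostly a difference in what comes back. The sketch below mimics each type over a tiny in-memory set of triples, using one wildcard pattern function. It is not a SPARQL engine; the prefixed names are plain strings and `ex:` is a hypothetical prefix used only here.

```python
def find(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


graph = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
}

# SELECT: a table (here a single column) of matching values.
titles = [o for _, _, o in find(graph, p="dct:title")]

# ASK: only yes or no.
has_dataset = bool(find(graph, p="rdf:type", o="dcat:Dataset"))

# CONSTRUCT: matches are poured into a template, giving new triples.
labels = {(s, "rdfs:label", o) for s, _, o in find(graph, p="dct:title")}

# DESCRIBE: everything known about one resource.
description = find(graph, s="ex:file-type")

# INSERT / DELETE: change the graph itself.
graph.add(("ex:file-type", "dct:publisher", "ex:publ"))
graph.discard(("ex:file-type", "rdf:type", "dcat:Dataset"))
```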

DATASUPPORTOPEN

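Structure of a SPARQL query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Annotations on the slide:

• PREFIX lines: definition of prefixes

• SELECT: type of query

• ?title: variables, i.e. what to search for

• WHERE block: RDF triple patterns, i.e. the conditions that have to be met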

DATASUPPORTOPEN

SELECT - return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
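
To see why the three triple patterns of the second query yield exactly one row, it can help to evaluate them by hand. The following is a deliberately naive sketch of triple-pattern matching in plain Python, not how a real SPARQL engine is implemented; the function names are illustrative, and the full URIs are used in place of the shortened ones on the slide.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

GRAPH = [
    (FILE_TYPE, RDF_TYPE, DCAT + "Dataset"),
    (FILE_TYPE, DCT + "title", "File types Name Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend a variable binding so that pattern equals triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(variables, patterns, graph):
    """Join the patterns one by one, then project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, DCT + "publisher", "?publisherURI"),
        (FILE_TYPE, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    GRAPH,
)
print(rows)  # [('File types Name Authority List', 'Publications Office')]
```

The shared variable `?publisherURI` is what joins the dataset to its publisher: the third pattern can only match triples whose subject equals the value bound by the first.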

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

bull RDF is a general way to express data intended for publishing on the Web

bull RDF data is expressed in triples subject predicate object

bull Different syntaxes exist for expressing data in RDF

bull SPARQL is a standardised language to query graph data expressed as RDF

bull SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

  o How to reuse existing vocabularies to model your data

  o How to create new classes and properties in RDF

  o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

1

[Domain model diagram; the classes, attributes and relationships recoverable from it are listed below]

• Amount: Currency, Figure, Type, Year

• Nomenclature: Type, Heading, Introduction, Remark, Conditions

• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

• Political category: Code, Description

• Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

DATASUPPORTOPEN

Reuse existing terms and vocabularies

Slide 69

2

• General purpose vocabularies: DCMI, RDFS

• To name things: rdfs:label, foaf:name, skos:prefLabel

• To describe people: FOAF, vCard, Core Person Vocabulary

• To describe projects: DOAP, ADMS.SW

• To describe interoperability assets: ADMS

• To describe registered organisations: Registered Organisation Vocabulary

• To describe addresses: vCard, Core Location Vocabulary

• To describe public services: Core Public Service Vocabulary

• To describe datasets: DCAT, DCAT Application Profile, VoID

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

• DCAT-AP: Vocabulary for describing datasets in Europe

• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

• DOAP: Vocabulary for describing projects

• ADMS: Vocabulary for describing interoperability assets

• Dublin Core: Defines general metadata attributes

• Registered Organisation Vocabulary: Vocabulary for describing organisations, typically in a national or regional register

• Organization Ontology: for describing the structure of organisations

• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies

2

DATASUPPORTOPEN

• Reuse greatly aids interoperability of your data

Take dcterms:created as an example: its value should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
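
The date example can be made concrete. In the sketch below, the ISO 8601 lexical form used by xsd:date is parsed by a standard library call with no configuration, whereas the free-text form needs a custom format string that depends on language and locale. The function names are illustrative only.

```python
from datetime import date, datetime


def parse_xsd_date(value: str) -> date:
    """The lexical form of xsd:date is ISO 8601, so no format string is needed."""
    return date.fromisoformat(value)


def parse_free_text_date(value: str) -> date:
    """A home-grown format needs its own pattern.

    "%B" matches English month names only under the default "C" locale,
    so every consumer has to know the language the publisher wrote in.
    """
    return datetime.strptime(value, "%d %B %Y").date()


print(parse_xsd_date("2013-02-21"))
print(parse_free_text_date("21 February 2013"))
```

Both calls produce the same date, but only the first works without prior agreement between publisher and consumer, which is the interoperability argument made above.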

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies

2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on:

Slide 72

http://joinup.ec.europa.eu and http://lov.okfn.org

Reuse existing terms and vocabularies

2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic

• When you declare sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown to them

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
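
The second bullet describes a simple inference: every statement made with a sub-property also holds for its super-property. A minimal sketch of that rule follows; the `bud:` and `ex:` prefixes and the sample statement are assumptions for illustration, and real systems would rely on an RDFS reasoner rather than hand-written code.

```python
# Hypothetical declarations: each key is a sub-property of its value.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(graph):
    """Add, for every statement, the same statement with each super-property."""
    inferred = set(graph)
    for s, p, o in graph:
        # Walk up the sub-property chain until no more generic term is declared.
        while p in SUB_PROPERTY_OF:
            p = SUB_PROPERTY_OF[p]
            inferred.add((s, p, o))
    return inferred


data = {("ex:chapter1", "bud:introduction", "Introductory text")}
for triple in sorted(infer_super_properties(data)):
    print(triple)
```

A consumer that only knows Dublin Core can now find the text through `dct:description`, even though the publisher used the more specific term.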

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description

Nomenclature

TypeHeadingIntroductionRemarkConditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept

• Properties begin with a lower-case letter, e.g. rdfs:label

• Object properties should be verbs, e.g. org:hasSite

• Data type properties should be nouns, e.g. dcterms:description

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
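
The capitalisation and camel-case rules are mechanical enough to check automatically; whether a term is singular, a verb or a noun still needs human review. The following sketch checks only the mechanical part, and its function names and the failing examples are hypothetical.

```python
import re

CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")     # UpperCamelCase
PROPERTY_NAME = re.compile(r"[a-z][A-Za-z0-9]*")  # lowerCamelCase


def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as skos:Concept."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    return CLASS_NAME.fullmatch(local_name(term)) is not None


def is_valid_property_name(term: str) -> bool:
    return PROPERTY_NAME.fullmatch(local_name(term)) is not None


print(is_valid_class_name("skos:Concept"))              # True
print(is_valid_property_name("foaf:isPrimaryTopicOf"))  # True
print(is_valid_property_name("ex:Has_Site"))            # False
```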

Slide 76

4

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.
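
In RDFS, domain and range declarations let a system infer the class of the things a property connects. The sketch below applies that rule to a hasCeiling property; the `bud:` and `ex:` names, and the assumption that hasCeiling points from a political category to an amount, are illustrative guesses based on the diagram, not taken from the published vocabulary.

```python
# Hypothetical declarations for one property.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(graph):
    """Derive rdf:type statements from domain and range declarations."""
    types = set()
    for s, p, o in graph:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))  # the subject is in the domain
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))   # the object is in the range
    return types


data = {("ex:category1", "bud:hasCeiling", "ex:amount42")}
for triple in sorted(infer_types(data)):
    print(triple)
```

Because these declarations produce inferences rather than validation errors, a domain or range that is too narrow will silently classify data incorrectly, which is one reason to define them with care.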

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

hasCeiling

Amount

CurrencyFigureTypeYear

Political category

CodeDescription

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Follow good practices for publishing sets of persistent Uniform Resource Identifiers (URIs), in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

5

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas and http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another." - openrefine.org

See also: Open Refine website

http://openrefine.org

DATASUPPORTOPEN

What is Open Refine RDF extension

Open Refine RDF extension allows you to easily import data in different formats such as

CSV

Excel(xls and xlsx)

JSON

XML and

RDFXML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See also: LOD2 Webinar - Open Refine: http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

First:

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload the spreadsheet in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data - table harmonisation

Slide 90

3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data - prepare RDF

Slide 91

3

• Create URI representations for the object values involved:

o via a formula

o via reconciliation
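The "via formula" route amounts to deriving a URI from a cell value with a deterministic rule. The sketch below illustrates that idea in plain Python rather than in Open Refine's own expression language; the base URI and the function name `mint_uri` are assumptions made for the example, not something the slides prescribe.

```python
from urllib.parse import quote

# Hypothetical base URI, chosen only for illustration.
BASE = "http://example.org/id/country/"


def mint_uri(cell_value: str, base: str = BASE) -> str:
    """Turn a spreadsheet cell value into a URI in a repeatable way."""
    # Normalise whitespace and case so that "Czech  Republic" and
    # " czech republic" map to the same URI.
    slug = "-".join(cell_value.strip().lower().split())
    # Percent-encode anything that is not safe in a URI path segment.
    return base + quote(slug, safe="-")


print(mint_uri("  Czech Republic "))
print(mint_uri("Côte d'Ivoire"))
```

Because the rule is deterministic, the same cell value always yields the same URI, which is what allows other datasets to link to it. Reconciliation, the second route, instead looks the value up in an existing authority list and reuses the URI found there.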

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF
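A skeleton is essentially a template that is filled in once per spreadsheet row. The following sketch mimics that idea outside Open Refine, using only the Python standard library. The sample rows, the base URI, and the `ex:` properties are invented for illustration, and the output is a simplified approximation of RDF Data Cube observations (a real cube would also declare a dataset and its dimension and measure properties).

```python
import csv
import io

# Invented sample rows standing in for the cleaned-up spreadsheet.
CSV_TEXT = """country,year,value
BE,2013,0.78
LU,2013,0.91
"""

BASE = "http://example.org/scoreboard/"  # hypothetical base URI

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/scoreboard/def#> .\n"
)

# The skeleton: one Turtle resource per row, with column values as placeholders.
SKELETON = (
    '<{base}obs/{country}-{year}> a qb:Observation ;\n'
    '    ex:country <{base}country/{country}> ;\n'
    '    ex:year "{year}"^^xsd:gYear ;\n'
    '    ex:value "{value}"^^xsd:decimal .\n'
)


def rows_to_turtle(csv_text: str, base: str = BASE) -> str:
    """Apply the skeleton to every row of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    body = "\n".join(SKELETON.format(base=base, **row) for row in reader)
    return PREFIXES + "\n" + body


turtle = rows_to_turtle(CSV_TEXT)
print(turtle)
```

The point of the design is that the mapping is described once, in the skeleton, and applied uniformly to every row.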

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

Slide 94

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

4

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along flexibility and volume axes: OpenRefine, UnifiedViews, Cellar]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5-star Open Data. http://5stardata.info

• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net

• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework, W3C. http://www.w3.org/RDF

• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on SlideShare: Open Data Support, http://www.slideshare.net/OpenDataSupport

Join us on: http://www.opendatasupport.eu, Open Data Support, http://goo.gl/y9ZZI

Follow us: @OpenDataSupport

Contact us: contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 24: Llinked open data training for EU institutions

DATASUPPORTOPEN

Link your data to other data to provide context

Slide 24

Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo: change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD: https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples of supranational, national, regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions' initiatives - some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

ADMS

SWCORE

VOCABULARY

PUBLICSERVICE

DATASUPPORTOPEN

Member State initiatives - some examples

DE - Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT - Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL - Building and address register

The Dutch Address and Buildings base register published as linked data

UK - Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line

UK - Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD: https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member States initiatives - UK National Archives

Slide 30

Three considerations for legislation as data:

• Typographic layout

• Versioning: changes over time

• Semantics

Semantic representation using RDF and Linked Data:

• URIs for things & the RDF data model

Requires granular URIs to name things:

• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}

• Representation: /data.[xml | xht | pdf | rdf | feed]
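The identifier pattern above can be read as a small URI-building rule. The sketch below composes identifier URIs and representation URLs from that pattern; the function names are made up for the example, and it is an illustration of the naming scheme shown on the slide, not a description of how legislation.gov.uk is actually implemented.

```python
LEGISLATION_BASE = "http://www.legislation.gov.uk"
# Representation formats listed on the slide.
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}


def identifier_uri(doc_type, year, number, section=None):
    """Build the identifier URI for an item of legislation or one of its sections."""
    uri = f"{LEGISLATION_BASE}/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    return uri


def representation_url(uri, fmt):
    """Append the representation suffix (e.g. /data.rdf) to an identifier URI."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{uri}/data.{fmt}"


section = identifier_uri("ukpga", 2010, 32, section=124)
print(section)
print(representation_url(section, "rdf"))
```

Keeping the identifier (the thing) separate from its representations (documents about the thing) is what makes the URIs granular: each section can be named and linked to on its own.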

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legislation Identifier (ELI): http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

DATASUPPORTOPEN

Member States initiatives - UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

DATASUPPORTOPEN Slide 33

Open & linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"

• Data is spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes"

• Need to add semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:org="http://www.w3.org/ns/org#"
        xmlns:locn="http://www.w3.org/ns/locn#">

      <!-- The organisation and a link to its site -->
      <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
        <rdfs:label>Publications Office</rdfs:label>
        <org:hasSite rdf:resource="http://example.com/site/1234"/>
      </org:Organization>

      <!-- The address of that site -->
      <locn:Address rdf:about="http://example.com/site/1234">
        <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
      </locn:Address>

    </rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject: a resource, which is identified with a URI

• Predicate: a URI-identified, reused specification of the relationship

• Object: a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
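To make the subject/predicate/object structure concrete, the sketch below models a triple as a small Python type and writes it out in N-Triples, the simplest line-based RDF syntax. The class and function names are chosen for this illustration only, and the predicate URI assumes that "has a title" stands for the Dublin Core `title` property.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the resource being described
    predicate: str                # URI of the property
    obj: str                      # URI or literal value
    obj_is_literal: bool = False  # True when obj is a plain string literal


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a line of N-Triples."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Name Authority List",
    obj_is_literal=True,
)
print(to_ntriples(title))
```

A set of such triples forms a graph: whenever the object of one triple is the subject of another, the two statements are connected.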

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcat="http://www.w3.org/ns/dcat#"
        xmlns:dct="http://purl.org/dc/terms/">

      <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
        <dct:title>File types Name Authority List</dct:title>
        <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
      </dcat:Dataset>

      <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
        <dct:title>Publications Office</dct:title>
      </dct:Agent>

    </rdf:RDF>

Callouts on the slide: Subject, Predicate, Object, Graph. The xmlns attributes are the definition of prefixes; the elements below them are the description of data (triples).

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct: <http://purl.org/dc/terms/> .

    <http://publications.europa.eu/resource/authority/file-type>
        a dcat:Dataset ;
        dct:title "File types Name Authority List" ;
        dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://open-data.europa.eu/en/data/publisher/publ>
        a dct:Agent ;
        dct:title "Publications Office" .

Callouts on the slide: Subject, Predicate, Object, Graph. The @prefix lines are the definition of prefixes; the statements below them are the description of data (triples).

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

    <html>
      <head></head>
      <body>
        <div resource="http://publications.europa.eu/resource/authority/file-type"
             typeof="http://www.w3.org/ns/dcat#Dataset">
          <p>
            <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
            Publisher:
            <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
          </p>
        </div>
      </body>
    </html>

Callouts on the slide: Subject, Predicate, Object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram "Healthcare Domain". Classes and their properties:]

Core Vocabularies::Identifier
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Core Vocabularies::Person
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Core Vocabularies::Location
- geographicIdentifier: URI
- geographicName: string

Core Vocabularies::Address
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Core Vocabularies::Geometry
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions ...

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)

• INSERT: add triples to the RDF graph

• DELETE: delete triples from the RDF graph

• ASK: are there any X, Y, etc. satisfying the following conditions ...?

Slide 55

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

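Structure of a SPARQL query

Slide 56

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>

    SELECT ?title
    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }

The parts of the query:

• Definition of prefixes: the PREFIX lines

• Type of query: SELECT

• Variables, i.e. what to search for: ?title

• RDF triple patterns, i.e. the conditions that have to be met: the statements inside WHERE { ... }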

DATASUPPORTOPEN

SELECT - return the name of a dataset with a particular URI

Slide 57

Sample data:

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query:

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset
    WHERE {
      <http://.../authority/file-type> dct:title ?dataset .
    }

Result:

    ?dataset
    "File types Name Authority List"

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

Sample data:

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query:

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }

Result:

    ?dataset                            ?publisher
    "File types Name Authority List"    "Publications Office"
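To show what a SELECT query does with its triple patterns, the sketch below evaluates the query from this slide against the sample data using a tiny pattern matcher written in plain Python. It is a teaching aid under simplifying assumptions, not a SPARQL engine: terms are bare strings, variables are strings starting with `?`, and the two `example.org` URIs are placeholders standing in for the abbreviated URIs on the slide.

```python
DCT = "http://purl.org/dc/terms/"
DATASET = "http://example.org/authority/file-type"  # placeholder URI
PUBLISHER = "http://example.org/publisher/publ"     # placeholder URI

triples = [
    (DATASET, DCT + "title", "File types Name Authority List"),
    (DATASET, DCT + "publisher", PUBLISHER),
    (PUBLISHER, DCT + "title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(variables, patterns, data):
    """Join all patterns and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in data
            if (extended := match(pattern, t, b)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, DCT + "publisher", "?publisherURI"),
        (DATASET, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    triples,
)
print(rows)
```

The shared variable `?publisherURI` is what joins the dataset's triples to the publisher's triples; without it, the third pattern would match every title in the data.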

DATASUPPORTOPEN

SPARQL example - EU ODP (1)

Slide 59

SPARQL example - EU ODP (2)

Slide 60

SPARQL example - EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

bull RDF is a general way to express data intended for publishing on the Web

bull RDF data is expressed in triples subject predicate object

bull Different syntaxes exist for expressing data in RDF

bull SPARQL is a standardised language to query graph data expressed as RDF

bull SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

bull Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

bull An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

1

[Domain model diagram. Classes and their attributes:]

Amount: Currency, Figure, Type, Year

Nomenclature: Type, Heading, Introduction, Remark, Conditions

EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

Political category: Code, Description

Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

DATASUPPORTOPEN

Reuse existing terms and vocabularies

Slide 69

2

• General-purpose vocabularies: DCMI, RDFS

• To name things: rdfs:label, foaf:name, skos:prefLabel

• To describe people: FOAF, vCard, Core Person Vocabulary

• To describe projects: DOAP, ADMS.SW

• To describe interoperability assets: ADMS

• To describe registered organisations: Registered Organisation Vocabulary

• To describe addresses: vCard, Core Location Vocabulary

• To describe public services: Core Public Service Vocabulary

• To describe datasets: DCAT, DCAT Application Profile, VoID

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe

• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

• DOAP: vocabulary for describing projects

• ADMS: vocabulary for describing interoperability assets

• Dublin Core: defines general metadata attributes

• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register

• Organization Ontology: for describing the structure of organisations

• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies

2

DATASUPPORTOPEN

• Reuse greatly aids interoperability of your data

Take dcterms:created, for example, whose value should be a typed date such as "2013-02-21"^^xsd:date: it is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows that the schema has been published with care and professionalism, which again promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
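The interoperability point about date formats can be seen in a few lines of Python. The sketch below contrasts the lexical form used by xsd:date, which standard tooling reads directly, with the free-text form, which each consumer has to know about and parse with a custom format string. It assumes an English locale for the month name.

```python
from datetime import date, datetime

# The xsd:date lexical form is ISO 8601, which standard libraries parse directly.
standard = date.fromisoformat("2013-02-21")

# A custom format needs format-specific knowledge in every consumer:
# here a hand-written pattern, which also depends on the month name's language.
custom = datetime.strptime("21 February 2013", "%d %B %Y").date()

print(standard, custom, standard == custom)
```

Both values end up identical, but only the first form works without the consumer knowing anything about the publisher's conventions.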

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

http://joinup.ec.europa.eu

http://lov.okfn.org

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
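The benefit described in the second bullet can be sketched as a small inference step: a consumer that only knows a generic property can still use data published with a more specific one, provided the sub-property link is available. In the example below, the `bud#introduction` URI and the sample triple are hypothetical, and the sub-property chain is assumed to be free of cycles.

```python
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"
BUD_INTRODUCTION = "http://example.org/bud#introduction"  # hypothetical URI

# rdfs:subPropertyOf statements: specific property -> more generic property.
SUB_PROPERTY_OF = {BUD_INTRODUCTION: DCT_DESCRIPTION}

data = [
    ("http://example.org/nomenclature/1", BUD_INTRODUCTION, "Introductory text"),
]


def with_inferred_super_properties(triples, sub_property_of):
    """Add a triple using the super-property for each sub-property triple."""
    result = list(triples)
    for s, p, o in triples:
        while p in sub_property_of:  # follow the chain upwards
            p = sub_property_of[p]
            result.append((s, p, o))
    return result


expanded = with_inferred_super_properties(data, SUB_PROPERTY_OF)
# A consumer that only understands dct:description still finds the text.
descriptions = [o for s, p, o in expanded if p == DCT_DESCRIPTION]
print(descriptions)
```

The original, more specific triple is kept, so consumers that do understand the specific term lose nothing.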

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description

Nomenclature: Type, Heading, Introduction, Remark, Conditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject

[Diagram: Amount (Currency, Figure, Type, Year) linked to Nomenclature (Type, Heading, Introduction, Remark, Conditions) by the "has Nomenclature" relationship]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept

• Properties begin with a lower-case letter, e.g. rdfs:label

• Object properties should be verbs, e.g. org:hasSite

• Data type properties should be nouns, e.g. dcterms:description

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
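Some of these conventions are mechanical enough to check automatically. The sketch below tests only the capitalisation and camel-case rules on prefixed names; whether a class name is singular, or a property name is a verb or a noun, still needs human judgement. The function names are invented for the example.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # UpperCamelCase
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # lowerCamelCase


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "ex:legal_entity"]:
    print(term, "class ok:", is_valid_class_name(term))
for term in ["org:hasSite", "foaf:isPrimaryTopicOf", "ex:Has-Site"]:
    print(term, "property ok:", is_valid_property_name(term))
```

A check like this could run before a vocabulary is published, so that naming slips are caught while the terms can still be changed.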

Slide 76

4

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Amount: Currency, Figure, Type, Year

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Amount: Currency, Figure, Type, Year

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range:

• A range states that the values of a property are instances of one or more classes

• A domain states on which classes a given property can be used
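In RDFS, domain and range are not validation rules but statements from which class membership can be inferred. The sketch below shows that inference for a `hasCeiling` property. The namespace is hypothetical, and the direction of the property (from Political category to Amount) is an assumption made for the example.

```python
EX = "http://example.org/bud#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed declarations: hasCeiling is used on political categories (domain)
# and its values are amounts (range).
DOMAIN = {EX + "hasCeiling": EX + "PoliticalCategory"}
RANGE = {EX + "hasCeiling": EX + "Amount"}


def infer_types(triples):
    """Derive rdf:type statements from the domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))  # the subject is in the domain
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))   # the object is in the range
    return inferred


data = [(EX + "category/1", EX + "hasCeiling", EX + "amount/42")]
for triple in sorted(infer_types(data)):
    print(triple)
```

This is why a domain or range that is too narrow can be harmful: every use of the property silently asserts the declared classes, whether or not the publisher intended it.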

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the hasCeiling property linking the classes Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

5

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data - table harmonisation

Slide 90

Step 3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data - prepare RDF

Slide 91

Step 3

• Create URI representations for the involved object values

• via formula

• via reconciliation
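The "via formula" route amounts to deriving a URI from a cell value with a deterministic rule. The sketch below shows that idea in plain Python rather than in Open Refine's own expression language; the base URI `http://example.com/id/`, the `kind` path segment and the sample value are hypothetical.

```python
import re
import unicodedata

BASE_URI = "http://example.com/id/"  # hypothetical base URI


def slugify(value):
    """Turn a cell value into a lower-case, hyphen-separated ASCII token."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, value, base=BASE_URI):
    """Build a URI of the form <base><kind>/<slug of value>."""
    return f"{base}{kind}/{slugify(value)}"


uri = mint_uri("country", "Côte d'Ivoire")
```

Because the rule is deterministic, the same cell value always yields the same URI, which is what allows rows from different sheets to link to the same resource. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there.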

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Step 4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Step 4

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5-star Open Data. http://5stardata.info

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF/

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: OpenDataSupport

Contact us: http://www.opendatasupport.eu, contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 25: Llinked open data training for EU institutions

DATASUPPORTOPEN

LOGD roadblocks

• Necessary investments

• Lack of necessary competencies

• Perceived lack of tools

• Lack of service level guarantees

• Missing, restrictive or incompatible licences

• Surfeit of standard vocabularies

• The inertia of the status quo - change is accomplished slowly

Slide 25

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Linked data initiatives in Europe

Examples of supranational, national, regional and private initiatives in the area of linked data

Slide 26

DATASUPPORTOPEN

EU institutions' initiatives - some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
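A SPARQL endpoint is queried over HTTP: under the SPARQL Protocol, a query can be sent as the `query` parameter of a GET request. The sketch below only builds such a request URL and does not contact any server. The endpoint address is the EEA one from the list above, written out on the assumption that it reads `http://semantic.eea.europa.eu/sparql`; it may no longer be online, and the query is a generic illustration.

```python
from urllib.parse import urlencode

# Endpoint from the list above; availability today is not guaranteed.
ENDPOINT = "http://semantic.eea.europa.eu/sparql"

# Generic illustrative query: any five triples in the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request_url(endpoint, query):
    """Encode the query text as the `query` parameter of an HTTP GET URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_request_url(ENDPOINT, QUERY)
```

The resulting URL can be opened with any HTTP client; the response format (for example JSON or XML results) is negotiated with the endpoint and depends on what it supports.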

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

ADMS

SWCORE

VOCABULARY

PUBLICSERVICE

DATASUPPORTOPEN

Member State initiatives - some examples

DE - Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT - Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL - Building and address register

The Dutch Address and Buildings base register, published as linked data

UK - Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line

UK - Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member States initiatives - UK National Archives

Slide 30

Three considerations for legislation as data:

• Typographic layout

• Versioning: changes over time

• Semantics

Semantic representation using RDF and Linked Data:

• URIs for things & RDF data model

Requires granular URIs to name things:

• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}

• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

DATASUPPORTOPEN

Member States Initiatives ndash UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
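Granular URIs of this kind are assembled from a few structured parts. As a sketch, and assuming the pattern reads `/id/{type}/{year}/{number}/section/{number}` followed by `/data.{format}` (the slashes and dots are lost in this transcript, so the separators are a reconstruction), a helper could look like this. The function name and parameters are hypothetical.

```python
BASE = "http://www.legislation.gov.uk"


def legislation_uri(doc_type, year, number, section=None, fmt=None):
    """Build an identifier URI, optionally narrowed to a section and a format."""
    uri = f"{BASE}/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"   # narrow the identifier to one section
    if fmt is not None:
        uri += f"/data.{fmt}"          # ask for one representation, e.g. rdf
    return uri


uri = legislation_uri("ukpga", 2010, 32, section=124, fmt="rdf")
```

Keeping the identifier (the thing) separate from the representation suffix (a document about the thing) is what allows one resource to be served as XML, PDF or RDF.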

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content

• This data already powers large parts of the BBC website, including BBC News and Sport

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

DATASUPPORTOPEN Slide 33

Open amp linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge-creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

bull The Resource Description Framework (RDF)

• How to write/read RDF

bull How you can describe your data with RDF

bull What SPARQL is

bull How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example RDF description of an organisation

Publications Office 2 rue Mercier 2985 Luxembourg LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject - a resource, which is identified with a URI

• Predicate - a URI-identified, reused specification of the relationship

• Object - a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type

Predicate: has a title

Object: "File types Name Authority List"
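To make the three roles concrete, here is a small Python sketch that stores the example as a typed tuple and distinguishes an object that is a resource from one that is a literal. The class names `URI`, `Literal` and `Triple` are illustrative, and the use of `http://purl.org/dc/terms/title` for "has a title" is an assumption based on the `dct:title` property used in the syntax examples of this module.

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identifier."""


class Literal(str):
    """A plain literal value such as a title."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


title_triple = Triple(
    URI("http://publications.europa.eu/resource/authority/file-type"),
    URI("http://purl.org/dc/terms/title"),  # assumed URI for "has a title"
    Literal("File types Name Authority List"),
)


def describe(triple):
    """Render a triple and say whether its object is a resource or a literal."""
    kind = "resource" if isinstance(triple.obj, URI) else "literal"
    return f"<{triple.subject}> <{triple.predicate}> ({kind}) {triple.obj}"
```

Subjects and predicates are always identified by URIs; only the object position may hold a literal.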

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Slide annotations: definition of prefixes (the xmlns attributes); description of data, i.e. the triples (subject, predicate, object) that together form the graph.

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Slide annotations: definition of prefixes (the @prefix lines); description of data, i.e. the triples (subject, predicate, object) that together form the graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Slide annotations: the resource attribute gives the subject, the property attribute the predicate, and the element content the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram "Healthcare Domain" showing the following Core Vocabularies classes, their properties and the relationships between them]

Class Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]

Class Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string

Class Location: geographicIdentifier: URI; geographicName: string

Class Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string

Class Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

• INSERT: add triples to the RDF graph (SPARQL 1.1 Update)

• DELETE: delete triples from the RDF graph (SPARQL 1.1 Update)

• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

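Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Slide annotations:

• PREFIX lines: definition of prefixes

• SELECT: type of query

• ?title: variables, i.e. what to search for

• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met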

DATASUPPORTOPEN

SELECT - return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}

Result:

?dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

Sample data:

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

?dataset | ?publisher
"File types Name Authority List" | "Publications Office"
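To show what a SELECT query does with its triple patterns, the following Python sketch evaluates the three patterns of the query above against the sample data by trying every triple and joining on shared variables. It is a toy matcher, not how a real SPARQL engine is implemented, and the short names `ds:file-type` and `pub:publ` are hypothetical abbreviations of the dataset and publisher URIs.

```python
DATA = [
    ("ds:file-type", "rdf:type", "dcat:Dataset"),
    ("ds:file-type", "dct:title", "File types Name Authority List"),
    ("ds:file-type", "dct:publisher", "pub:publ"),
    ("pub:publ", "rdf:type", "dct:Agent"),
    ("pub:publ", "dct:title", "Publications Office"),
]


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its earlier binding.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def select(variables, patterns, data):
    """Evaluate the patterns in order, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        ("ds:file-type", "dct:publisher", "?publisherURI"),
        ("ds:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
```

The variable `?publisherURI` appears in two patterns, which is what links the dataset to the title of its publisher; it is not listed after SELECT, so it does not appear in the result table.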

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

• Creating an RDF vocabulary for modelling your data:

o How to reuse existing vocabularies to model your data

o How to create new classes and properties in RDF

o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

Step 1

[Domain model diagram with the following classes and attributes]

Amount: Currency, Figure, Type, Year

Nomenclature: Type, Heading, Introduction, Remark, Conditions

EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

Political category: Code, Description

Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalCategory, hasCorporateBody, hasNomenclature, hasEUProgramme

DATASUPPORTOPEN

Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies

DATASUPPORTOPEN

• Reuse greatly aids interoperability of your data

Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
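The interoperability point can be illustrated with the two date values mentioned above. In this Python sketch, the xsd:date lexical form is parsed by a standard library call with no configuration, while the free-text form needs a format rule that the consumer has to know or guess. The function names are illustrative, and the free-text parser assumes English month names (the default in a C or English locale).

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    """Parse the lexical form of an xsd:date without timezone, e.g. 2013-02-21."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text):
    """Parse one specific free-text layout; other layouts need other rules."""
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
```

Both calls yield the same date, but only the first works for every publisher that follows the shared convention; the second has to be rewritten for each new spelling or language.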

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

http://joinup.ec.europa.eu

http://lov.okfn.org

Reuse existing terms and vocabularies

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
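The second bullet can be sketched in a few lines of Python: a consumer that knows only the generic property can still read the data once the sub-property statements are applied. The mapping below uses the relationship given on the slides (the EU Budget vocabulary's introduction property as a sub-property of dct:description); the `bud:` and `ex:` prefixes and the sample triple are hypothetical.

```python
# Sub-property -> super-property; the `bud:` prefix is a hypothetical abbreviation.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
}


def with_super_properties(triples, sub_property_of):
    """Add, for each triple, the triples implied by rdfs:subPropertyOf."""
    result = set(triples)
    for subject, predicate, obj in triples:
        seen = set()  # guards against cycles in the mapping
        while predicate in sub_property_of and predicate not in seen:
            seen.add(predicate)
            predicate = sub_property_of[predicate]
            result.add((subject, predicate, obj))
    return result


# Made-up sample triple, for illustration only.
data = [("ex:item1", "bud:introduction", "Appropriations for 2015")]
expanded = with_super_properties(data, SUB_PROPERTY_OF)
```

A generic tool that looks only for dct:description now finds the text, even though it has never heard of the budget-specific property.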

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description

[Diagram: class Nomenclature with the attributes Type, Heading, Introduction, Remark, Conditions]

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject

[Diagram: class Amount (Currency, Figure, Type, Year) linked by hasNomenclature to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
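The casing rules above are mechanical enough to check automatically. The following Python sketch tests only the letter-case conventions (capitalised class names, lower-case first letter for properties, camel case without separators); it cannot judge whether a name is singular, a verb or a noun. The function names and the `ex:` test terms are illustrative.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term, kind):
    """Check a prefixed term against the casing rule for a class or a property."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))
```

For example, `follows_convention("foaf:isPrimaryTopicOf", "property")` is true, while a name containing an underscore such as `ex:amount_type` is rejected.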

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoAmountrdquo class

Slide 77

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

Where new terms are required create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.
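In RDF Schema, domain and range statements are used for inference rather than validation: a resource that uses the property is inferred to be an instance of the domain class, and the value an instance of the range class. A minimal sketch, assuming a hypothetical `bud:` prefix and assuming that hasCeiling points from a political category to an amount:

```python
RDF_TYPE = "rdf:type"

# Hypothetical schema; the direction of hasCeiling is an assumption.
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}


def infer_types(data, schema=SCHEMA):
    """Derive rdf:type triples from the domain and range of each property used."""
    inferred = set()
    for subject, predicate, obj in data:
        rule = schema.get(predicate, {})
        if "domain" in rule:
            inferred.add((subject, RDF_TYPE, rule["domain"]))
        if "range" in rule:
            inferred.add((obj, RDF_TYPE, rule["range"]))
    return inferred


data = [("ex:category1", "bud:hasCeiling", "ex:amount42")]
types = infer_types(data)
```

Because of this behaviour, an overly narrow domain or range can lead to unintended type inferences when others reuse the property.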

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

[Diagram labels: Analyse, Model, Publish]

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; …" (openrefine.org)

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88


Create a project and upload it in Open Refine

Slide 89

Upload the spreadsheet

Select relevant tabs

Create the project

Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Clean up the data – prepare RDF

Slide 91

• Create URI representations for the involved object values

  o via formula

  o via reconciliation
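Creating a URI "via formula" means deriving it deterministically from a cell value. The slides do this inside Open Refine; the Python sketch below only illustrates the idea, and the base URI and the `mint_uri` helper are hypothetical.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind, value, base=BASE_URI):
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(value.strip().lower().split())
    return f"{base}{kind}/{quote(slug, safe='-')}"


uri = mint_uri("country", " United Kingdom ")
```

The same input always yields the same URI, so repeated runs over the spreadsheet produce stable identifiers. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there.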

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.
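Conceptually, a skeleton is a template that is applied to every row of the table. The sketch below mimics this outside Open Refine, as an illustration only: the column names, the sample values and the base URI are made up, while the property and class URIs are taken from the RDF Data Cube and SDMX vocabularies.

```python
import csv
import io

BASE_URI = "http://example.org/scoreboard/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
REF_AREA = "http://purl.org/linked-data/sdmx/2009/dimension#refArea"
REF_PERIOD = "http://purl.org/linked-data/sdmx/2009/dimension#refPeriod"
OBS_VALUE = "http://purl.org/linked-data/sdmx/2009/measure#obsValue"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Made-up sample rows; a real export would come from the cleaned spreadsheet.
SAMPLE_CSV = "country,year,value\nBE,2014,82.5\nLU,2014,94.0\n"


def rows_to_ntriples(csv_text, base=BASE_URI):
    """Apply one fixed 'skeleton' to each row and return N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        country, year, value = row["country"], row["year"], row["value"]
        obs = f"<{base}obs/{country}-{year}>"
        area = f"<{base}country/{country}>"
        lines.append(f"{obs} <{RDF_TYPE}> <{QB_OBSERVATION}> .")
        lines.append(f"{obs} <{REF_AREA}> {area} .")
        lines.append(f'{obs} <{REF_PERIOD}> "{year}"^^<{XSD}gYear> .')
        lines.append(f'{obs} <{OBS_VALUE}> "{value}"^^<{XSD}decimal> .')
    return lines


ntriples = rows_to_ntriples(SAMPLE_CSV)
```

Each row becomes one observation with four triples; changing `base` corresponds to setting a different base URI for the data.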

Slide 94

[Screenshot: graphical interface to copy/paste an existing RDF skeleton]

[Screenshot: graphical interface to edit an RDF skeleton]

Export your data to RDF/XML or Turtle

Slide 95

[Screenshot: export of the data in Turtle]

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

Thank you for your attention

and now YOUR questions

Slide 97


Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID – Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID – Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme
http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org

Slide 101


This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


Linked data initiatives in Europe

Examples of supra-national, national, regional and private initiatives in the area of linked data

Slide 26

EU institutions initiatives – some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU – CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate

• Europeana SPARQL endpoint: tool for querying a multi-lingual online collection of millions of digitized items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

[Figure: logos of initiatives funded by the European Commission; legible labels: ADMS, SW, Core Vocabulary, Public Service]

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT – Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL – Building and address register

The Dutch Address and Buildings base register, published as linked data.

UK – Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary Line.

UK – Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database.

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Member States Initiatives – UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning (changes over time)
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member States Initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

Slide 33

Open & linked data at BBC

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes."

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.

• URIs, RDF and SPARQL form the foundational layer for linked data.

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module you should have a clear understanding of

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <!-- The organisation, identified by its URI -->
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <!-- The address of the organisation's site -->
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

RDF structure

Triples graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI.

• Predicate – a URI-identified, reused specification of the relationship.

• Object – a resource or literal to which the subject is related.

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
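To make the structure concrete, the example triple can be written as a small Python value and serialised as one N-Triples line. This is a sketch: the `Triple` type and `to_ntriples` helper are hypothetical, and "has a title" is assumed to stand for the Dublin Core property dct:title.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str          # URI of the resource being described
    predicate: str        # URI of the property
    obj: str              # URI or literal value
    obj_is_literal: bool  # True when obj is a plain string literal


def to_ntriples(triple):
    """Serialise a triple as one N-Triples line (plain string literals only)."""
    if triple.obj_is_literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


title_triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",  # assumed meaning of "has a title"
    obj="File types Name Authority List",
    obj_is_literal=True,
)
line = to_ntriples(title_triple)
```

Subject and predicate are always URIs here; only the object may be either a URI or a literal.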

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <!-- Above: definition of prefixes. Below: description of data (triples), which together form a graph. -->

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

[Callouts on the slide: Subject, Predicate, Object, Graph]

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

[Figure: RDF graph of the triples, with callouts for Subject, Predicate and Object]

RDF syntax: Turtle

Slide 48

# Definition of prefixes
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

# Description of data (triples), which together form a graph
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

[Callouts on the slide: Subject, Predicate, Object, Graph]

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

[Callouts on the slide: Subject, Predicate, Object]

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: the Unified Modelling Language. Class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[UML class diagram "Healthcare Domain", showing classes with their properties, and the relationships between classes]

Class Identifier (Core Vocabularies)
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Class Person (Core Vocabularies)
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Class Location (Core Vocabularies)
- geographicIdentifier: URI
- geographicName: string

Class Address (Core Vocabularies)
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Class Geometry (Core Vocabularies)
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: add triples to the RDF graph.

• DELETE: delete triples from the RDF graph.

• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL query

Slide 56

# Definition of prefixes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# Type of query, followed by the variables, i.e. what to search for
SELECT ?title
# RDF triple patterns, i.e. the conditions that have to be met
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
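How a SELECT query arrives at its result can be illustrated with a toy matcher in Python. This is not how a SPARQL engine is implemented and it covers only basic triple patterns; the `example.org` URIs stand in for the abbreviated ones in the sample data, and URIs and literals are both kept as plain strings for brevity.

```python
# Stand-ins for the abbreviated URIs in the sample data (hypothetical).
DATASET = "http://example.org/authority/file-type"
PUBL = "http://example.org/publisher/publ"

data = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on mismatch."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def select(variables, patterns, triples):
    """Evaluate the triple patterns one by one, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    data,
)
```

The shared variable `?publisherURI` is what links the dataset's triples to the publisher's triples, which is the same join the query above performs.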

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about

• Creating an RDF vocabulary for modelling your data:

  o How to reuse existing vocabularies to model your data

  o How to create new classes and properties in RDF

  o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Start with a robust Domain Model

Slide 68

[Domain model diagram]

Classes and their attributes:

• Amount: Currency, Figure, Type, Year

• Nomenclature: Type, Heading, Introduction, Remark, Conditions

• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

• Political category: Code, Description

• Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies

Slide 70

DCAT-AP: vocabulary for describing datasets in Europe

Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: vocabulary for describing projects

ADMS: vocabulary for describing interoperability assets

Dublin Core: defines general metadata attributes

Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

• Reuse greatly aids interoperability of your data.

Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
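The interoperability point about dates can be seen with Python's standard library. A minimal sketch, with `parse_xsd_date` as a hypothetical helper: the lexical form of a plain xsd:date value without a time zone parses directly, whereas a free-text date needs a custom, publisher-specific format.

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    """Parse an xsd:date lexical form such as '2013-02-21' (no time zone suffix)."""
    return date.fromisoformat(lexical)


standard = parse_xsd_date("2013-02-21")

# The free-text form is rejected by the standard parser...
try:
    parse_xsd_date("21 February 2013")
    free_text_parsed = True
except ValueError:
    free_text_parsed = False

# ...and needs format-specific (and locale-dependent) handling instead.
custom = datetime.strptime("21 February 2013", "%d %B %Y").date()
```

Both values end up as the same date, but only the standard form could be processed without knowing anything about the publisher's conventions.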

Slide 71

Advantages of reuse

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

Slide 72

http://joinup.ec.europa.eu

http://lov.okfn.org

Step 2: Reuse existing terms and vocabularies

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription

Nomenclature

TypeHeadingIntroductionRemarkConditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower-case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
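The capitalisation and camel-case rules above are mechanical enough to check automatically. The following Python sketch is one possible check, not an established tool; the function names are illustrative, and the rules about singular nouns and verbs are not covered because they need linguistic knowledge.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # UpperCamelCase
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # lowerCamelCase

def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]

def follows_convention(term: str, kind: str) -> bool:
    """Check a term against the naming rule for kind ('class' or 'property')."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))

for term, kind in [("skos:Concept", "class"),
                   ("foaf:isPrimaryTopicOf", "property"),
                   ("ex:amount_type", "property")]:
    print(term, kind, follows_convention(term, kind))
```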

Slide 76

4

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Amount: Currency, Figure, Type, Year

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Amount: Currency, Figure, Type, Year

Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used
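In RDFS, domain and range statements let a consumer derive the class of a resource from the properties used with it. The Python sketch below shows that derivation on plain tuples. The prefixed names and the direction of hasCeiling (from a political category to an amount) are assumptions made for illustration; the slides do not state the direction.

```python
def infer_types(triples, domains, ranges):
    """Derive (resource, class) pairs from rdfs:domain and rdfs:range."""
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            inferred.add((s, domains[p]))   # the subject belongs to the domain class
        if p in ranges:
            inferred.add((o, ranges[p]))    # the value belongs to the range class
    return inferred

# Hypothetical declarations for a hasCeiling property.
domains = {"bud:hasCeiling": "bud:PoliticalCategory"}
ranges = {"bud:hasCeiling": "bud:Amount"}

data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
types = infer_types(data, domains, ranges)
print(sorted(types))
```

Because these statements produce inferences rather than validation errors, a domain or range that is too narrow will silently assign wrong classes, which is a reason to declare them with care.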

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the hasCeiling property links Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

5

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model, developed following a structured process and methodology

Research existing terms and their usage, and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Phases: Analyse, Model, Publish

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; ..." (openrefine.org)

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data - table harmonisation

Slide 90

3

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Clean up the data - prepare RDF

Slide 91

3

• Create URI representations for the involved object values:

  o via formula

  o via reconciliation
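Creating a URI "via formula" means deriving it deterministically from a cell value. In Open Refine this is done with an expression in the tool itself; the Python sketch below only illustrates the idea. The base URI, the path segment and the function name mint_uri are hypothetical.

```python
import re

def mint_uri(base: str, kind: str, label: str) -> str:
    """Build a URI from a cell value by lower-casing it and
    replacing every run of non-alphanumeric characters with a hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")
    return f"{base.rstrip('/')}/{kind}/{slug}"

uri = mint_uri("http://example.com/id", "country", " United Kingdom ")
print(uri)
```

The same input always yields the same URI, so rows that mention the same value end up pointing at the same resource. This simple slug rule drops non-ASCII letters, so values with accents would need a transliteration step first; reconciliation against an existing authority list avoids that problem by reusing URIs that already exist.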

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Map your data to appropriate RDF classes amp properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.
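Conceptually, an RDF skeleton is a mapping from spreadsheet columns to predicates that is applied to every row, with the base URI used to build the subject of each row. The Python sketch below mimics that with a dictionary and emits N-Triples lines. The base URI, the predicate URIs and the column names are hypothetical, and all values are written as plain string literals for brevity.

```python
import csv
import io

BASE = "http://example.com/scoreboard/"  # hypothetical base URI

# Hypothetical skeleton: column name -> predicate URI.
SKELETON = {
    "country": "http://example.com/def/country",
    "year": "http://example.com/def/year",
    "value": "http://example.com/def/value",
}

def rows_to_ntriples(csv_text, id_column="id"):
    """Apply the skeleton to every row and return N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, predicate in SKELETON.items():
            # Escape backslashes and quotes so the literal stays valid.
            literal = row[column].replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{predicate}> "{literal}" .')
    return lines

sample = "id,country,year,value\nobs1,BE,2013,0.82\n"
ntriples = rows_to_ntriples(sample)
print("\n".join(ntriples))
```

A real mapping to the RDF Data Cube Vocabulary would use its own classes and properties and typed literals; the point here is only that one skeleton is defined once and then applied uniformly to the whole table.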

Slide 94

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN


DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data SupporthttpwwwslidesharenetOpenDataSupport

httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI

OpenDataSupport contactopendatasupporteu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 27: Llinked open data training for EU institutions

DATASUPPORTOPEN

EU institutions' initiatives - some examples

• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql

• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data

• Publications Office of the EU, CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data

• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/

• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint

Slide 27

DATASUPPORTOPEN

Initiatives funded by the European Commission

Slide 28

ADMS

SWCORE

VOCABULARY

PUBLICSERVICE

DATASUPPORTOPEN

Member State initiatives ndash some examples

DE - Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg

IT - Agenzia per l'Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration

NL - Building and address register

The Dutch Address and Buildings base register published as linked data

UK - Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line

UK - Companies House

Publishing basic company details as linked data, using a simple URI for each company in their database

Slide 29

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

DATASUPPORTOPEN

Member States Initiatives ndash UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & the RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Member States Initiatives ndash UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

DATASUPPORTOPEN

Open & linked data at the BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

Slide 33

Open & linked data at the BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes."

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Linked Data in the oil and gas industry

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.

• URIs, RDF and SPARQL form the foundational layer for linked data.

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

bull The Resource Description Framework (RDF)

bull How to write/read RDF

bull How you can describe your data with RDF

bull What SPARQL is

bull How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject: a resource, which is identified with a URI

• Predicate: a URI-identified, reused specification of the relationship

• Object: a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
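The dataset-name example can be written down as a three-field record. The Python sketch below is only an illustration of the data model: the Triple class is hypothetical, "has a title" is assumed to mean the Dublin Core property http://purl.org/dc/terms/title (the property used in the syntax examples of this module), and the serialisation handles literal objects only.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    object: str     # here: a literal value

    def to_ntriples(self) -> str:
        """Serialise as one N-Triples line (literal objects only)."""
        return f'<{self.subject}> <{self.predicate}> "{self.object}" .'

title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    object="File types Name Authority List",
)
print(title.to_ntriples())
```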

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Annotations: the xmlns attributes are the definition of prefixes; the elements inside rdf:RDF are the description of data (triples). In each triple, the rdf:about resource is the subject, the nested element is the predicate and its value is the object; together the triples form a graph.

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Annotations: the @prefix lines are the definition of prefixes; the remaining statements are the description of data (triples of subject, predicate and object), which together form a graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
<head></head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type"
     typeof="http://www.w3.org/ns/dcat#Dataset">
<p>
<span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
</p>
</div>
</body>
</html>

In this example the resource attribute gives the subject, each property attribute gives a predicate and the text of the element gives the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[Figure: UML class diagram "Healthcare Domain", showing classes from the Core Vocabularies with their properties and the relationships between them]

Class Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]

Class Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string

Class Location: geographicIdentifier: URI; geographicName: string

Class Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string

Class Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s), identified by name or description.

• INSERT: add triples to the RDF graph.

• DELETE: delete triples from the RDF graph.

• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Annotations:

• PREFIX lines: definition of prefixes

• SELECT: type of query

• ?title: variables, i.e. what to search for

• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met

SELECT - return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

?dataset
"File types Name Authority List"

SELECT - return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

?dataset | ?publisher
"File types Name Authority List" | "Publications Office"
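To see why this query returns that row, it helps to evaluate the triple patterns by hand. The Python sketch below is a toy matcher over tuples, not a SPARQL engine: it binds variables pattern by pattern and keeps only the bindings that are consistent across all three patterns. The full URIs are taken from the syntax examples earlier in this module; the function names are illustrative.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple; return None on failure."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # variable
            if new.setdefault(term, value) != value:
                return None                      # conflicts with an earlier binding
        elif term != value:                      # constant must be identical
            return None
    return new

def select(variables, patterns, data):
    """Join all patterns and project the requested variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for t in data
                     if (b := match(pattern, t, s)) is not None]
    return [tuple(s[v] for v in variables) for s in solutions]

DS = "http://publications.europa.eu/resource/authority/file-type"
PUB = "http://open-data.europa.eu/en/data/publisher/publ"

data = [
    (DS, "rdf:type", "dcat:Dataset"),
    (DS, "dct:title", "File types Name Authority List"),
    (DS, "dct:publisher", PUB),
    (PUB, "rdf:type", "dct:Agent"),
    (PUB, "dct:title", "Publications Office"),
]

patterns = [
    (DS, "dct:publisher", "?publisherURI"),
    (DS, "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = select(["?dataset", "?publisher"], patterns, data)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset to its publisher: the third pattern only matches triples whose subject is the URI already bound by the first pattern.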

DATASUPPORTOPEN

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:

  - How to reuse existing vocabularies to model your data

  - How to create new classes and properties in RDF

  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Start with a robust Domain Model

Slide 68

1

[Figure: domain model of the EU budget. Classes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attribute groups shown in the figure: Introduction, Remark, Conditions; Acronym, Legal base period, Legal base type, Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Reuse existing terms and vocabularies

Slide 69

2

Well-known vocabularies

Slide 70

DCAT-AP: vocabulary for describing datasets in Europe

Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: vocabulary for describing projects

ADMS: vocabulary for describing interoperability assets

Dublin Core: defines general metadata attributes

Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register

Organization Ontology: for describing the structure of organisations

Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies (step 2)

• Reuse greatly aids interoperability of your data

Use of dcterms:created, for example, whose value should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows that the schema has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies (step 2)

You can find reusable RDF vocabularies on Joinup (http://joinup.ec.europa.eu) and Linked Open Vocabularies (http://lov.okfn.org)

Slide 72

Reuse existing terms and vocabularies (step 2)

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

Nomenclature: Type, Heading, Introduction, Remark, Conditions

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature
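The benefit of sub-properties can be sketched as follows: a consumer that only knows dct:description or dct:subject can still use the data once the sub-property links are followed. The prefixed names `bud:introduction` and `bud:hasNomenclature` are hypothetical stand-ins for the EU Budget terms, and the function is a simplified illustration of RDFS sub-property entailment, not a full reasoner.

```python
# Assumed sub-property declarations (rdfs:subPropertyOf), keyed by the
# more specific property. The bud: names are hypothetical.
SUBPROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_inferred_superproperties(triples, subproperty_of=SUBPROPERTY_OF):
    """Return the triples plus one extra triple per super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}  # guards against accidental cycles in the mapping
        parent = subproperty_of.get(p)
        while parent is not None and parent not in seen:
            inferred.add((s, parent, o))
            seen.add(parent)
            parent = subproperty_of.get(parent)
    return inferred


data = {("ex:line1", "bud:introduction", "Introductory text")}
for triple in sorted(with_inferred_superproperties(data)):
    print(triple)
```

A generic tool that understands only Dublin Core would now find a dct:description for `ex:line1`, even though the publisher used the more specific term.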

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
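Some of these conventions can be checked mechanically. The sketch below tests only capitalisation and the absence of separators such as underscores or hyphens; whether a name is singular, a verb or a noun still needs human review. The function names are assumptions made for this illustration.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term, kind):
    """Check a term against the capitalisation rule for 'class' or 'property'."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))


for term, kind in [
    ("skos:Concept", "class"),
    ("foaf:isPrimaryTopicOf", "property"),
    ("ex:amount_type", "property"),
]:
    print(term, kind, follows_convention(term, kind))
```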

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, define your own vocabulary using the standard languages for that purpose:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, define your own vocabulary using the standard languages for that purpose:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

[Diagram: classes Amount (Currency, Figure, Type, Year) and Political category (Code, Description), connected by the relationship hasCeiling]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
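As a rough sketch of what such RDFS definitions look like, the Python snippet below prints a small vocabulary in Turtle with one property that has a domain and a range. The `ex:` namespace is hypothetical, and the direction of hasCeiling (from Political category to Amount) is an assumption, since the diagram on the slide did not survive extraction.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.com/budget#",  # hypothetical namespace
}


def declare_class(name, label):
    """Turtle block declaring an RDFS class with an English label."""
    return f'{name} a rdfs:Class ;\n    rdfs:label "{label}"@en .'


def declare_property(name, label, domain, range_):
    """Turtle block declaring a property with its domain and range."""
    return (
        f"{name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain {domain} ;\n"
        f"    rdfs:range {range_} ."
    )


def vocabulary_as_turtle():
    prefix_lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    blocks = [
        declare_class("ex:Amount", "Amount"),
        declare_class("ex:PoliticalCategory", "Political category"),
        # Assumed direction: a political category has a ceiling amount.
        declare_property("ex:hasCeiling", "has ceiling",
                         "ex:PoliticalCategory", "ex:Amount"),
    ]
    return "\n".join(prefix_lines) + "\n\n" + "\n\n".join(blocks) + "\n"


print(vocabulary_as_turtle())
```

With the domain and range declared, an RDFS-aware system can conclude that anything used as the value of ex:hasCeiling is an ex:Amount, without that being stated for each resource.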

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

Phases: Analyse, Model, Publish

Slide 82

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another” - openrefine.org

See also: Open Refine website, http://openrefine.org

Slide 84

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Slide 89

Step 3: Clean up the data – table harmonisation

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Slide 90

Step 3: Clean up the data – prepare RDF

• Create URI representations for the object values involved
• via formula
• via reconciliation

Slide 91
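Creating URIs "via formula" means deriving each URI from the cell value with a fixed rule. Outside Open Refine, the same idea can be sketched in Python as below; the base URI and the function names are hypothetical, and reconciliation (matching values against an existing authority list) is not covered by this sketch.

```python
import re
import unicodedata

BASE_URI = "http://example.com/id/"  # hypothetical base URI


def slugify(value):
    """Lower-case the value, drop accents, and join words with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, value, base=BASE_URI):
    """Build a URI for a cell value, e.g. a country name in a column."""
    return f"{base}{kind}/{slugify(value)}"


print(mint_uri("country", "Czech Republic"))
```

A formula like this is deterministic, so the same cell value always yields the same URI. The URIs it produces are local ones, however, which is why reconciliation against shared identifiers is listed as the alternative.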

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 94

Step 5: Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]

Slide 95
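To make the idea of a skeleton concrete, the sketch below applies one fixed template to every spreadsheet row and prints the result as Turtle, using terms from the RDF Data Cube Vocabulary and the SDMX dimension and measure vocabularies. It is not how the RDF extension works internally. The column names, the base URI and the sample figures are invented for illustration and are not taken from the Digital Agenda Scoreboard.

```python
import csv
import io

# Placeholder rows; the figures are invented for illustration only.
SAMPLE_CSV = """country,year,value
BE,2013,0.5
LU,2013,0.7
"""

BASE = "http://example.com/scoreboard/"  # hypothetical base URI

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def row_to_observation(row):
    """Apply the skeleton to one row: one qb:Observation per row."""
    country, year, value = row["country"], row["year"], row["value"]
    return (
        f"<{BASE}observation/{country}-{year}> a qb:Observation ;\n"
        f"    qb:dataSet <{BASE}dataset> ;\n"
        f"    sdmx-dimension:refArea <{BASE}country/{country}> ;\n"
        f'    sdmx-dimension:refPeriod "{year}"^^xsd:gYear ;\n'
        f'    sdmx-measure:obsValue "{value}"^^xsd:decimal .'
    )


def csv_to_turtle(text):
    rows = csv.DictReader(io.StringIO(text))
    body = "\n\n".join(row_to_observation(row) for row in rows)
    return PREFIXES + "\n" + body + "\n"


print(csv_to_turtle(SAMPLE_CSV))
```

The template plays the role of the skeleton: column values fill fixed positions in a graph pattern, and the base URI determines where the generated resources live.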

Production pipelines

From desk to automated pipeline

[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

Slide 96

Thank you for your attention

...and now YOUR questions?

Slide 97

References

• 5★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query

Slide 98

Further reading

• EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

Further reading

• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org

Slide 101

Be part of our team

Find us on, join us on, follow us, contact us:

• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• Open Data Support website: http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu

Slide 102

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Slide 103

Page 28: Llinked open data training for EU institutions

Initiatives funded by the European Commission

[Logos: ADMS.SW; Core Vocabulary: Public Service]

Slide 28

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL – Building and address register
The Dutch Address and Buildings base register, published as linked data.

UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary Line.

UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slide 29

Member States initiatives – UK National Archives

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & the RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Slide 30

Member States initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

httpwwwlegislationgovukidukpga201032section124datardf

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

Slide 32

Open & linked data at BBC

Slide 33

Data Value Chains using Linked Data at Volkswagen

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Slide 34

Linked Data in the oil and gas industry

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”

• Need for adding semantics in order to allow machine reasoning

For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Slide 35

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality

Slide 36

Conclusions (cont’d)

• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF

Find more on training.opendatasupport.eu

Slide 39

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40

Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:org="http://www.w3.org/ns/org#"
        xmlns:locn="http://www.w3.org/ns/locn#">

      <!-- The organisation and the link to its site -->
      <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
        <rdfs:label>Publications Office</rdfs:label>
        <org:hasSite rdf:resource="http://example.com/site/1234"/>
      </org:Organization>

      <!-- The address of that site -->
      <locn:Address rdf:about="http://example.com/site/1234">
        <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
      </locn:Address>

    </rdf:RDF>

Slide 43

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple?

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”

Slide 45
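The triple above can be written down as one line of N-Triples, the simplest RDF syntax, where every statement is spelled out in full. The sketch below assumes that "has a title" corresponds to the Dublin Core property http://purl.org/dc/terms/title, as in the later syntax examples. The helper names are illustrative only.

```python
def escape_literal(text):
    """Escape backslashes, double quotes and newlines for N-Triples."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples_line(subject, predicate, obj, obj_is_literal):
    """Serialise one subject-predicate-object statement as N-Triples."""
    if obj_is_literal:
        object_term = f'"{escape_literal(obj)}"'
    else:
        object_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {object_term} ."


print(to_ntriples_line(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",  # assumed mapping of "has a title"
    "File types Name Authority List",
    obj_is_literal=True,
))
```

Subject and predicate are always URIs. Only the object can be either a URI or a literal, which is why the function needs the extra flag.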

RDF Syntax: RDF/XML

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcat="http://www.w3.org/ns/dcat#"
        xmlns:dct="http://purl.org/dc/terms/">

      <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
        <dct:title>File types Named Authority List</dct:title>
        <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
      </dcat:Dataset>

      <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
        <dct:title>Publications Office</dct:title>
      </dct:Agent>

    </rdf:RDF>

The xmlns attributes are the definition of prefixes; the elements that follow are the description of the data as triples. Each resource named in rdf:about is a subject, each nested element is a predicate, and its content or rdf:resource value is the object. Together the triples form the graph.

Slide 46

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

[Graph diagram with nodes and arcs labelled Subject, Predicate and Object]

Slide 47

RDF Syntax: Turtle

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct: <http://purl.org/dc/terms/> .

    <http://publications.europa.eu/resource/authority/file-type>
        a dcat:Dataset ;
        dct:title "File types Name Authority List" ;
        dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://open-data.europa.eu/en/data/publisher/publ>
        a dct:Agent ;
        dct:title "Publications Office" .

The @prefix lines are the definition of prefixes; the two blocks that follow are the description of the data as triples (subject, then predicate and object pairs). Together they form the graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

Slide 48

RDF Syntax: RDFa (embedding RDF data in HTML)

    <html>
      <head> ... </head>
      <body>
        <div resource="http://publications.europa.eu/resource/authority/file-type"
             typeof="http://www.w3.org/ns/dcat#Dataset">
          <p>
            <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
            Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
          </p>
        </div>
      </body>
    </html>

The resource attribute gives the subject, each property attribute a predicate, and the content of the span the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

Slide 49

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

[UML class diagram “Healthcare Domain”, with callouts marking classes, properties and relationships]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI; geographicName: string

Class Core Vocabularies::Address
  addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string; long: string; wkt: string; xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Slide 52

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph.
• DELETE: Delete triples from the RDF graph.
• ASK: Are there any X, Y, etc. satisfying the following conditions?

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Slide 55

Structure of a SPARQL Query

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>

    SELECT ?title
    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }

• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met

Slide 56

SELECT – return the name of a dataset with a particular URI

Sample data

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset
    WHERE {
      <http://.../authority/file-type> dct:title ?dataset .
    }

Result

    dataset
    "File types Name Authority List"

Slide 57

SELECT – return the name and publisher of a dataset

Sample data

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }

Result

    dataset                            publisher
    "File types Name Authority List"   "Publications Office"

Slide 58
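To show what a SPARQL engine does with the triple patterns in the WHERE clause, here is a deliberately naive matcher in plain Python. It is a teaching sketch, not a SPARQL implementation: terms are plain strings, variables are strings starting with "?", and the full URIs are taken from the earlier Turtle example instead of the abbreviated ones on the slide.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

DATA = [
    (FILE_TYPE, RDF_TYPE, DCAT + "Dataset"),
    (FILE_TYPE, DCT + "title", "File types Name Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(variables, patterns, triples):
    """Evaluate the patterns one by one, keeping only compatible bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, DCT + "publisher", "?publisherURI"),
        (FILE_TYPE, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset to its publisher: the third pattern only matches triples whose subject equals the value bound by the first pattern.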

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Domain model diagram]

Classes and their attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPolitical category, hasCorporate body, has Nomenclature, has EU Programme

Slide 68

Step 2: Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

Slide 70

DATASUPPORTOPEN

bull Reuse greatly aids interoperability of your data

Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses

bull Reuse adds credibility to your schema

It shows it has been published with care and professionalism again this promotes its reuse

bull Reuse is easier and cheaper

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

httpjoinupeceuropaeu httplovokfnorg

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription

Nomenclature

TypeHeadingIntroductionRemarkConditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices (step 4)

Classes begin with a capital letter and are always singular, e.g. skos:Concept.

Properties begin with a lower-case letter, e.g. rdfs:label.

Object properties should be verbs, e.g. org:hasSite.

Data type properties should be nouns, e.g. dcterms:description.

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
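The purely lexical conventions above can be checked mechanically. The sketch below is a hypothetical helper, not part of any vocabulary tooling. It checks capitalisation and camel case only, since whether a term is singular, a verb or a noun needs human review.

```python
import re

# Local names: letters and digits only, so underscores and hyphens are rejected.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    """Classes start with a capital letter (UpperCamelCase)."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    """Properties start with a lower-case letter (lowerCamelCase)."""
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ("skos:Concept", "ex:political_category"):
    print(term, is_valid_class_name(term))
for term in ("foaf:isPrimaryTopicOf", "ex:HasSite"):
    print(term, is_valid_property_name(term))
```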

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices (step 4)

When defining new properties, consider defining their domain and range.

A domain states on which classes a given property can be used.

A range states that the values of a property are instances of one or more classes.

[Diagram: hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
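The sketch below shows how a domain and range declaration might be used by a consumer. Under RDFS semantics, domain and range are usually read as inference rules (they let a system conclude the types of subject and object) rather than as validation constraints. The direction of hasCeiling (from a political category to an amount) and all ex: identifiers are assumptions made for illustration, as the slide diagram does not survive in this transcript.

```python
# Hypothetical schema: property -> declared domain and range classes.
schema = {
    "ex:hasCeiling": {"domain": "ex:PoliticalCategory", "range": "ex:Amount"},
}

# Hypothetical instance data using the property.
data = [
    ("ex:category1", "ex:hasCeiling", "ex:amount2014"),
    ("ex:category1", "ex:code", "1a"),  # no domain/range declared for ex:code
]


def infer_types(triples, schema):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    inferred = set()
    for s, p, o in triples:
        rule = schema.get(p)
        if rule is None:
            continue  # nothing can be inferred for undeclared properties
        inferred.add((s, "rdf:type", rule["domain"]))
        inferred.add((o, "rdf:type", rule["range"]))
    return inferred


for triple in sorted(infer_types(data, schema)):
    print(triple)
```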

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent (step 5)

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Follow good practices for publishing sets of persistent Uniform Resource Identifiers (URIs), in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services (step 6)

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

DATASUPPORTOPEN

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Phases shown on the slide: Analyse, Model, Publish.

Slide 82

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine?

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, …” (openrefine.org)

See also: Open Refine website, http://openrefine.org

Slide 84

DATASUPPORTOPEN

What is the Open Refine RDF extension?

The Open Refine RDF extension lets you import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You then determine the intended structure of the RDF dataset by drawing a template graph.

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

First:

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86
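To make the five steps concrete, the sketch below mimics them in plain Python for a tiny table. It is not how OpenRefine or its RDF extension works internally. The base URI, column names and property URIs are hypothetical, values are emitted as plain literals, and keys containing spaces or other characters that are not URI-safe are not handled.

```python
import csv
import io

BASE = "http://example.com/"  # hypothetical base URI for the data


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text, key_column, column_to_property):
    """Turn each non-empty cell into one N-Triples line.

    The row's key column is used to mint the subject URI; every mapped
    column becomes a predicate with the cell value as a plain literal.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}observation/{row[key_column].strip()}>"
        for column, prop in column_to_property.items():
            value = row[column].strip()  # minimal clean-up step
            if value:
                lines.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return lines


table = "id,country,value\n1,LU,42\n2,BE,\n"
mapping = {"country": BASE + "country", "value": BASE + "value"}
ntriples = rows_to_ntriples(table, "id", mapping)
print("\n".join(ntriples))
```

In a real project the mapping would point at properties from a reused vocabulary (for statistical data, for instance, the RDF Data Cube Vocabulary) rather than at ad hoc URIs.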

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Dataset: Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data – table harmonisation (step 3)

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Slide 90

DATASUPPORTOPEN

Clean up the data – prepare RDF (step 3)

• Create URI representations for the object values involved:

o via formula

o via reconciliation

Slide 91

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

Slide 96

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5 ★ Open Data. http://5stardata.info

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query

Slide 98

DATASUPPORTOPEN

Further reading

• EC ISA: Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

• EC ISA: Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2

• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com

• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/

• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf

• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org

Slides 99-101

DATASUPPORTOPEN

Be part of our team

• SlideShare (Open Data Support): http://www.slideshare.net/OpenDataSupport

• Website: http://www.opendatasupport.eu

• Open Data Support group: http://goo.gl/y9ZZI

• Twitter: @OpenDataSupport

• Contact: contact@opendatasupport.eu

Slide 102

DATASUPPORTOPEN

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Slide 103

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


DATASUPPORTOPEN

Member State initiatives – some examples

DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT – Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.

NL – Building and address register
The Dutch Address and Buildings base register published as linked data.

UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary Line.

UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Slide 29

DATASUPPORTOPEN

Member States Initiatives – UK National Archives

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things & the RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

Slide 30
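The identifier and representation patterns can be sketched as two small helpers. This is an illustrative reading of the pattern on the slide, not an official client for legislation.gov.uk, and the function names are hypothetical. The example values reproduce the versioning URI shown on the next slide.

```python
REPRESENTATIONS = {"xml", "xht", "pdf", "rdf", "feed"}


def section_identifier(legislation_type: str, year: int, number: int, section: int) -> str:
    """Build the identifier URI for one section of an item of legislation."""
    return (
        "http://www.legislation.gov.uk/id/"
        f"{legislation_type}/{year}/{number}/section/{section}"
    )


def representation_uri(identifier: str, fmt: str) -> str:
    """Append the 'data.<format>' suffix naming one representation of the thing."""
    if fmt not in REPRESENTATIONS:
        raise ValueError(f"unknown representation: {fmt}")
    return f"{identifier}/data.{fmt}"


identifier = section_identifier("ukpga", 2010, 32, 124)
print(representation_uri(identifier, "rdf"))
```

Keeping the identifier of the thing separate from the URIs of its representations is what allows the same section to be served as XML, PDF or RDF.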

DATASUPPORTOPEN

Member States Initiatives ndash UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

Slides 32-33

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

1. Link databases

“I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee.”

• Data is spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

“I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes.”

• Need to add semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Slide 35

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products.

Description: attributes, features and relations of the resources.

Framework: model, languages and syntaxes for these descriptions.

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was later generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
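As a minimal sketch, the example triple can be held in a small Python structure. The choice of dct:title for "has a title" follows the syntax examples later in this module. The Triple class and the objects helper are hypothetical names, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    object: str     # URI of another resource, or a literal value


DCT_TITLE = "http://purl.org/dc/terms/title"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"

graph = {
    Triple(FILE_TYPE, DCT_TITLE, "File types Name Authority List"),
}


def objects(graph, subject, predicate):
    """Return all objects stated for the given subject and predicate."""
    return sorted(t.object for t in graph if t.subject == subject and t.predicate == predicate)


print(objects(graph, FILE_TYPE, DCT_TITLE))
```

A set of such triples is a graph: adding more statements about the same subject simply adds more arcs leaving the same node.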

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data (triples):

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

In each description, the rdf:about URI is the subject, the nested element name (e.g. dct:title) is the predicate, and the element content or rdf:resource URI is the object. Together the triples form a graph.

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

[Figure: RDF graph with nodes and arcs labelled subject, predicate and object]

Slide 47

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data (triples):

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Each statement lists a subject followed by predicate and object pairs. Together the triples form a graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

The resource attribute gives the subject, each property attribute gives a predicate, and the element content gives the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram “Healthcare Domain”, showing the following Core Vocabularies classes with their properties, connected by relationships]

Class Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]

Class Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string

Class Location: geographicIdentifier: URI; geographicName: string

Class Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string

Class Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: the Unified Modelling Language. Class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus making the meaning of the data model easier to understand.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions ...

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description).

• INSERT: add triples to the RDF graph.

• DELETE: delete triples from the RDF graph.

• ASK: are there any X, Y, etc. satisfying the following conditions ...?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

DATASUPPORTOPEN

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent ;
    dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

?dataset
"File types Name Authority List"
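What the SELECT query does can be imitated with a few lines of Python: each triple in the sample data is compared with the triple pattern, and terms starting with "?" are bound to whatever value they meet. This is a teaching sketch, not a SPARQL engine. The select function is hypothetical, and the abbreviated URIs of the slide are written out in full using the earlier syntax examples.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

# The sample data from the slide, as (subject, predicate, object) tuples.
graph = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def select(graph, pattern):
    """Match one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must meet the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant term that does not match
        else:
            results.append(binding)
    return results


print(select(graph, (FILE_TYPE, "dct:title", "?dataset")))
```

Only the second triple has both the requested subject and the dct:title predicate, so the result table has a single row.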

DATASUPPORTOPEN

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent ;
    dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

?dataset                            ?publisher
"File types Name Authority List"    "Publications Office"
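The query above joins three triple patterns through the shared variable ?publisherURI. The Python sketch below imitates that join by extending a list of partial bindings one pattern at a time. It is an illustration only (no optimisation, no SPARQL syntax), with hypothetical function names and the slide's abbreviated URIs written out in full.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

graph = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return extended


def select(graph, patterns):
    """Return all bindings that satisfy every triple pattern (a join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


rows = select(graph, [
    (FILE_TYPE, "dct:publisher", "?publisherURI"),
    (FILE_TYPE, "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
])
for row in rows:
    print(row["?dataset"], "|", row["?publisher"])
```

Because ?publisherURI is already bound when the third pattern is evaluated, only the title of the publisher resource can match, which is what makes the two titles end up in the same result row.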

DATASUPPORTOPEN

SPARQL Example – EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slides 60-61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

• Creating an RDF vocabulary for modelling your data:

o How to reuse existing vocabularies to model your data

o How to create new classes and properties in RDF

o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

DATASUPPORTOPEN

Start with a robust Domain Model (step 1)

Slide 68

[Domain model diagram with the following classes and attributes]

• Amount: Currency, Figure, Type, Year

• Nomenclature: Type, Heading, Introduction, Remark, Conditions

• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

• Political category: Code, Description

• Corporate body: Code, Type, Location

Relationships shown: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

DATASUPPORTOPEN

Reuse existing terms and vocabularies (step 2)

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

DATASUPPORTOPEN

Reuse existing terms and vocabularies (step 2): well-known vocabularies

Slide 70

DCAT-AP: vocabulary for describing datasets in Europe

Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: vocabulary for describing projects

ADMS: vocabulary for describing interoperability assets

Dublin Core: defines general metadata attributes

Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

DATASUPPORTOPEN

bull Reuse greatly aids interoperability of your data

Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses

bull Reuse adds credibility to your schema

It shows it has been published with care and professionalism again this promotes its reuse

bull Reuse is easier and cheaper

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

httpjoinupeceuropaeu httplovokfnorg

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription

Nomenclature

TypeHeadingIntroductionRemarkConditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular eg skosConcept

Properties begin with a lower case letter eg rdfslabel

Object properties should be verbs eg orghasSite

Data type properties should be nouns eg dctermsdescription

Use camel case if a term has more than one word eg foafisPrimaryTopicOf

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoAmountrdquo class

Slide 77

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoamount typerdquo property

Slide 78

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties, consider defining their domain and range

A range states that the values of a property are instances of one or more classes

A domain states that any resource that has the property is an instance of one or more classes

Slide 79

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

hasCeiling

Amount

CurrencyFigureTypeYear

Political category

CodeDescription

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

5

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas and http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ...” - openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is Open Refine RDF extension

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

Slide 86

• Install Open Refine from https://github.com/OpenRefine

• Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes and properties

5. Export the data in RDF

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data: table harmonisation

Slide 90

3

• Star and remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data: prepare RDF

Slide 91

3

• Create a URI representation for the object values involved

• via formula

• via reconciliation
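The "via formula" option amounts to deriving a URI from a cell value with a fixed, repeatable rule. The following is a minimal Python sketch of such a rule, not OpenRefine's own expression language; the base URI `http://example.org/id/` and the function name `mint_uri` are assumptions made for illustration.

```python
from urllib.parse import quote

# Hypothetical base URI; replace it with your own stable namespace.
BASE_URI = "http://example.org/id/"


def mint_uri(kind: str, cell_value: str) -> str:
    """Derive a URI from a spreadsheet cell value with a fixed rule."""
    # Normalise: trim, lower-case, and turn whitespace runs into hyphens.
    slug = "-".join(cell_value.strip().lower().split())
    # Percent-encode every character that is not unreserved in a URI.
    return f"{BASE_URI}{kind}/{quote(slug, safe='')}"


print(mint_uri("country", "Czech Republic"))
```

Because the rule is deterministic, the same cell value always yields the same URI, which is what makes the identifiers reusable across exports.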

DATASUPPORTOPEN

Map your data to appropriate RDF classes and properties (model your data)

Slide 92

4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes and properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF
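A skeleton is a fixed template that is applied to every row of the spreadsheet. The sketch below shows the idea in plain Python, emitting N-Triples for one row as an RDF Data Cube observation. Only `qb:Observation` and `qb:dataSet` come from the Data Cube vocabulary; the `http://example.org/scoreboard/` namespace, the column names, the property names and the sample row are all invented for illustration and are not taken from the Digital Agenda Scoreboard.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"      # RDF Data Cube namespace
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/scoreboard/"         # hypothetical base URI


def row_to_ntriples(row: dict) -> list:
    """Apply a fixed skeleton to one spreadsheet row."""
    indicator = row["indicator"]
    country = row["country"]
    year = row["year"]
    value = row["value"]
    obs = f"<{EX}observation/{indicator}/{country}/{year}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{EX}dataset> .",
        # Dimension and measure properties below are hypothetical names.
        f"{obs} <{EX}indicator> <{EX}indicator/{indicator}> .",
        f"{obs} <{EX}country> <{EX}country/{country}> .",
        f'{obs} <{EX}year> "{year}"^^<{XSD}gYear> .',
        f'{obs} <{EX}value> "{value}"^^<{XSD}decimal> .',
    ]


# Invented sample row, for illustration only.
sample_row = {"indicator": "ind1", "country": "BE", "year": "2014", "value": "1.0"}
for line in row_to_ntriples(sample_row):
    print(line)
```

In Open Refine the same mapping is drawn graphically rather than coded, but the result is equivalent: one set of triples per row, all following the same shape.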

DATASUPPORTOPEN

Map your data to appropriate RDF classes and properties (model your data)

Slide 94

4

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one

You can set the base URI for the data

[Figure: graphical interface to copy/paste an existing RDF skeleton]

[Figure: graphical interface to edit an RDF skeleton]

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5-star Open Data, http://5stardata.info

• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net

• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata

• Open Refine, https://github.com/OpenRefine

• RDF Extension, http://refine.deri.ie

• Resource Description Framework, W3C, http://www.w3.org/RDF/

• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support, http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support, http://goo.gl/y9ZZI

@OpenDataSupport

contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17)

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


DATASUPPORTOPEN

Member States Initiatives ndash UK National Archives

Slide 30

Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics

Semantic representation using RDF and Linked Data:
• URIs for things and the RDF data model

Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/type/year/number/section/number
• Representation: data.[xml | xht | pdf | rdf | feed]

Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt

See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068

DATASUPPORTOPEN

Member States Initiatives ndash UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
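The identifier and representation patterns on the previous slide can be read as a simple URI template. The following Python sketch assembles the example URI above from its parts; the function name `legislation_uri` and its parameters are hypothetical and are not part of any legislation.gov.uk API.

```python
def legislation_uri(doc_type, year, number, section=None, representation=None):
    """Build a granular identifier URI following the pattern on the slide."""
    uri = f"http://www.legislation.gov.uk/id/{doc_type}/{year}/{number}"
    if section is not None:
        # Granular identifier for one section of the act.
        uri += f"/section/{section}"
    if representation is not None:
        # Representation suffix, e.g. xml, xht, pdf, rdf or feed.
        uri += f"/data.{representation}"
    return uri


print(legislation_uri("ukpga", 2010, 32, section=124, representation="rdf"))
```

Keeping the identifier (the thing) separate from the representation (a document about the thing) is what lets the same URI serve RDF, XML or PDF on request.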

DATASUPPORTOPEN

Open amp linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content

• This data already powers large parts of the BBC website, including BBC News and Sport

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

DATASUPPORTOPEN Slide 33

Open amp linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

Slide 35

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for Linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont’d)

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

bull An introduction to the Resource Description Framework (RDF) for describing your data

bull An introduction to SPARQL on how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

bull The Resource Description Framework (RDF)

• How to write/read RDF

bull How you can describe your data with RDF

bull What SPARQL is

bull How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject: a resource, which is identified with a URI

• Predicate: a URI-identified, reused specification of the relationship

• Object: a resource or literal to which the subject is related

Example: name of a dataset

http://publications.europa.eu/resource/authority/file-type (subject) has a title (predicate) “File types Name Authority List” (object)
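To make the subject, predicate and object roles concrete, here is a minimal Python sketch that holds the example triple in a small data structure and prints it in N-Triples form. The class name `Triple`, the helper `to_ntriples`, and the choice of `dct:title` for "has a title" are assumptions for illustration; `dct:title` is the property used in the syntax examples elsewhere in this module.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str      # URI of the resource being described
    predicate: str    # URI of the relationship
    obj: str          # URI of another resource, or a literal value
    obj_is_literal: bool = False


DCT = "http://purl.org/dc/terms/"

name_of_dataset = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT + "title",
    obj="File types Name Authority List",
    obj_is_literal=True,
)


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    obj = f'"{triple.obj}"' if triple.obj_is_literal else f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


print(to_ntriples(name_of_dataset))
```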

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Annotations: the xmlns attributes are the definition of prefixes; the nested elements are the description of data (triples of subject, predicate and object), which together form the graph

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Annotations: the @prefix lines are the definition of prefixes; the remaining statements are the description of data (triples of subject, predicate and object), which together form the graph

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head> ... </head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[Figure: UML class diagram of the Core Person Vocabulary and related Core Vocabularies, with callouts marking the classes, their properties and the relationships between them]

Class Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]

Class Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string

Class Location: geographicIdentifier: URI; geographicName: string

Class Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string

Class Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

• INSERT: add triples to the RDF graph (a SPARQL 1.1 Update operation)

• DELETE: delete triples from the RDF graph (a SPARQL 1.1 Update operation)

• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
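The difference between the query forms listed above can be illustrated without a SPARQL engine. The Python sketch below treats a graph as a set of (subject, predicate, object) tuples and mimics SELECT, ASK and CONSTRUCT over a single triple pattern. It is a simplification, not an implementation of SPARQL; the function names and the `ex:` prefix are hypothetical.

```python
# A tiny in-memory graph; prefixed names are used as opaque strings.
GRAPH = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
}


def matches(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in sorted(graph)
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def select_objects(graph, s, p):
    """Like SELECT ?o WHERE { s p ?o }: return a table of values."""
    return [o for _, _, o in matches(graph, s=s, p=p)]


def ask(graph, s=None, p=None, o=None):
    """Like ASK { ... }: return only whether any match exists."""
    return bool(matches(graph, s, p, o))


def construct(graph, p, new_p):
    """Like CONSTRUCT { ?s new_p ?o } WHERE { ?s p ?o }: build a new graph."""
    return {(s, new_p, o) for s, _, o in matches(graph, p=p)}


print(select_objects(GRAPH, "ex:file-type", "dct:title"))
print(ask(GRAPH, p="rdf:type", o="dcat:Dataset"))
print(construct(GRAPH, "dct:title", "rdfs:label"))
```

SELECT returns values, ASK returns a yes/no answer, and CONSTRUCT returns triples, which is why only CONSTRUCT can feed its output straight back into another RDF store.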

DATASUPPORTOPEN

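Structure of a SPARQL query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Annotations:

• Definition of prefixes: the PREFIX lines

• Type of query: SELECT

• Variables, i.e. what to search for: ?title

• RDF triple patterns, i.e. the conditions that have to be met: the WHERE block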

DATASUPPORTOPEN

SELECT: return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT: return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
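The query above works because the variable ?publisherURI appears in two triple patterns and must take the same value in both. The Python sketch below reproduces that join over the sample data. It is a toy matcher written for illustration, not how a SPARQL engine is implemented; the function names are hypothetical, and the full URIs stand in for the abbreviated ones shown on the slide.

```python
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

# Sample data from the slide; vocabulary terms are kept as prefixed strings.
GRAPH = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def bind(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                  # a variable
            if extended.get(term, value) != value:
                return None                       # conflicts with earlier binding
            extended[term] = value
        elif term != value:                       # a constant must match exactly
            return None
    return extended


def select(graph, variables, patterns):
    """Evaluate the triple patterns in order, joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = bind(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [tuple(s[v] for v in variables) for s in solutions]


rows = select(GRAPH, ["?dataset", "?publisher"], [
    (DATASET, "dct:publisher", "?publisherURI"),
    (DATASET, "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
])
print(rows)
```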

DATASUPPORTOPEN

SPARQL Example - EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example - EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example - EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

bull RDF is a general way to express data intended for publishing on the Web

bull RDF data is expressed in triples subject predicate object

bull Different syntaxes exist for expressing data in RDF

bull SPARQL is a standardised language to query graph data expressed as RDF

bull SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

bull Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

bull An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties

Where new terms are required create them following commonly agreed best practice

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Slide 67

1

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

2

3

4

5

6

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

1

[Figure: EU Budget domain model. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, has Political category, has Corporate body, has Nomenclature, has EU Programme]

DATASUPPORTOPEN

General purpose vocabularies DCMI RDFS

To name things rdfslabel foafname skosprefLabel

To describe people FOAF vCard Core Person Vocabulary

To describe projects DOAP ADMSSW

To describe interoperability assets ADMS

To describe registered organisations Registered Organisation Vocabulary

To describe addresses vCard Core Location Vocabulary

To describe public services Core Public Service Vocabulary

To describe datasets DCAT DCAT Application Profile VoID

Reuse existing terms and vocabularies

Slide 69

2

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP: vocabulary for describing datasets in Europe

Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: vocabulary for describing projects

ADMS: vocabulary for describing interoperability assets

Dublin Core: defines general metadata attributes

Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies

2

DATASUPPORTOPEN

bull Reuse greatly aids interoperability of your data

For example, dcterms:created, whose value should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else’s

bull Reuse adds credibility to your schema

It shows it has been published with care and professionalism again this promotes its reuse

bull Reuse is easier and cheaper

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
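The interoperability point made with dcterms:created on this slide can be seen in a few lines of Python. The lexical form of an xsd:date is an ISO 8601 date, which standard libraries parse directly, whereas a free-text date needs a format that the consumer has to know or guess. This is an illustrative sketch only; the variable names are arbitrary.

```python
from datetime import date, datetime

iso_value = "2013-02-21"          # lexical form of an xsd:date
free_text = "21 February 2013"    # ad hoc format from a custom property

# The typed date parses with no extra knowledge about the publisher.
parsed_iso = date.fromisoformat(iso_value)

# The free-text date needs an explicit format string, and %B (month name)
# depends on the process locale, so it is fragile across systems.
parsed_text = datetime.strptime(free_text, "%d %B %Y").date()

print(parsed_iso, parsed_text, parsed_iso == parsed_text)
```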

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

http://joinup.ec.europa.eu and http://lov.okfn.org

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject

[Figure: Amount (Currency, Figure, Type, Year), Nomenclature (Type, Heading, Introduction, Remark, Conditions), and the has nomenclature relationship between them]
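The benefit of sub-properties is that a consumer who only knows the generic term can still use the data. The Python sketch below shows that inference in its simplest form: every triple that uses a sub-property also yields a triple with the super-property. The `bud:` prefix, the local names `introduction` and `hasNomenclature`, and the sample triple are hypothetical spellings chosen for illustration, and the sketch assumes each property has at most one super-property.

```python
# Sub-property declarations, as described on the two slides above.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples, sub_property_of):
    """Add, for each triple, the triples implied by rdfs:subPropertyOf."""
    result = set(triples)
    for s, p, o in triples:
        seen = {p}
        # Walk up the chain of super-properties, guarding against cycles.
        while p in sub_property_of and sub_property_of[p] not in seen:
            p = sub_property_of[p]
            seen.add(p)
            result.add((s, p, o))
    return result


# Invented sample triple, for illustration only.
data = {("bud:nomenclature1", "bud:introduction", "Introductory text")}
expanded = with_super_properties(data, SUB_PROPERTY_OF)
for triple in sorted(expanded):
    print(triple)
```

A system that understands only dct:description can now read the introduction text, even though it has never seen the more specific term.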

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower-case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
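The capitalisation and camel-case conventions above are mechanical enough to check automatically. The following Python sketch tests a prefixed term against them; it does not attempt to check the singular, verb or noun rules, and the function names are hypothetical.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_class_name(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def is_property_name(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:amount_type"]:
    print(term, is_class_name(term), is_property_name(term))
```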

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoAmountrdquo class

Slide 77

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoamount typerdquo property

Slide 78

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties, consider defining their domain and range

A range states that the values of a property are instances of one or more classes

A domain states that any resource that has the property is an instance of one or more classes
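In RDFS, domain and range are used for inference: from one statement that uses a property, a reasoner can derive the types of its subject and object. The Python sketch below shows that derivation. The `ex:` terms are hypothetical, and the direction chosen for hasCeiling (from a political category to an amount) is an assumption made for illustration; the diagram on this slide does not survive in the transcript to confirm it.

```python
# Assumed declarations: ex:hasCeiling rdfs:domain ex:PoliticalCategory ;
#                                     rdfs:range  ex:Amount .
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triples, domain, rng):
    """Derive rdf:type statements implied by rdfs:domain and rdfs:range."""
    inferred = set()
    for s, p, o in triples:
        if p in domain:
            inferred.add((s, "rdf:type", domain[p]))   # subject is in the domain
        if p in rng:
            inferred.add((o, "rdf:type", rng[p]))      # object is in the range
    return inferred


# Invented sample statement, for illustration only.
data = {("ex:category1", "ex:hasCeiling", "ex:amount1")}
for triple in sorted(infer_types(data, DOMAIN, RANGE)):
    print(triple)
```

Because these are inference rules rather than validation rules, an overly narrow domain or range can produce wrong type statements about data that reuses the property, which is a reason to define them with care.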

Slide 79

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

hasCeiling

Amount

CurrencyFigureTypeYear

Political category

CodeDescription

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

5

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

DATASUPPORTOPEN

Conclusions

Slide 82

1. Start with a robust Domain Model, developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

[Diagram: the steps are grouped into three phases: Analyse, Model, Publish]

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data – prepare RDF

Slide 91

• Create URI representations for the involved object values:

o via formula

o via reconciliation
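To make the "via formula" option concrete, here is a minimal sketch of what such a formula computes: a cell value is normalised into a URL-safe slug and appended to a base URI. In Open Refine itself this would be written as a cell transformation expression; the Python below only illustrates the idea, and the base URI and function names are hypothetical.

```python
import re
import unicodedata

# Hypothetical base URI for the minted resources; replace with your own namespace.
BASE_URI = "http://example.com/id/country/"


def slugify(value: str) -> str:
    """Lower-case, strip accents and replace runs of other characters with '-'."""
    ascii_value = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_value.lower()).strip("-")


def mint_uri(cell_value: str, base_uri: str = BASE_URI) -> str:
    """Build a URI for a cell value, as a 'via formula' step would."""
    return base_uri + slugify(cell_value)
```

Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI already assigned there, which is preferable whenever an authoritative identifier exists.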

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF
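A skeleton is essentially a template that is applied to every row of the table. The sketch below shows that idea outside Open Refine, assuming a CSV with the columns country, year, indicator and value. Only the qb: namespace of the RDF Data Cube Vocabulary is real; the base URI, the dimension and measure URIs and the sample figures are made up for illustration.

```python
import csv
import io

# All URIs below except the qb: namespace are hypothetical placeholders.
QB = "http://purl.org/linked-data/cube#"
BASE = "http://example.com/scoreboard/"

# Invented sample rows, not real Digital Agenda Scoreboard figures.
SAMPLE_CSV = """country,year,indicator,value
BE,2013,broadband_coverage,98.1
LU,2013,broadband_coverage,99.5
"""


def row_to_turtle(row: dict) -> str:
    """Apply a fixed template (the 'skeleton') to one spreadsheet row."""
    subject = f"<{BASE}obs/{row['indicator']}/{row['country']}/{row['year']}>"
    return (
        f"{subject} a <{QB}Observation> ;\n"
        f"    <{BASE}dimension/country> <{BASE}country/{row['country']}> ;\n"
        f"    <{BASE}dimension/year> \"{row['year']}\" ;\n"
        f"    <{BASE}measure/{row['indicator']}> {row['value']} ."
    )


def csv_to_turtle(text: str) -> str:
    """Turn every CSV row into one qb:Observation description."""
    reader = csv.DictReader(io.StringIO(text))
    return "\n\n".join(row_to_turtle(row) for row in reader)
```

Each row becomes one observation whose URI is built from the row's key columns, which mirrors how the graphical skeleton editor maps columns to subjects, predicates and objects.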

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

Slide 94

[Screenshot: graphical interface to copy/paste an existing RDF skeleton]

[Screenshot: graphical interface to edit an RDF skeleton]

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

[Diagram: tools positioned along a flexibility axis and a volume axis: OpenRefine, UnifiedViews, Cellar]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97


DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1

EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme: http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org

Slide 101


DATASUPPORTOPEN

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


DATASUPPORTOPEN

Member States Initiatives – UK National Archives

Slide 31

Versioning of legislation in RDF

http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.

• This data already powers large parts of the BBC website, including BBC News and Sport.

• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:

http://www.bbc.co.uk/things/search?q=juncker

http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/

DATASUPPORTOPEN Slide 33

Open & linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

Slide 35

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes."

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.

• URIs, RDF and SPARQL form the foundational layer for linked data.

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module, you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI

• Predicate – a URI-identified, reused specification of the relationship

• Object – a resource or literal to which the subject is related

Example: name of a dataset

<http://publications.europa.eu/resource/authority/file-type> (subject) has a title (predicate) "File types Name Authority List" (object)
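As a rough illustration of the triple structure, the example above can be written as a three-field record and serialised as one N-Triples line. This is a minimal sketch: the class and function names are hypothetical, "has a title" is assumed to mean the Dublin Core property dct:title (http://purl.org/dc/terms/title), and literals are recognised simply by a leading double quote.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # URI, or a literal wrapped in double quotes


# The example from the slide; the predicate is assumed to be dct:title.
dataset_title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj='"File types Name Authority List"',
)


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as an N-Triples line; quoted objects are literals."""
    obj = triple.obj if triple.obj.startswith('"') else f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."
```

A set of such triples that share subjects and objects forms the RDF graph discussed on the following slides.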

DATASUPPORTOPEN

RDF Syntax: RDF/XML

Slide 46

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data – triples:

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

[Slide callouts mark the subject, predicate and object of a triple, and the graph that the triples form]

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with the subject, predicate and object of a triple labelled]

DATASUPPORTOPEN

RDF Syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data – triples:

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

[Slide callouts mark the subject, predicate and object of a triple, and the graph that the triples form]

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
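Prefix definitions are only shorthand: a prefixed name stands for the namespace URI followed by the local name. The sketch below shows that expansion, assuming the DCAT namespace http://www.w3.org/ns/dcat# and the Dublin Core terms namespace http://purl.org/dc/terms/; the function name is hypothetical.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(term: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'dct:title' to a full URI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term  # already a full URI, or an unknown prefix
```

For instance, expand("dct:title") gives http://purl.org/dc/terms/title, which is the URI a parser actually stores for the predicate.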

DATASUPPORTOPEN

RDF Syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

[Slide callouts mark the subject, predicate and object of a triple]

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[UML class diagram; recoverable content listed below]

Classes and their properties:

• Identifier: dateOfIssue (dateTime [0..1]), identifier (string [1..1]), identifierType (string [0..1]), issuingAuthority (string [0..1]), issuingAuthorityUri (URI [0..1])

• Person: alternativeName (string), birthName (string), dateOfBirth (dateTime), dateOfDeath (dateTime), familyName (string), fullName (string), gender (code), givenName (string), patronymicName (string)

• Location: geographicIdentifier (URI), geographicName (string)

• Address: addressArea (string), addressID (string), adminUnitL1 (string), adminUnitL2 (string), fullAddress (string), locatorDesignator (string), locatorName (string), poBox (string), postCode (string), postName (string), thoroughfare (string)

• Geometry: lat (string), long (string), wkt (string), xmlGeometry (XML)

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)

• INSERT: Add triples to the RDF graph.

• DELETE: Delete triples from the RDF graph.

• ASK: Are there any X, Y, etc. satisfying the following conditions ...?

Slide 55

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

DATASUPPORTOPEN

SELECT – return the name of a dataset with particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://.../publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://.../publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
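To show what a SELECT query does with its triple patterns, here is a minimal sketch of pattern matching over the sample data. It is not how a real SPARQL engine is implemented; it only mimics the join of triple patterns. The shortened ex: identifiers are hypothetical stand-ins for the abbreviated URIs on the slides, and the function names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

TripleT = Tuple[str, str, str]

# Sample data from the slides, with shortened (hypothetical) identifiers.
DATA: List[TripleT] = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match(pattern: TripleT, triple: TripleT,
          binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Try to extend a binding so that the pattern equals the triple."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return result


def select(variables: List[str], patterns: List[TripleT],
           data: List[TripleT] = DATA) -> List[Tuple[str, ...]]:
    """Evaluate a list of triple patterns as a join, like SELECT ... WHERE."""
    bindings: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]
```

Calling select(["?dataset", "?publisher"], ...) with the three patterns of the second query returns a single row, because the shared variable ?publisherURI forces the publisher triple and the title triple to refer to the same resource.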

DATASUPPORTOPEN

SPARQL Example – EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

DATASUPPORTOPEN

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

o How to reuse existing vocabularies to model your data

o How to create new classes and properties in RDF

o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module, you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Step 1: Start with a robust Domain Model

Slide 68

[Diagram of the domain model. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attributes shown: Acronym, Legal base period, Legal base type, Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

Slide 69

• General purpose vocabularies: DCMI, RDFS

• To name things: rdfs:label, foaf:name, skos:prefLabel

• To describe people: FOAF, vCard, Core Person Vocabulary

• To describe projects: DOAP, ADMS.SW

• To describe interoperability assets: ADMS

• To describe registered organisations: Registered Organisation Vocabulary

• To describe addresses: vCard, Core Location Vocabulary

• To describe public services: Core Public Service Vocabulary

• To describe datasets: DCAT, DCAT Application Profile, VoID

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

• DCAT-AP: Vocabulary for describing datasets in Europe

• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

• DOAP: Vocabulary for describing projects

• ADMS: Vocabulary for describing interoperability assets

• Dublin Core: Defines general metadata attributes

• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

• Organization Ontology: for describing the structure of organizations

• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

DATASUPPORTOPEN

Advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data.

Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Step 2: Reuse existing terms and vocabularies
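The "further processing" mentioned in the dcterms:created example can be as small as the sketch below, which converts a free-text date into a typed xsd:date literal. It assumes English month names and the day-month-year layout of the example; the function name is hypothetical.

```python
from datetime import datetime


def to_xsd_date(text: str, input_format: str = "%d %B %Y") -> str:
    """Convert a date such as '21 February 2013' to a typed xsd:date literal."""
    parsed = datetime.strptime(text, input_format).date()
    return f'"{parsed.isoformat()}"^^xsd:date'
```

Every consumer of non-standard data has to write and maintain a conversion like this, which is exactly the cost that reusing dcterms:created with xsd:date avoids.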

DATASUPPORTOPEN

You can find reusable RDF vocabularies on:

Slide 72

• Joinup: http://joinup.ec.europa.eu

• Linked Open Vocabularies: http://lov.okfn.org

Step 2: Reuse existing terms and vocabularies

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Diagram: the Nomenclature class with attributes Type, Heading, Introduction, Remark and Conditions]
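The benefit of a sub-property declaration is that a consumer who only knows the super-property can still use the data. The sketch below illustrates the inference that rdfs:subPropertyOf licenses: every statement made with the sub-property also holds for the super-property. The prefixed name bud:introduction and the sample triple are hypothetical; only the relationship to dct:description comes from the slide.

```python
# Hypothetical prefixed names; 'bud:introduction' stands for the EU Budget
# vocabulary's introduction property mentioned on the slide.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
}


def infer_super_properties(triples, sub_property_of=SUB_PROPERTY_OF):
    """Add a triple with the super-property for each sub-property statement."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat so that chains of sub-properties are followed
        changed = False
        for s, p, o in list(inferred):
            parent = sub_property_of.get(p)
            if parent and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred
```

A generic tool that displays dct:description values would therefore show the introduction text without knowing anything about the budget-specific term.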

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.

[Diagram: Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to Nomenclature (Type, Heading, Introduction, Remark, Conditions)]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular eg skosConcept

Properties begin with a lower case letter eg rdfslabel

Object properties should be verbs eg orghasSite

Data type properties should be nouns eg dctermsdescription

Use camel case if a term has more than one word eg foafisPrimaryTopicOf

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoAmountrdquo class

Slide 77

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoamount typerdquo property

Slide 78

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties consider to define their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used

Slide 79

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

hasCeiling

Amount

CurrencyFigureTypeYear

Political category

CodeDescription

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

bull Choose a stable namespace for your RDF vocabulary

Example httpdataeuropaeubud

bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management

Examples

o httpwwww3orgnsadms

o httppurlorgdcelements11

Slide 80

5

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg

See alsoOpen Refine website

httpopenrefineorg

DATASUPPORTOPEN

What is Open Refine RDF extension

Open Refine RDF extension allows you to easily import data in different formats such as

CSV

Excel(xls and xlsx)

JSON

XML and

RDFXML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

Step 1: Describe your data in a spreadsheet
Step 2: Create a project and upload it in Open Refine
Step 3: Clean up the data
Step 4: Map your data to appropriate RDF classes & properties
Step 5: Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data – table harmonisation

Slide 90

Step 3

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data – prepare RDF

Slide 91

Step 3

• Create a URI representation for the object values involved
  • via formula
  • via reconciliation
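The "via formula" route can be sketched outside Open Refine as a small function that turns a cell value into a URI. This is only an illustration in Python, not the expression language Open Refine itself uses; the base URI `http://example.org/resource/` and the helper names `slugify` and `mint_uri` are assumptions made for the example.

```python
import re
import unicodedata

BASE_URI = "http://example.org/resource/"  # hypothetical base URI


def slugify(value: str) -> str:
    """Turn a cell value into a URI-safe, lower-case token."""
    # Drop accents so that e.g. "España" becomes "Espana".
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_text).strip("-").lower()


def mint_uri(kind: str, value: str) -> str:
    """Build a URI of the form <base><kind>/<slug>."""
    return f"{BASE_URI}{kind}/{slugify(value)}"


print(mint_uri("country", "Czech Republic"))
```

Reconciliation takes the other route: instead of computing a URI, it looks the value up in an existing authority list and reuses the URI found there.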

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF
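The idea of a skeleton can be sketched as a template that is filled in once per spreadsheet row. The sketch below is a rough Python analogue, not the RDF extension's own format. `qb:Observation` and `qb:dataSet` come from the RDF Data Cube Vocabulary; the column names, the `ex:` properties, the sample figures and all `example.org` URIs are hypothetical.

```python
import csv
import io

# Hypothetical sample rows; the real Digital Agenda Scoreboard columns may differ.
SAMPLE_CSV = """country,year,value
BE,2013,0.82
LU,2013,0.94
"""

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"
)

# The skeleton: one qb:Observation per spreadsheet row.
SKELETON = (
    "<http://example.org/obs/{country}-{year}> a qb:Observation ;\n"
    "    qb:dataSet <http://example.org/dataset/scoreboard> ;\n"
    "    ex:refArea <http://example.org/country/{country}> ;\n"
    '    ex:refYear "{year}"^^xsd:gYear ;\n'
    '    ex:value "{value}"^^xsd:decimal .\n'
)


def rows_to_turtle(csv_text: str) -> str:
    """Apply the skeleton to every row and return a Turtle document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return PREFIXES + "\n" + "\n".join(SKELETON.format(**row) for row in rows)


print(rows_to_turtle(SAMPLE_CSV))
```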

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

Slide 94

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5-star Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query

Slide 98

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme
http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport

Contact us: contact@opendatasupport.eu

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: @OpenDataSupport

http://www.opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 32: Llinked open data training for EU institutions

DATASUPPORTOPEN

Open & linked data at BBC

• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores on the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.

Slide 32

Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce

DATASUPPORTOPEN Slide 33

Open & linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga
https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

Linked Data in the oil and gas industry

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS, with their production volumes.”

• Need for adding semantics in order to allow machine reasoning

For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions cont’d

• Linked data offers a number of advantages, such as:
  o Enables easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Enables creativity and innovation through context and knowledge-creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, on how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <!-- The organisation, identified by its URI -->
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <!-- The address of the site the organisation links to -->
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
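As a rough illustration, a triple can be held in a program as a three-part record. The sketch below uses a plain Python named tuple rather than any RDF library; the class name `Triple` and its field names are choices made for this example, and `has a title` is assumed to correspond to the Dublin Core property `dct:title` used in the syntax examples of this module.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property that relates subject and object
    obj: str        # URI of another resource, or a literal value


# Assumption: "has a title" is expressed with the Dublin Core title property.
DCT_TITLE = "http://purl.org/dc/terms/title"

triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj="File types Name Authority List",
)

# A set of such triples forms a graph.
graph = {triple}

subject, predicate, obj = triple
print(f'<{subject}> <{predicate}> "{obj}" .')
```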

DATASUPPORTOPEN

RDF Syntax: RDF/XML

Slide 46

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data – triples:

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Annotations on the slide: Graph, Subject, Predicate, Object

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF Syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data – triples:

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Annotations on the slide: Graph, Subject, Predicate, Object

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
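The prefix definitions are only a shorthand: every prefixed name stands for a full URI. A minimal Python sketch of that expansion is given below; the function name `expand` is an assumption for illustration, and the two namespaces are the standard DCAT and Dublin Core Terms namespaces.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(term: str, prefixes: dict) -> str:
    """Expand a prefixed name such as 'dct:title' to a full URI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term  # already a full URI, or an unknown prefix


print(expand("dct:title", PREFIXES))
print(expand("dcat:Dataset", PREFIXES))
```

Whichever syntax is used (RDF/XML, Turtle or RDFa), the expanded URIs, and therefore the triples, are the same.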

DATASUPPORTOPEN

RDF Syntax: RDFa

Embedding RDF data in HTML

Slide 49

<html>
  <head> </head>
  <body>
    <!-- resource = subject, typeof = class, property = predicate, element text = object -->
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Annotations on the slide: Subject, Predicate, Object

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram "Healthcare Domain". The classes, properties and relationships recoverable from the diagram are listed below.]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: add triples to the RDF graph.
• DELETE: delete triples from the RDF graph.
• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
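The difference between the SELECT, ASK and CONSTRUCT forms can be illustrated on a tiny in-memory set of triples. The sketch below is plain Python, not a SPARQL engine: `match` is a hypothetical helper in which `None` plays the role of a variable, and the `ex:` identifiers are placeholders for the full URIs.

```python
# Placeholder data; "ex:" stands in for full URIs.
DATA = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "dct:title", "Publications Office"),
}


def match(data, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a variable."""
    return [
        t for t in data
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# SELECT: a table of the values bound to the variable position.
select_titles = sorted(t[2] for t in match(DATA, p="dct:title"))

# ASK: is there any match at all?
ask_has_dataset = bool(match(DATA, p="rdf:type", o="dcat:Dataset"))

# CONSTRUCT: substitute the matches into a template to build new triples.
constructed = {(t[0], "rdfs:label", t[2]) for t in match(DATA, p="dct:title")}

print(select_titles)
print(ask_has_dataset)
print(sorted(constructed))
```

INSERT and DELETE would correspond to adding triples to, or removing them from, the `DATA` set itself.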

DATASUPPORTOPEN

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query, and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

DATASUPPORTOPEN

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
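The query with several triple patterns works because a variable that appears in more than one pattern (here the publisher URI) must take the same value everywhere. A minimal sketch of that matching process follows, in plain Python rather than a real SPARQL engine; the function `solve` and the `ex:` placeholders for the elided URIs are assumptions for illustration.

```python
# Placeholder data; "ex:" stands in for the URIs elided on the slide.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

QUERY = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def solve(patterns, data, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # A variable keeps the value it was first bound to.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from solve(rest, data, candidate)


results = [(b["?dataset"], b["?publisher"]) for b in solve(QUERY, DATA)]
print(results)
```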

DATASUPPORTOPEN

SPARQL Example – EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

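Start with a robust Domain Model

Slide 68

Step 1

[Diagram: domain model of the EU budget. The classes, attributes and relationships recoverable from the diagram are listed below.]

Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
Political category: Code, Description
Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme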

DATASUPPORTOPEN

Reuse existing terms and vocabularies

Step 2

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

DATASUPPORTOPEN

Advantages of reuse

Step 2: Reuse existing terms and vocabularies

• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.
  Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
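The "further processing" mentioned for non-standard dates can be made concrete with a short sketch. It converts the free-text form "21 February 2013" into the typed literal expected for dcterms:created. The function name `to_xsd_date` is an assumption for this example, and parsing the month name relies on an English (or default "C") locale.

```python
from datetime import datetime


def to_xsd_date(text: str) -> str:
    """Convert a date such as '21 February 2013' to an xsd:date literal."""
    # %B parses the full month name; this assumes an English locale.
    parsed = datetime.strptime(text, "%d %B %Y")
    return f'"{parsed.date().isoformat()}"^^xsd:date'


print(to_xsd_date("21 February 2013"))
```

Every consumer of the data would need a step like this for each home-grown format, which is the cost that reusing dcterms:created with xsd:date avoids.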

DATASUPPORTOPEN

You can find reusable RDF vocabularies on:

Slide 72

http://joinup.ec.europa.eu
http://lov.okfn.org

Step 2: Reuse existing terms and vocabularies

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
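How a system that only knows the generic term can still interpret more specific data may be sketched as follows. This is a simplified Python illustration of sub-property inference, not a full RDFS reasoner; the prefix `ex:` and the sample resource are hypothetical, and the hierarchy is assumed to contain no cycles.

```python
# Hypothetical declaration: ex:introduction is a sub-property of dct:description.
SUB_PROPERTY_OF = {"ex:introduction": "dct:description"}

DATA = [
    ("ex:chapter-01", "ex:introduction", "Text introducing the chapter"),
]


def infer_super_properties(triples, sub_property_of):
    """Add, for every triple, the same statement with each super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        # Walk up the hierarchy (assumed to be free of cycles).
        while p in sub_property_of:
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


for triple in sorted(infer_super_properties(DATA, SUB_PROPERTY_OF)):
    print(triple)
```

A consumer that only understands dct:description can now find the text, even though it has never seen ex:introduction.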

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.

[Diagram: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
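The capitalisation and camel-case conventions can be checked mechanically; whether a name is singular, a verb or a noun still needs human review. The sketch below is a minimal Python check under that limitation. The function names and the two failing examples (`ex:political_category`, `ex:HasSite`) are hypothetical.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "ex:political_category"]:
    print(term, "class ok:", is_valid_class(term))
for term in ["foaf:isPrimaryTopicOf", "ex:HasSite"]:
    print(term, "property ok:", is_valid_property(term))
```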

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

Slide 77

Step 4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

Slide 78

Step 4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.
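In RDFS, domain and range declarations let software infer the types of the resources a property connects. The sketch below illustrates this in plain Python. It assumes, for the sake of the example, that hasCeiling links a political category to an amount; the `ex:` names and the sample resources are hypothetical.

```python
# Hypothetical schema: hasCeiling goes from a PoliticalCategory to an Amount.
SCHEMA = {
    "ex:hasCeiling": {"domain": "ex:PoliticalCategory", "range": "ex:Amount"},
}

DATA = [
    ("ex:category-1", "ex:hasCeiling", "ex:amount-42"),
]


def infer_types(triples, schema):
    """Derive rdf:type statements from domain and range declarations."""
    types = set()
    for s, p, o in triples:
        declaration = schema.get(p)
        if declaration is None:
            continue
        if "domain" in declaration:
            types.add((s, "rdf:type", declaration["domain"]))
        if "range" in declaration:
            types.add((o, "rdf:type", declaration["range"]))
    return types


for triple in sorted(infer_types(DATA, SCHEMA)):
    print(triple)
```

Note the consequence for vocabulary design: a domain or range that is too narrow causes wrong types to be inferred whenever the property is reused on other kinds of resources.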

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1

Slide 80

Step 5

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements

EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1 Introduction and Application Scenarios

httpwwweuclid-projecteumodulescourse1

EUCLID - Course 2 Querying Linked Data

httpwwweuclid-projecteumodulescourse2

Learning SPARQL Bob DuCharme

httpwwwlearningsparqlcom

Linked Data Cookbook W3C Government Linked Data Working Group

httpwwww3org2011gldwikiLinked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer

httplinkeddatabookcomeditions10

Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck

httpwwwsemantic-webatLOD-TheEssentialspdf

Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas

httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454

Semantic Web for the working ontologist Dean Allemang Jim Hendler

httpworkingontologistorg

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data SupporthttpwwwslidesharenetOpenDataSupport

httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI

OpenDataSupport contactopendatasupporteu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)

copy 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative

2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice

Page 33: Llinked open data training for EU institutions

DATASUPPORTOPEN Slide 33

Open & linked data at BBC

DATASUPPORTOPEN

Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

DATASUPPORTOPEN

1. Link databases

"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"

• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data

• Need to uniquely identify resources

2. Add meaning

"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"

• Need for adding semantics in order to allow machine reasoning

For example:

• Kristin is a field

• Åsgard is an oil platform

• Statoil Petroleum AS is a company

Linked Data in the oil and gas industry

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

DATASUPPORTOPEN

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web

• URIs, RDF and SPARQL form the foundational layer for linked data

• Linked data offers a number of advantages, such as:

o Data integration with small impact on legacy systems

o Semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of Linked Open Government Data (LOGD) in e-Government applications

o Enables creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject - a resource, which is identified with a URI

• Predicate - a URI-identified, reused specification of the relationship

• Object - a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
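As an illustration only, the triple above can be held in a small Python structure. This is a minimal sketch: the `Triple` type and the `is_literal` helper are hypothetical names introduced here, and the choice of `dct:title` for "has a title" is an assumption based on the syntax examples later in this module.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Assumption: "has a title" stands for the Dublin Core title property.
DCT_TITLE = "http://purl.org/dc/terms/title"

triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj='"File types Name Authority List"',  # quoted, so it reads as a literal
)


def is_literal(term: str) -> bool:
    """Convention of this sketch only: literals carry double quotes, URIs do not."""
    return term.startswith('"')


print(triple.subject)
print(is_literal(triple.obj))      # the object is a literal
print(is_literal(triple.subject))  # the subject is a URI
```

Subjects and predicates are always URIs here, while the object may be either a URI or a literal, which is the distinction the three bullets draw.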

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data (the triples that form the graph):

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

In each description, the rdf:about resource is the subject, each child element is a predicate, and the element content or rdf:resource value is the object.

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data (the triples that form the graph):

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Each block starts with the subject, followed by predicate and object pairs.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

The resource attribute gives the subject, each property attribute a predicate, and the element text the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram "Healthcare Domain": classes with their properties, connected by relationships]

Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships shown between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

• INSERT: add triples to the RDF graph

• DELETE: delete triples from the RDF graph

• ASK: are there any X, Y, etc. satisfying the following conditions?
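To make the difference between these forms concrete, here is a rough Python analogy over an in-memory set of triples. It is only a sketch: the function names, the `None` wildcard convention and the `ex:` identifiers are assumptions made for this illustration, and a real SPARQL engine does considerably more.

```python
# Triples are (subject, predicate, object) tuples; prefixed names are kept
# as plain strings. The ex: identifiers are hypothetical.
DATA = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples that agree with every non-None position."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def select_subjects(graph, p, o):
    """SELECT: a table of matches (here a single column of subjects)."""
    return sorted(t[0] for t in match(graph, p=p, o=o))


def ask(graph, s=None, p=None, o=None):
    """ASK: is there at least one match?"""
    return bool(match(graph, s, p, o))


def construct(graph, p, new_p):
    """CONSTRUCT: substitute matches into a template to build a new graph."""
    return {(t[0], new_p, t[2]) for t in match(graph, p=p)}


def describe(graph, resource):
    """DESCRIBE: every statement that mentions the resource."""
    return set(match(graph, s=resource)) | set(match(graph, o=resource))


def insert(graph, triple):
    """INSERT: add a triple, returning the updated graph."""
    return graph | {triple}


def delete(graph, triple):
    """DELETE: remove a triple, returning the updated graph."""
    return graph - {triple}


print(select_subjects(DATA, "rdf:type", "dcat:Dataset"))
print(ask(DATA, s="ex:publ", p="dct:title"))
```

SELECT and ASK only read the graph, CONSTRUCT and DESCRIBE return triples, and INSERT and DELETE change the stored data.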

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

DATASUPPORTOPEN

SELECT - return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
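The way the three triple patterns of this query are joined through the shared variable ?publisherURI can be sketched in Python. This is a simplified illustration, not how a SPARQL engine is implemented: `select` and `match_pattern` are hypothetical helpers, prefixed names are kept as plain strings, and the full URIs from the earlier syntax examples are assumed for the abbreviated sample data.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

GRAPH = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend the binding so that the pattern equals the triple, else None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or reject if it is already bound differently.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def select(variables, patterns, graph):
    """Evaluate the patterns one by one, carrying variable bindings along."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [extended
                    for binding in bindings
                    for triple in graph
                    if (extended := match_pattern(pattern, triple, binding)) is not None]
    return [tuple(binding[v] for v in variables) for binding in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    GRAPH,
)
print(rows)
```

The third pattern only matches triples whose subject equals the publisher URI bound by the first pattern, which is why the publisher's title, and not the dataset's, ends up in the ?publisher column.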

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

• Creating an RDF vocabulary for modelling your data

o How to reuse existing vocabularies to model your data

o How to create new classes and properties in RDF

o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Start with a robust Domain Model (step 1)

Slide 68

[Domain model diagram: classes with their attributes, connected by relationships]

Amount: Currency, Figure, Type, Year

Nomenclature: Type, Heading, Introduction, Remark, Conditions

EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

Political category: Code, Description

Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

DATASUPPORTOPEN

Reuse existing terms and vocabularies (step 2)

Slide 69

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

DATASUPPORTOPEN

Reuse existing terms and vocabularies (step 2): well-known vocabularies

Slide 70

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

DATASUPPORTOPEN

• Reuse greatly aids interoperability of your data

Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
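The date example can be illustrated with a few lines of Python. This is only a sketch of the extra processing involved: the format string is an assumption about how the free-text date is written, and month-name parsing depends on the locale in use (English names are assumed).

```python
from datetime import date, datetime

# The lexical form of xsd:date is ISO 8601, so it parses without any hints.
typed = date.fromisoformat("2013-02-21")

# The free-text form needs a format agreed out of band (an assumption here)
# and English month names.
FREE_TEXT_FORMAT = "%d %B %Y"
free_text = datetime.strptime("21 February 2013", FREE_TEXT_FORMAT).date()

print(typed == free_text)
```

Both values end up as the same date, but only the typed one can be read without knowing the publisher's private convention.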

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies (step 2)

DATASUPPORTOPEN

Reuse existing terms and vocabularies (step 2)

You can find reusable RDF vocabularies on:

Slide 72

http://joinup.ec.europa.eu

http://lov.okfn.org

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
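The second point can be sketched in Python: a consumer that only knows the generic property can still read the data once the sub-property relationship is applied. The sketch below follows the RDFS sub-property entailment rule; the `ex:` terms and the function name are hypothetical and chosen only for this illustration.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

GRAPH = {
    # Hypothetical specific term declared as a specialisation of dct:description.
    ("ex:introduction", SUB_PROPERTY_OF, "dct:description"),
    ("ex:heading1", "ex:introduction", "Opening remarks"),
}


def infer_superproperties(graph):
    """Add (s, super, o) for every (s, sub, o) until nothing new appears."""
    closed = set(graph)
    while True:
        sub_of = {(s, o) for s, p, o in closed if p == SUB_PROPERTY_OF}
        new = {(s, sup, o)
               for s, p, o in closed
               for sub, sup in sub_of
               if p == sub} - closed
        if not new:
            return closed
        closed |= new


for triple in sorted(infer_superproperties(GRAPH)):
    print(triple)
```

After inference the graph also states the value under dct:description, so software that has never seen ex:introduction can still display it as a description.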

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description

[Diagram: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject

[Diagram: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept

• Properties begin with a lower case letter, e.g. rdfs:label

• Object properties should be verbs, e.g. org:hasSite

• Datatype properties should be nouns, e.g. dcterms:description

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
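The purely lexical parts of these conventions can be checked automatically. The sketch below covers capitalisation and camel case only; whether a name is singular, a verb or a noun still needs human review. The function names and the `ex:` terms are hypothetical.

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties (letters and digits only).
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Strip the prefix from a prefixed name such as skos:Concept."""
    return term.split(":", 1)[-1]


def looks_like_class(term):
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term):
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "org:hasSite", "ex:legal_name"]:
    print(term, looks_like_class(term), looks_like_property(term))
```

A name with an underscore, such as the hypothetical ex:legal_name, fails both checks and would be renamed to ex:legalName under the camel case rule.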

Slide 76

4

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the Amount class with attributes Currency, Figure, Type, Year]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.
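In RDFS, domain and range declarations are used for inference: from a statement that uses the property, a reasoner concludes the class of its subject and of its object. A minimal Python sketch follows, assuming that hasCeiling links a political category to an amount as in the domain model; the `ex:` identifiers and the function name are hypothetical, and only one domain and one range per property are handled.

```python
GRAPH = {
    # Assumed direction: a political category has a ceiling, which is an amount.
    ("ex:hasCeiling", "rdfs:domain", "ex:PoliticalCategory"),
    ("ex:hasCeiling", "rdfs:range", "ex:Amount"),
    ("ex:category1", "ex:hasCeiling", "ex:amount1"),
}


def infer_types(graph):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    domains = {s: o for s, p, o in graph if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in graph if p == "rdfs:range"}
    inferred = set()
    for s, p, o in graph:
        if p in domains:
            inferred.add((s, "rdf:type", domains[p]))  # subject is in the domain
        if p in ranges:
            inferred.add((o, "rdf:type", ranges[p]))   # object is in the range
    return inferred


for triple in sorted(infer_types(GRAPH)):
    print(triple)
```

Because the declarations drive inference rather than validation, a domain that is too narrow silently assigns unintended classes to resources, which is a reason to choose domains and ranges with care.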

Slide 79

4

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the Political category class (Code, Description) linked by hasCeiling to the Amount class (Currency, Figure, Type, Year)]

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ..." - openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is Open Refine RDF extension

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

Preparation:

• Install Open Refine from https://github.com/OpenRefine

• Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data - table harmonisation (step 3)

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data - prepare RDF (step 3)

Slide 91

• Create a URI representation for the object values involved

o via formula

o via reconciliation
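The "via formula" option amounts to deriving a URI deterministically from a cell value. In Open Refine this would be written as a cell transformation; the Python sketch below only illustrates the idea. The base URI and the function name are hypothetical, and lower-casing plus hyphenation is just one possible convention.

```python
import re
from urllib.parse import quote

# Hypothetical base URI, used only for this illustration.
BASE_URI = "http://example.com/id/country/"


def mint_uri(cell_value, base=BASE_URI):
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = re.sub(r"\s+", "-", cell_value.strip().lower())
    return base + quote(slug, safe="-")


print(mint_uri(" United Kingdom "))
print(mint_uri("Côte d'Ivoire"))
```

The same input always yields the same URI, so repeated exports of the spreadsheet keep identifying the same resources. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there.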

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data) (step 4)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one

You can set the base URI for the data

Slide 94

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle (step 5)

Slide 95

Export of the data in Turtle
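What such an export produces can be approximated by hand. The sketch below turns spreadsheet rows into N-Triples lines, which are also valid Turtle. It does not use the RDF extension: the sample rows, the `http://example.com/` URIs and the property names are made up for illustration, and only the `qb:Observation` class comes from the RDF Data Cube Vocabulary. Literal values are not escaped, so this only works for simple cell contents.

```python
import csv
import io

# Hypothetical spreadsheet content (not the real Digital Agenda Scoreboard data).
SHEET = "country,year,value\nBE,2013,80.1\nLU,2013,93.4\n"

BASE = "http://example.com/obs/"   # hypothetical base URI for observations
EX = "http://example.com/def/"     # hypothetical namespace for properties
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def row_to_ntriples(row):
    """Apply a fixed 'skeleton' to one row: one subject, four statements."""
    subject = f"<{BASE}{row['country']}-{row['year']}>"
    return [
        f"{subject} <{RDF_TYPE}> <{QB_OBSERVATION}> .",
        f"{subject} <{EX}country> \"{row['country']}\" .",
        f"{subject} <{EX}year> \"{row['year']}\" .",
        f"{subject} <{EX}value> \"{row['value']}\"^^<{XSD_DECIMAL}> .",
    ]


lines = [line
         for row in csv.DictReader(io.StringIO(SHEET))
         for line in row_to_ntriples(row)]
print("\n".join(lines))
```

The function plays the role of the RDF skeleton: it fixes once how a row maps to triples, and the mapping is then applied to every row.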

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

• 5 ★ Open Data. http://5stardata.info/

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF/

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2

Learning SPARQL. Bob DuCharme. http://www.learningsparql.com

Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on / Contact us / Join us on / Follow us:

Open Data Support: http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

@OpenDataSupport

contact@opendatasupport.eu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


Data Value Chains using Linked Data at Volkswagen

Slide 34

Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf

Linked Data in the oil and gas industry

1. Link databases

“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”

• Data is spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Resources need to be uniquely identified

2. Add meaning

“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”

• Semantics need to be added in order to allow machine reasoning

For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company

Slide 35

Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  o Data integration with little impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality

Slide 36

Conclusions (cont’d)

• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of Linked Open Government Data (LOGD) in e-Government applications
  o Creativity and innovation through context and knowledge creation

Slide 37

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

Learning objectives

By the end of this training module, you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write and read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40


Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was later generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
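To make the subject, predicate and object roles concrete, the dataset-name example can be written down as a small data structure. This is only a minimal sketch in plain Python: the `Triple` type and the `to_ntriples` helper are illustrative names, and `dct:title` (http://purl.org/dc/terms/title) is assumed to be the property behind "has a title". Literal escaping is deliberately left out.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property or relationship
    obj: str        # URI of another resource, or a literal value


# Assumption: "has a title" corresponds to the Dublin Core property dct:title.
name_of_dataset = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
)


def to_ntriples(t: Triple, literal_object: bool = True) -> str:
    """Render the triple as one N-Triples-style line (no escaping of quotes)."""
    obj = f'"{t.obj}"' if literal_object else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


line = to_ntriples(name_of_dataset)
```

The object is rendered in quotes because it is a literal here; a resource object would be written between angle brackets like the subject.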

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Reading the example:

• Definition of prefixes: the xmlns attributes on rdf:RDF
• Description of data (triples): the elements inside rdf:RDF, which together form the graph
• Subject: the resource named in rdf:about. Predicate: each nested element, such as dct:title. Object: the element's text or its rdf:resource target
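One way to see how the nested RDF/XML elements map onto triples is to walk the XML tree and emit one triple per property element. The sketch below uses only Python's standard library and handles just the simple pattern used on this slide (typed node elements with rdf:about, and property elements carrying either text or rdf:resource); it is not a full RDF/XML parser. The DCAT namespace http://www.w3.org/ns/dcat# is assumed, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' tag into a plain URI."""
    return tag[1:].replace("}", "", 1)


def triples(rdf_xml):
    """Yield (subject, predicate, object) for simple typed node elements only."""
    root = ET.fromstring(rdf_xml)
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        # The element name of the node gives the class of the subject.
        yield (subject, RDF_NS + "type", uri(node.tag))
        for prop in node:
            target = prop.get(f"{{{RDF_NS}}}resource")
            value = target if target is not None else (prop.text or "").strip()
            yield (subject, uri(prop.tag), value)


graph = list(triples(RDF_XML))
```

Each node element contributes one rdf:type triple plus one triple per nested property, so the dataset description above yields three triples.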

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph of the example, with nodes and arcs labelled Subject, Predicate and Object]

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Reading the example:

• Definition of prefixes: the @prefix lines
• Description of data (triples): the two blocks that follow, which together form the graph
• Each block starts with the subject, followed by predicate and object pairs separated by semicolons

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Reading the example: the resource attribute gives the subject, each property attribute gives a predicate, and the text inside the span gives the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties, whose values are literals.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties, whose values are other resources.
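The difference between a property and a relationship comes down to what sits in the object position of a triple: a literal value or another resource. The following sketch illustrates that distinction in plain Python; the `Literal` and `Resource` classes and the `kind_of_property` helper are illustrative names, not part of any RDF library, and the document URI is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str  # plain data, e.g. a name or a date


@dataclass(frozen=True)
class Resource:
    uri: str  # something that can itself be described further


def kind_of_property(obj):
    """Classify a property by the kind of object it points to."""
    if isinstance(obj, Resource):
        return "object property"
    return "datatype property"


# Property: the legal name of an organisation is a literal.
legal_name = Literal("Publications Office")

# Relationship: "organisation publishes document" points to another resource.
# The document URI below is hypothetical.
published_document = Resource("http://example.org/document/1")

kinds = (kind_of_property(legal_name), kind_of_property(published_document))
```

Because a `Resource` carries a URI, it can in turn be the subject of further triples, which is what makes relationships linkable.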

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: the Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[Figure: UML class diagram “Healthcare Domain”, annotated to point out classes, properties and relationships]

Classes and their properties:

Core Vocabularies::Identifier
    dateOfIssue: dateTime [0..1]
    identifier: string [1..1]
    identifierType: string [0..1]
    issuingAuthority: string [0..1]
    issuingAuthorityUri: URI [0..1]

Core Vocabularies::Person
    alternativeName: string
    birthName: string
    dateOfBirth: dateTime
    dateOfDeath: dateTime
    familyName: string
    fullName: string
    gender: code
    givenName: string
    patronymicName: string

Core Vocabularies::Location
    geographicIdentifier: URI
    geographicName: string

Core Vocabularies::Address
    addressArea: string
    addressID: string
    adminUnitL1: string
    adminUnitL2: string
    fullAddress: string
    locatorDesignator: string
    locatorName: string
    poBox: string
    postCode: string
    postName: string
    thoroughfare: string

Core Vocabularies::Geometry
    lat: string
    long: string
    wkt: string
    xmlGeometry: XML

Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier


Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template, in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s), identified by name or description
• INSERT: add triples to the RDF graph (SPARQL 1.1 Update)
• DELETE: delete triples from the RDF graph (SPARQL 1.1 Update)
• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
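The query types differ mainly in what they do with the matches of a triple pattern. The sketch below mimics SELECT, ASK and CONSTRUCT over a tiny in-memory list of triples in plain Python. It is an illustration of the idea rather than a SPARQL engine: the `match` function, the `ex:` prefix and the `ex:Catalogued` class are all made-up names.

```python
# A tiny graph; prefixed names are kept as plain strings for readability.
GRAPH = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
]


def match(pattern, graph):
    """Return one {variable: value} binding per triple that fits the pattern."""
    results = []
    for triple in graph:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A variable matches anything, but must stay consistent.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            results.append(binding)
    return results


pattern = ("?d", "rdf:type", "dcat:Dataset")

# SELECT: a table of the values bound to the requested variable.
select_result = [b["?d"] for b in match(pattern, GRAPH)]

# ASK: is there at least one match?
ask_result = bool(match(pattern, GRAPH))

# CONSTRUCT: substitute each match into a template to build new triples.
construct_result = [(b["?d"], "rdf:type", "ex:Catalogued") for b in match(pattern, GRAPH)]
```

INSERT and DELETE would correspond to appending to or removing from `GRAPH`, and DESCRIBE to collecting every triple that mentions a given resource.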

Structure of a SPARQL query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Reading the query:

• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE block: RDF triple patterns, i.e. the conditions that have to be met

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data (URIs abbreviated with ... as on the slide):

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

SELECT – return the name and publisher of a dataset

Slide 58

Sample data (URIs abbreviated with ... as on the slide):

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset                             publisher
"File types Name Authority List"    "Publications Office"
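How the three triple patterns of the name-and-publisher query combine can be traced with a small join over the sample data. This is a rough sketch in plain Python, not how a real SPARQL engine is implemented: the `extend` and `select` helpers are illustrative names, prefixed names are kept as plain strings, and the full URIs from the Turtle example are assumed for the dataset and the publisher.

```python
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

SAMPLE_DATA = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def extend(binding, pattern, triple):
    """Fit one triple to one pattern; return the extended binding or None."""
    new = dict(binding)
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            # Variables already bound by an earlier pattern must agree.
            if new.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return new


def select(variables, patterns, graph):
    """Join all patterns in order, then project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    SAMPLE_DATA,
)
```

The shared variable `?publisherURI` is what links the dataset to the title of its publisher: the third pattern only matches triples whose subject equals the value bound by the first.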

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module, you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Start with a robust Domain Model (step 1)

Slide 68

[Figure: domain model diagram. Recoverable content:]

Classes and their attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Reuse existing terms and vocabularies (step 2)

General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Reuse existing terms and vocabularies (step 2): well-known vocabularies

Slide 70

DCAT-AP: vocabulary for describing datasets in Europe
Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: vocabulary for describing projects
ADMS: vocabulary for describing interoperability assets
Dublin Core: defines general metadata attributes
Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
Organization Ontology: for describing the structure of organisations
Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies (step 2): advantages of reuse

• Reuse greatly aids interoperability of your data
  The value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.

Slide 71
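The "further processing" that a non-standard date format forces on consumers can be pictured with a short conversion. The sketch below, in plain Python, turns the free-text form "21 February 2013" into the xsd:date lexical form; the function names are illustrative, and the parsing assumes English month names (the default C locale).

```python
from datetime import datetime


def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date and return its ISO 8601 (xsd:date) lexical form."""
    return datetime.strptime(text, fmt).date().isoformat()


def typed_literal(iso_date):
    """Write the date the way it would appear as a typed literal in Turtle."""
    return f'"{iso_date}"^^xsd:date'


normalised = to_xsd_date("21 February 2013")
literal = typed_literal(normalised)
```

Every consumer of the non-standard format would need a step like `to_xsd_date`, and would have to know the format string in advance, which is exactly the cost that reusing dcterms:created with xsd:date avoids.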

Reuse existing terms and vocabularies (step 2)

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org

Slide 72

Creation of sub-classes and sub-properties (step 3)

• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73
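Why a system that only knows the super-property can still interpret the data is easiest to see by applying the sub-property rule by hand. The sketch below does this in plain Python for a single hypothetical statement; the `ex:` prefix, the `ex:nomenclature-1` resource and the helper name are placeholders, and the sub-property hierarchy is assumed to contain no cycles.

```python
# Declared hierarchy: a specific term and the generic term it specialises.
# ex:introduction is a placeholder for a vocabulary-specific property.
SUB_PROPERTY_OF = {"ex:introduction": "dct:description"}

DATA = [("ex:nomenclature-1", "ex:introduction", "Intro text")]


def with_inferred(triples, sub_property_of):
    """Add, for every triple, the same statement using each super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        # Walk up the hierarchy (assumes there are no cycles).
        while p in sub_property_of:
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


expanded = with_inferred(DATA, SUB_PROPERTY_OF)
```

A consumer that understands only dct:description can now read the introduction as a description, without knowing anything about the more specific term.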

Creation of sub-classes and sub-properties (step 3)

Slide 74

The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description.

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Creation of sub-classes and sub-properties (step 3)

Slide 75

The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject.

[Figure: Amount class (Currency, Figure, Type, Year) and Nomenclature class (Type, Heading, Introduction, Remark, Conditions), connected by the “has Nomenclature” relationship]

Where new terms are required, create them following commonly agreed best practices (step 4)

• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Datatype properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
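The purely lexical parts of these conventions (initial capital for classes, initial lower case for properties, camel case for multi-word terms) can be checked mechanically. The sketch below is one possible check in plain Python; the function names are illustrative, and the rules about singular nouns and verbs are not covered because they need human judgement.

```python
import re


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[1]


def looks_like_class(term):
    """Classes: upper camel case, e.g. skos:Concept."""
    return re.fullmatch(r"[A-Z][A-Za-z0-9]*", local_name(term)) is not None


def looks_like_property(term):
    """Properties: lower camel case, e.g. rdfs:label or foaf:isPrimaryTopicOf."""
    return re.fullmatch(r"[a-z][A-Za-z0-9]*", local_name(term)) is not None


checks = {
    "skos:Concept": looks_like_class("skos:Concept"),
    "org:hasSite": looks_like_property("org:hasSite"),
    "ex:has_site": looks_like_property("ex:has_site"),  # underscore breaks camel case
}
```

A check like this could be run over a draft vocabulary before publication to catch terms such as `ex:has_site` that mix naming styles.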

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: Amount class with attributes Currency, Figure, Type, Year]

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: Amount class with attributes Currency, Figure, Type, Year]

Where new terms are required, create them following commonly agreed best practices (step 4)

When defining new properties, consider defining their domain and range:

• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: Political category class (Code, Description) and Amount class (Currency, Figure, Type, Year), connected by the hasCeiling property]
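Under RDFS semantics, domain and range are not validation constraints but statements from which types can be inferred: using a property implies that its subject belongs to the domain class and its object to the range class. The sketch below shows that inference in plain Python. It assumes, for illustration only, that hasCeiling points from a political category to an amount; the `ex:` names and the two instance identifiers are placeholders.

```python
# Assumed direction: a political category has a ceiling, which is an amount.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}

DATA = [("ex:category-1", "ex:hasCeiling", "ex:amount-1")]


def infer_types(triples):
    """Derive rdf:type statements from the domain and range of each property."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))  # subject is in the domain
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))   # object is in the range
    return types


inferred_types = infer_types(DATA)
```

This is why domain and range should be declared with some care: a domain that is too narrow makes a reasoner assign an unintended class to every resource that uses the property.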

Publish within a highly stable environment designed to be persistent (step 5)

• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud

• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/

Slide 80

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Publicise the RDF vocabulary by registering it with relevant services (step 6)

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

[Figure labels grouping the steps into phases: Analyse, Model, Publish]


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ...” – openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

Before you start:
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie

And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Describe your data in a spreadsheet (step 1)

Download the tabular data

Slide 88

Create a project and upload it in Open Refine (step 2)

Slide 89

Upload the spreadsheet

Select relevant tabs

Create the project

Clean up the data – table harmonisation (step 3)

Slide 90

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Clean up the data – prepare RDF (step 3)

Slide 91

• Create URI representations for the object values involved
  o via formula
  o via reconciliation
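Creating a URI "via formula" means deriving the identifier from the cell value by a fixed rule. The sketch below shows one such rule in plain Python rather than in Open Refine's own expression language; the base URI and the `mint_uri` name are hypothetical, and the slug rule (trim, lower-case, hyphenate, percent-encode) is just one reasonable choice.

```python
from urllib.parse import quote

# Hypothetical base URI; a real project would use its own stable namespace.
BASE = "http://example.org/id/country/"


def mint_uri(cell_value, base=BASE):
    """Derive a URI from a cell value: trim, lower-case, hyphenate, encode."""
    slug = "-".join(cell_value.strip().lower().split())
    return base + quote(slug)


uri_a = mint_uri("Czech Republic")
uri_b = mint_uri("  czech   republic ")
```

Because the rule normalises case and whitespace, differently typed variants of the same value end up with the same URI. Reconciliation, the other option on the slide, instead looks the value up against an existing set of identifiers.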

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Map your data to appropriate RDF classes & properties (model your data) (step 4)

You can map the data to the ontology using a simple graphical interface, either by creating a new RDF skeleton or by editing an existing one.

You can set the base URI for the data.

Slide 94

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Export your data to RDF/XML or Turtle (step 5)

Slide 95

[Screenshot: export of the data in Turtle]

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along the axes “flexibility” and “volume”: OpenRefine, UnifiedViews, Cellar]

Thank you for your attention

...and now YOUR questions?

Slide 97


Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1

EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme: http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on, contact us, join us on, follow us:

• Open Data Support: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• OpenDataSupport
• contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 35: Llinked open data training for EU institutions

DATASUPPORTOPEN

1 Link databases

ldquoIrsquod like to know last monthrsquos production volume total for all fields in which GDF Suez EampP Norge is a licenseerdquo

bull Data spread across multiple databases GDF Suez data + Norwegian Government data + Wikipedia data

bull Need to uniquely identify resources

2 Add meaning

ldquoIrsquod like to know all fields operated by Statoil Petroleum AS with their production volumesrdquo

bull Need for adding semantics in order to allow machine reasoning

For example

bull Kristin is a field

bull Aringsgard is an oil platform

bull Statoil Petroleum AS is a company

Linked Data in the oil and gas industry

Slide 35

Further reading httpwwwtopquadrantcom

resourcessolutionsdocsSe

mantic-data-oil-and-gaspdf

DATASUPPORTOPEN

Conclusions

bull Linked data is a set of design principles for sharing machine-readable data on the Web

bull URIs RDF and SPARQL form the foundational layer for Linked data

bull Linked data offers a number of advantages such as

o Data integration with small impact on legacy systems

o Enables for semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions (cont’d)

• Linked data offers a number of advantages, such as:

o Easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of Linked Open Government Data (LOGD) in e-Government applications

o Creativity and innovation through context and knowledge creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Slide 39

Find more on training.opendatasupport.eu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example: RDF description of an organisation

Slide 43

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <!-- The organisation, its label and a link to its site -->
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <!-- The address found at that site -->
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject: a resource, which is identified with a URI

• Predicate: a URI-identified, reused specification of the relationship

• Object: a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type

Predicate: has a title

Object: “File types Name Authority List”
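The subject/predicate/object structure can be sketched in a few lines of Python. This is an illustration only: the `Triple` type and the `to_ntriples` helper are invented for this example, and the predicate "has a title" is assumed to stand for dct:title (http://purl.org/dc/terms/title), as the later slides suggest.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement. Subject and predicate are URIs; the object is a URI or a literal."""
    subject: str
    predicate: str
    obj: str


# The example above: a dataset and its title (assumed predicate: dct:title).
title_triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
)


def to_ntriples(t: Triple, object_is_literal: bool) -> str:
    """Write the triple as one N-Triples line (no escaping of special characters)."""
    obj = f'"{t.obj}"' if object_is_literal else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


print(to_ntriples(title_triple, object_is_literal=True))
```

The caller has to say whether the object is a literal or a resource, because that distinction is part of the RDF data model and cannot be guessed reliably from the string alone.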

DATASUPPORTOPEN

RDF Syntax: RDF/XML

Slide 46

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data (triples):

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Labels in the figure: Subject (rdf:about), Predicate (property element), Object (literal or rdf:resource), Graph (the whole description).
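To see how the RDF/XML above breaks down into triples, the following sketch reads a cleaned-up copy of the example with Python's standard XML parser. It is not a general RDF/XML parser: it assumes every top-level element is a typed node with rdf:about, and every child element is a property holding either text or rdf:resource. The function names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>"""


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    return tag[1:].replace("}", "", 1)


def triples_from_rdfxml(text: str):
    """Yield (subject, predicate, object) for the simple node/property pattern only."""
    root = ET.fromstring(text)
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        # The element name of a typed node gives its rdf:type.
        yield (subject, RDF + "type", expand(node.tag))
        for prop in node:
            obj = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            yield (subject, expand(prop.tag), obj)


triples = list(triples_from_rdfxml(RDF_XML))
for triple in triples:
    print(triple)
```

The five tuples it prints correspond to the arcs of the RDF graph: two type statements, two titles and one publisher link.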

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with nodes and arcs labelled Subject, Predicate and Object]

DATASUPPORTOPEN

RDF Syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data (triples):

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Labels in the figure: Subject, Predicate, Object, Graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
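The way Turtle shortens URIs with prefixes and groups statements about one subject can be sketched as follows. This is a simplified illustration, not a complete Turtle writer: `term` and `to_turtle` are hypothetical helpers, and the prefix shortening only works for namespaces whose local names are plain words.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def term(value: str, is_literal: bool = False) -> str:
    """Render a literal in quotes, a known URI as prefix:local, any other URI in <>."""
    if is_literal:
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    for prefix, namespace in PREFIXES.items():
        if value.startswith(namespace):
            return f"{prefix}:{value[len(namespace):]}"
    return f"<{value}>"


def to_turtle(subject: str, pairs) -> str:
    """pairs is a list of (predicate, object, object_is_literal) for one subject."""
    body = []
    for predicate, obj, is_literal in pairs:
        # 'a' is Turtle shorthand for rdf:type, valid in predicate position only.
        pred = "a" if predicate == RDF_TYPE else term(predicate)
        body.append(f"    {pred} {term(obj, is_literal)}")
    return f"<{subject}>\n" + " ;\n".join(body) + " ."


turtle = to_turtle(
    "http://publications.europa.eu/resource/authority/file-type",
    [
        (RDF_TYPE, "http://www.w3.org/ns/dcat#Dataset", False),
        ("http://purl.org/dc/terms/title", "File types Name Authority List", True),
        ("http://purl.org/dc/terms/publisher",
         "http://open-data.europa.eu/en/data/publisher/publ", False),
    ],
)
print(turtle)
```

The semicolon repeats the subject for the next predicate/object pair, and the full stop closes the group, which is why Turtle is usually shorter and easier to read than RDF/XML.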

DATASUPPORTOPEN

RDF Syntax: RDFa

Slide 49

Embedding RDF data in HTML:

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Labels in the figure: Subject (resource), Predicate (property), Object (element content).

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[Figure: UML class diagram “Healthcare Domain” showing the Core Vocabularies classes, their properties and the relationships between them]

Class Identifier
    dateOfIssue: dateTime [0..1]
    identifier: string [1..1]
    identifierType: string [0..1]
    issuingAuthority: string [0..1]
    issuingAuthorityUri: URI [0..1]

Class Person
    alternativeName: string
    birthName: string
    dateOfBirth: dateTime
    dateOfDeath: dateTime
    familyName: string
    fullName: string
    gender: code
    givenName: string
    patronymicName: string

Class Location
    geographicIdentifier: URI
    geographicName: string

Class Address
    addressArea: string
    addressID: string
    adminUnitL1: string
    adminUnitL2: string
    fullAddress: string
    locatorDesignator: string
    locatorName: string
    poBox: string
    postCode: string
    postName: string
    thoroughfare: string

Class Geometry
    lat: string
    long: string
    wkt: string
    xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

Slide 55

• SELECT: return a table of all X, Y, etc. satisfying the following conditions

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

• INSERT: add triples to the RDF graph (part of SPARQL 1.1 Update)

• DELETE: delete triples from the RDF graph (part of SPARQL 1.1 Update)

• ASK: are there any X, Y, etc. satisfying the following conditions?

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
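What each query type does can be imitated on a small in-memory set of triples. The sketch below is plain Python, not SPARQL: the `match` helper and the shortened names such as "ex:file-type" are invented for the illustration, and DESCRIBE is shown in one common reading (all statements with the resource as subject).

```python
def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None plays the role of a variable."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


graph = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "dct:title", "Publications Office"),
}

# SELECT: a table of the values found in the variable positions.
titles = sorted(t[2] for t in match(graph, p="dct:title"))

# ASK: is there at least one match?
has_dataset = bool(match(graph, p="rdf:type", o="dcat:Dataset"))

# CONSTRUCT: substitute the matches into a template to build new triples.
labels = {(s, "rdfs:label", o) for s, _, o in match(graph, p="dct:title")}

# DESCRIBE: the statements that provide information about one resource.
about_publ = set(match(graph, s="ex:publ"))

# INSERT and DELETE: change the graph itself.
graph.add(("ex:publ", "rdf:type", "dct:Agent"))
graph.discard(("ex:file-type", "dct:publisher", "ex:publ"))

print(titles, has_dataset, sorted(labels), sorted(about_publ), len(graph))
```

SELECT, ASK, CONSTRUCT and DESCRIBE only read the graph, while INSERT and DELETE modify it, which is why the latter two belong to the separate SPARQL 1.1 Update specification.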

DATASUPPORTOPEN

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables (i.e. what to search for):

SELECT ?title

RDF triple patterns (i.e. the conditions that have to be met):

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
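A query with this structure is normally sent to a SPARQL endpoint over HTTP, which is the "Protocol" part of the SPARQL name. The sketch below only builds the request and does not send it. The endpoint URL is a placeholder chosen for this example, and the query is a cleaned-up version of the one on the slide.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint, replace with a real one

QUERY = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format for SELECT queries.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against a live endpoint. That step is left out here so that the example runs without network access.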

DATASUPPORTOPEN

SELECT: return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT: return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
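The query above works because the variable ?publisherURI appears in two triple patterns, so the patterns are joined on it. The sketch below imitates that evaluation in plain Python. It is a teaching aid under simplifying assumptions, not how a real SPARQL engine is built: `select` and `match_pattern` are invented names, and the resources are abbreviated to "ex:file-type" and "ex:publ".

```python
def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` fits `triple`; return None on failure."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable keeps the value it already has, or takes the new one.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def select(variables, patterns, graph):
    """Evaluate the patterns one after another, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [extended
                    for binding in bindings
                    for triple in graph
                    if (extended := match_pattern(pattern, triple, binding)) is not None]
    return [tuple(b[v] for v in variables) for b in bindings]


graph = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    graph,
)
print(rows)
```

The first pattern binds ?publisherURI to the publisher resource, and the third pattern can then only match triples whose subject is that same resource, which yields the single result row.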

DATASUPPORTOPEN

SPARQL Example: EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example: EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example: EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

o How to reuse existing vocabularies to model your data

o How to create new classes and properties in RDF

o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

Slide 67

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model diagram with the following classes, attributes and relationships]

Amount: Currency, Figure, Type, Year

Nomenclature: Type, Heading, Introduction, Remark, Conditions

EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

Political category: Code, Description

Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

Slide 69

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data

A value for dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
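The date example can be made concrete with a few lines of Python. The sketch assumes the two values from the slide, and the format string used for the free-text date is a guess that a consumer of the data would have to make.

```python
from datetime import date, datetime

iso_value = "2013-02-21"        # lexical form of an xsd:date, as used with dcterms:created
free_text = "21 February 2013"  # the ex:date style from the example above

# The ISO form parses directly, with no knowledge of the publisher's conventions.
created = date.fromisoformat(iso_value)

# The free-text form needs a format guess, and %B (month name) depends on the locale.
parsed = datetime.strptime(free_text, "%d %B %Y").date()

print(created, parsed, created == parsed)
```

Both values describe the same day, but only the first can be processed without extra, publisher-specific logic. This is the "further processing" the slide warns about.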

DATASUPPORTOPEN

Step 2: Reuse existing terms and vocabularies

Slide 72

You can find reusable RDF vocabularies on:

http://joinup.ec.europa.eu/

http://lov.okfn.org/

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

Slide 73

• RDF schemas and vocabularies often include terms that are very generic

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description

[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark and Conditions]

DATASUPPORTOPEN

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject

[Figure: the Amount class (Currency, Figure, Type, Year) linked by “has Nomenclature” to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
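How a system that only knows the generic Dublin Core terms can still use data expressed with more specific sub-properties is sketched below. The property names "bud:introduction" and "bud:hasNomenclature" and the sample resources are assumptions made for this illustration. The slides state only that the two properties are sub-properties of dct:description and dct:subject. The hierarchy is kept to one parent per property, whereas RDFS allows several.

```python
# rdfs:subPropertyOf statements, written as child -> parent (names are hypothetical).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def super_properties(prop, hierarchy):
    """Yield the property itself and every ancestor reachable via rdfs:subPropertyOf."""
    seen = set()
    while prop is not None and prop not in seen:
        seen.add(prop)
        yield prop
        prop = hierarchy.get(prop)


def values(graph, subject, wanted, hierarchy):
    """Return objects of `wanted` or of any of its sub-properties for one subject."""
    return [o for s, p, o in graph
            if s == subject and wanted in super_properties(p, hierarchy)]


graph = [
    ("ex:line-1", "bud:introduction", "Appropriations for staff expenditure"),
    ("ex:amount-1", "bud:hasNomenclature", "ex:line-1"),
]

descriptions = values(graph, "ex:line-1", "dct:description", SUB_PROPERTY_OF)
subjects = values(graph, "ex:amount-1", "dct:subject", SUB_PROPERTY_OF)
print(descriptions, subjects)
```

A client asking for dct:description finds the introduction text even though it has never heard of the more specific property.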

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 76

• Classes begin with a capital letter and are always singular, e.g. skos:Concept

• Properties begin with a lower case letter, e.g. rdfs:label

• Object properties should be verbs, e.g. org:hasSite

• Data type properties should be nouns, e.g. dcterms:description

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
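The conventions that concern spelling (capitalisation and camel case) can be checked mechanically. The sketch below uses invented helper names. Whether a class name is singular, or whether a property name is a verb or a noun, cannot be decided by a pattern and still needs human review.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def looks_like_class(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    print(term, looks_like_class(term), looks_like_property(term))
```

A check of this kind could be run over a draft vocabulary before publication to catch names such as "ex:budget_line" that follow neither convention.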

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 77

If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Figure: the Amount class with attributes Currency, Figure, Type and Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 78

If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Figure: the Amount class with attributes Currency, Figure, Type and Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

DATASUPPORTOPEN

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 79

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

[Figure: the hasCeiling property linking the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
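In RDFS, domain and range are read by reasoners as grounds for inference rather than as validation rules: using a property tells the system what types its subject and object have. The sketch below shows that reading. It assumes that hasCeiling points from a political category to an amount, which is how the diagram appears to be drawn, and all "bud:" and "ex:" names are hypothetical.

```python
# rdfs:domain and rdfs:range declarations (direction of hasCeiling is an assumption).
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(graph):
    """Derive rdf:type statements from the domain and range of the properties used."""
    inferred = set()
    for s, p, o in graph:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))
    return inferred


graph = [("ex:heading-1", "bud:hasCeiling", "ex:amount-2014")]
inferred = infer_types(graph)
for triple in sorted(inferred):
    print(triple)
```

A careless domain or range therefore does not reject data. It silently assigns types to resources, which is a good reason to define them with care.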

DATASUPPORTOPEN

Step 5: Publish within a highly stable environment designed to be persistent

Slide 80

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Slide 81

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

DATASUPPORTOPEN

Conclusions

Slide 82

Analyse

• Start with a robust Domain Model developed following a structured process and methodology

• Research existing terms and their usage, and maximise reuse of those terms

Model

• Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

• Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

Publish

• Publish within a highly stable environment designed to be persistent

• Publicise the RDF vocabulary by registering it with relevant services

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another” - openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is the Open Refine RDF extension?

Slide 85

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (xls and xlsx)

• JSON

• XML and

• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD 2 Webinar on Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

Slide 86

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Dataset: Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Step 1: Describe your data in a spreadsheet

Slide 88

Download the tabular data

DATASUPPORTOPEN

Step 2: Create a project and upload it in Open Refine

Slide 89

• Upload the spreadsheet

• Select relevant tabs

• Create the project

DATASUPPORTOPEN

Step 3: Clean up the data (table harmonisation)

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

DATASUPPORTOPEN

Step 3: Clean up the data (prepare RDF)

Slide 91

• Create a URI representation for the object values involved

o via formula

o via reconciliation
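Creating a URI "via formula" means deriving it from a cell value with a fixed rule. In Open Refine this would be written as an expression on a column. The Python sketch below only illustrates the idea, with a made-up base URI and a simple rule (trim, lower-case, hyphenate, percent-encode).

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/country/"  # hypothetical base URI for the minted identifiers


def mint_uri(base: str, label: str) -> str:
    """Derive a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = re.sub(r"\s+", "-", label.strip().lower())
    return base + quote(slug, safe="-")


for cell in ["Belgium", " United Kingdom ", "Czech  Republic"]:
    print(repr(cell), "->", mint_uri(BASE, cell))
```

The rule must give the same URI for the same thing every time. Reconciliation, the other option on the slide, instead looks the value up in an existing reference dataset and reuses the URI found there.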

DATASUPPORTOPEN

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 94

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

[Figure: graphical interface to copy/paste an existing RDF skeleton]

[Figure: graphical interface to edit an RDF skeleton]

DATASUPPORTOPEN

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Figure: export of the data in Turtle]
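What the skeleton and the export do together, turning each spreadsheet row into an RDF Data Cube observation, can be sketched without Open Refine. Everything specific in this example is invented for illustration: the sample rows, the base URI and the dimension properties ex:country and ex:year. Only qb:Observation, qb:dataSet and sdmx-measure:obsValue come from the Data Cube and SDMX vocabularies.

```python
import csv
import io

# Invented sample rows standing in for the cleaned-up spreadsheet.
CSV_TEXT = "country,year,value\nBE,2013,0.78\nLU,2013,0.91\n"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/scoreboard/def#> .
"""


def row_to_turtle(row: dict) -> str:
    """Fill the skeleton for one row: one observation with two dimensions and a measure."""
    country, year, value = row["country"], row["year"], row["value"]
    return "\n".join([
        f"<{BASE}obs/{country}-{year}> a qb:Observation ;",
        f"    qb:dataSet <{BASE}dataset> ;",
        f"    ex:country <{BASE}country/{country}> ;",
        f'    ex:year "{year}"^^xsd:gYear ;',
        f'    sdmx-measure:obsValue "{value}"^^xsd:decimal .',
    ])


rows = csv.DictReader(io.StringIO(CSV_TEXT))
turtle = PREFIXES + "\n" + "\n\n".join(row_to_turtle(r) for r in rows) + "\n"
print(turtle)
```

A full Data Cube publication would also describe the dataset and its structure definition. The sketch covers only the observations.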

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along the axes “flexibility” and “volume”: OpenRefine, UnifiedViews, Cellar]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

Slide 98

• 5 ★ Open Data. http://5stardata.info

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data: An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF/

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA: Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA: Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

Slide 100

EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2

Learning SPARQL. Bob DuCharme. http://www.learningsparql.com

Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

DATASUPPORTOPEN

Further reading

Slide 101

Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: @OpenDataSupport

Contact us: contact@opendatasupport.eu

DATASUPPORTOPEN

Presentation metadata

Slide 103

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 36: Llinked open data training for EU institutions

DATASUPPORTOPEN

Conclusions

bull Linked data is a set of design principles for sharing machine-readable data on the Web

bull URIs RDF and SPARQL form the foundational layer for Linked data

bull Linked data offers a number of advantages such as

o Data integration with small impact on legacy systems

o Enables for semantic interoperability

o Easier browsing through complex data

o Increased data quality

Slide 36

DATASUPPORTOPEN

Conclusions contrsquod

bull Linked data offers a number of advantages such as

o Enables easy updates adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge-

creation

Slide 37

DATASUPPORTOPEN

Learning Module 2

Introduction to RDF amp SPARQL

Slide 38

DATASUPPORTOPEN

Introduction to RDF and SPARQL

This module contains

bull An introduction to the Resource Description Framework (RDF) for describing your data

bull An introduction to SPARQL on how you can query and manipulate data in RDF

Slide 39

Find more on trainingopendatasupporteu

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have a clear understanding of

bull The Resource Description Framework (RDF)

bull How to writeread RDF

bull How you can describe your data with RDF

bull What SPARQL is

bull How to understand and write a SPARQL SELECT query

Slide 40

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource Everything that can have a unique identifier (URI) eg pages places people organisations products

Description attributes features and relations of the resources

Framework model languages and syntaxes for these descriptions

bull Published as a W3C recommendation in 1999

bull RDF was originally introduced as a data model for metadata

bull RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example RDF description of an organisation

Publications Office 2 rue Mercier 2985 Luxembourg LUXEMBOURG

Slide 43

ltrdfRDFxmlnsrdfs=ldquohttpwwww3org200001rdf-schemardquoxmlnsorg=ldquohttpwwww3orgnsorgrdquoxmlnslocn=ldquohttpwwww3orgnslocnrdquo gt

ltorgOrganization rdfabout=ldquohttppublicationseuropaeuresourceauthoritycorporate-bodyPUBLrdquogt

ltrdfslabelgt ldquoPublications Officerdquolt rdfslabelgtltorghasSite rdfresource=ldquohttpexamplecomsite1234rdquogt

ltorgOrganizationgt

ltlocnAddress rdfabout=ldquohttpexamplecomsite1234rdquogtltlocnfullAddressgtrdquo2 rue Mercier 2985 Luxembourg LUXEMBOURGrdquoltlocnfullAddressgt

ltlocnAddressgt

ltrdfRDFgt

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple

bull Subject ndash a resource which is identified with a URI

bull Predicate ndash a URI-identified reused specification of the relationship

bull Object ndash a resource or literal to which the subject is related

httppublicationseuropaeuresourceauthorityfile-type has a title ldquoFile types Name Authority Listrdquo

Subject Predicate Object

Example name of a dataset

DATASUPPORTOPEN

RDF SyntaxRDFXML

Slide 46

ltrdfRDF

xmlnsdcat=ldquohttpwwww3orgTRvocab-dcatldquoxmlnsdct=ldquohttppurlorgdctermsrdquo

ltdcatDataset rdfabout=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquogtltdcttitlegt ldquoFile types Named Authority Listrdquolt dcttitlegtltdctpublisher rdfresource=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogt

ltdcatDatasetgt

ltdctAgent rdfabout=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogtltdcttitlegtrdquoPublications Officerdquoltdcttitlegt

ltdctPublishergt

ltrdfRDFgt

Subject

Predicate

Object

Gra

ph

Definition of prefixes

Description of data ndash triples

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF SyntaxTurtle

Subject

Predicate

Object

Slide 48

prefix dcat lthttpwwww3orgTRvocab-dcatgt prefix dct lthttppurlorgdcterms

lt httppublicationseuropaeuresourceauthorityfile-typegt a ltdcatDatasetgt dcttitle ldquoFile types Name Authority Listldquodctpublisher lthttpopen-dataeuropaeuendatapublisherpublgt

lthttpopen-dataeuropaeuendatapublisherpublgta ltdctAgentgt dcttitle ldquoPublications Officerdquo

Gra

ph

See alsohttpwwww3org200912rdf-wspapersws11

Definition of prefixes

Description of data ndash triples

Slide 49

RDF syntax: RDFa (embedding RDF data in HTML)

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

The resource attribute gives the subject, each property attribute gives a predicate, and the text inside the span is the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

Slide 50

How to represent data in RDF

Classes, properties and vocabularies

Slide 51

RDF vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 52

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

[Figure: UML class diagram "Healthcare Domain". The classes, their properties and the relationship labels recovered from the diagram are listed below.]

Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string

Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: the Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Slide 53

Introduction to SPARQL

The RDF query language

Slide 54

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• The name stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 55

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: add triples to the RDF graph.

• DELETE: delete triples from the RDF graph.

• ASK: are there any X, Y, etc. satisfying the following conditions?

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Slide 56

Structure of a SPARQL query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met

Slide 57

SELECT - return the name of a dataset with a particular URI

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

Slide 58

SELECT - return the name and publisher of a dataset

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
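How a SELECT query finds its answers can be sketched in a few lines: each triple pattern is matched against every triple, and variable bindings carried over from earlier patterns must stay consistent (this is what joins ?publisherURI across two patterns). The `match` and `select` functions are a toy illustration, not a SPARQL engine; terms are plain strings, prefixed names are left unexpanded, and the full URIs are taken from the RDF/XML example.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`; None on failure."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # variable: bind it, or check the earlier binding
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:      # constant: must match exactly
            return None
    return new


def select(variables, patterns, graph):
    """Evaluate the patterns one after another and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return [tuple(b[v] for v in variables) for b in bindings]


FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"
graph = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]

rows = select(
    ["?dataset", "?publisher"],
    [(FILE_TYPE, "dct:publisher", "?publisherURI"),
     (FILE_TYPE, "dct:title", "?dataset"),
     ("?publisherURI", "dct:title", "?publisher")],
    graph,
)
print(rows)
```

A real endpoint also handles literals with datatypes, OPTIONAL, FILTER and much more; the sketch covers only the basic pattern matching used on these two slides.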

Slide 59

SPARQL example - EU ODP (1)

Slide 60

SPARQL example - EU ODP (2)

Slide 61

SPARQL example - EU ODP (2)

Slide 62

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 64

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 65

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 66

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 67

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 68

Step 1: Start with a robust Domain Model

[Figure: domain model diagram. Recoverable content:]

Classes and attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location

Relationship labels: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Slide 69

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 70

Step 2: Reuse existing terms and vocabularies - well-known vocabularies

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 71

Step 2: Reuse existing terms and vocabularies - advantages of reuse

• Reuse greatly aids interoperability of your data.

Take dcterms:created, whose value should be a typed date such as "2013-02-21"^^xsd:date: it is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from replicating that effort.
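The interoperability point can be illustrated with the two date forms from the slide. A value typed as xsd:date uses the ISO form YYYY-MM-DD, which standard libraries parse directly, while a free-text date only works if the consumer already knows the format. The function names below are made up for this sketch, and time-zone suffixes (which xsd:date permits) are not handled.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """Parse the plain YYYY-MM-DD form of xsd:date; no format guessing needed."""
    return date.fromisoformat(lexical)


def parse_custom_date(text: str) -> date:
    """Parse '21 February 2013'; the format string must be agreed out of band
    and %B depends on the process locale (English month names assumed)."""
    return datetime.strptime(text, "%d %B %Y").date()


print(parse_xsd_date("2013-02-21"))
print(parse_custom_date("21 February 2013"))
```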

Slide 72

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• http://joinup.ec.europa.eu
• http://lov.okfn.org

Slide 73

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 74

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 75

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
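The benefit of sub-properties can be sketched as a small inference step: a system that only understands dct:description can still use a statement made with the more specific property once the sub-property link is followed. The prefixed names bud:introduction and bud:hasNomenclature are hypothetical spellings of the EU Budget vocabulary terms, chosen for this example only.

```python
# rdfs:subPropertyOf links; the bud: names are hypothetical spellings.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples, sub_property_of):
    """Return the triples plus one inferred triple per super-property,
    following rdfs:subPropertyOf chains upwards."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}
        while p in sub_property_of and sub_property_of[p] not in seen:
            p = sub_property_of[p]
            seen.add(p)  # guards against accidental cycles
            inferred.add((s, p, o))
    return inferred


data = {("ex:nomenclature1", "bud:introduction", "Introductory text")}
for triple in sorted(with_super_properties(data, SUB_PROPERTY_OF)):
    print(triple)
```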

Slide 76

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.

• Properties begin with a lower case letter, e.g. rdfs:label.

• Object properties should be verbs, e.g. org:hasSite.

• Data type properties should be nouns, e.g. dcterms:description.

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
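The purely lexical parts of these conventions (initial capital for classes, initial lower case for properties, camel case without separators) can be checked mechanically. The sketch below is one possible check with made-up function names; whether a name is singular, a verb or a noun still needs human review.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf


def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term: str, kind: str) -> bool:
    """Check a term against the class or property naming pattern."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))


for term, kind in [("skos:Concept", "class"), ("org:hasSite", "property"),
                   ("ex:has_site", "property")]:
    print(term, kind, follows_convention(term, kind))
```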

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Figure: class Amount with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Figure: class Amount with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.

• A domain states on which classes a given property can be used.

[Figure: the hasCeiling relationship between class Amount (Currency, Figure, Type, Year) and class Political category (Code, Description)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
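In RDFS, domain and range work as inference rules rather than validation constraints: using a property lets a reasoner conclude the types of its subject and object. The sketch below applies those two rules to the hasCeiling relationship. The bud: names and the direction of the relationship (Political category as domain, Amount as range) are assumptions made for illustration.

```python
# Assumed declarations: bud:hasCeiling rdfs:domain bud:PoliticalCategory ;
#                                      rdfs:range  bud:Amount .
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Apply the RDFS domain and range rules to (subject, predicate, object) triples."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))  # the subject is in the domain
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))   # the object is in the range
    return types


for triple in sorted(infer_types([("ex:category1", "bud:hasCeiling", "ex:amount1")])):
    print(triple)
```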

Slide 80

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 81

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 82

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

[Figure labels: Analyse, Model, Publish]

Slide 83

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 84

What is Open Refine?

"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another ..." - openrefine.org

See also: Open Refine website, http://openrefine.org

Slide 85

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 86

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 87

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 88

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 89

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Slide 90

Step 3: Clean up the data - table harmonisation

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Slide 91

Step 3: Clean up the data - prepare RDF

• Create URI representation for the involved object values
  - via formula
  - via reconciliation
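Creating URIs "via formula" means deriving an identifier from a cell value with a deterministic rule. In Open Refine this is done with an expression on the column; the sketch below shows the same idea in Python. The base URI, the path layout and the function name are hypothetical, not what the workshop used.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind: str, label: str) -> str:
    """Build a URI from a cell value: trim, lower-case, replace whitespace
    with hyphens, then percent-encode anything unsafe in a URI path segment."""
    slug = re.sub(r"\s+", "-", label.strip().lower())
    return f"{BASE}{kind}/{quote(slug, safe='-')}"


print(mint_uri("country", " Czech Republic "))
print(mint_uri("indicator", "Broadband (fixed)"))
```

Reconciliation, the other option on the slide, instead looks each value up in an existing dataset and reuses the URI found there.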

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 94

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 95

Step 5: Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]
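What the skeleton plus export step produces can be approximated in code: each spreadsheet row becomes one qb:Observation written in Turtle. This is a rough sketch, not the extension's own output. The column names (country, year, value), the ex: namespace and the observation URIs are assumptions; qb: and sdmx-measure: are the published RDF Data Cube and SDMX measure namespaces.

```python
import csv
import io

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"
)


def rows_to_turtle(csv_text: str, base: str = "http://example.org/obs/") -> str:
    """Turn CSV rows with the assumed columns country, year, value into Turtle.
    String values are not escaped, so this only suits simple codes and numbers."""
    blocks = [PREFIXES]
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        blocks.append(
            f"<{base}{i}> a qb:Observation ;\n"
            f'    ex:country "{row["country"]}" ;\n'
            f'    ex:year "{row["year"]}" ;\n'
            f'    sdmx-measure:obsValue {float(row["value"])} .\n'
        )
    return "\n".join(blocks)


print(rows_to_turtle("country,year,value\nBE,2013,72.5\nLU,2013,80\n"))
```

A complete Data Cube export would also link each observation to its dataset with qb:dataSet and declare the dimensions in a data structure definition; both are left out here for brevity.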

Slide 96

Production pipelines

From desk to automated pipeline

[Figure: tools placed along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]

Slide 97

Thank you for your attention

... and now YOUR questions?

Slide 98

References

• 5-star Open Data. http://5stardata.info

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF/

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Slide 99

Further reading

• EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

• EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 100

Further reading

• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2

• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com

• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 101

Further reading

• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/

• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf

• Linked Open Government Data. Li Ding, Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org/

Slide 102

Be part of our team

Find us, join us, follow us and contact us:

• Open Data Support website: http://www.opendatasupport.eu
• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu

Slide 103

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps and graphs on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 37: Llinked open data training for EU institutions

Slide 37

Conclusions (cont'd)

• Linked data offers a number of advantages, such as:

o Enables easy updates, adaptations and extensions of data models

o Cost reduction from the reuse of LOGD in e-Government applications

o Enables creativity and innovation through context and knowledge creation

Slide 38

Learning Module 2

Introduction to RDF & SPARQL

Slide 39

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data

• An introduction to SPARQL on how you can query and manipulate data in RDF

Find more on training.opendatasupport.eu

Slide 40

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)

• How to write/read RDF

• How you can describe your data with RDF

• What SPARQL is

• How to understand and write a SPARQL SELECT query

Slide 41

Resource Description Framework

An introduction to RDF

Slide 42

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products

Description: attributes, features and relations of the resources

Framework: model, languages and syntaxes for these descriptions

• Published as a W3C recommendation in 1999

• RDF was originally introduced as a data model for metadata

• RDF was generalised to cover knowledge of all kinds

Slide 43

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

Slide 44

RDF structure

Triples, graphs and syntaxes

Slide 45

What is a triple?

Every piece of information expressed in RDF is represented as a triple:

• Subject - a resource, which is identified with a URI

• Predicate - a URI-identified, reused specification of the relationship

• Object - a resource or literal to which the subject is related

Example: name of a dataset

<http://publications.europa.eu/resource/authority/file-type> (subject) has a title (predicate) "File types Name Authority List" (object)

DATASUPPORTOPEN

RDF SyntaxRDFXML

Slide 46

ltrdfRDF

xmlnsdcat=ldquohttpwwww3orgTRvocab-dcatldquoxmlnsdct=ldquohttppurlorgdctermsrdquo

ltdcatDataset rdfabout=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquogtltdcttitlegt ldquoFile types Named Authority Listrdquolt dcttitlegtltdctpublisher rdfresource=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogt

ltdcatDatasetgt

ltdctAgent rdfabout=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogtltdcttitlegtrdquoPublications Officerdquoltdcttitlegt

ltdctPublishergt

ltrdfRDFgt

Subject

Predicate

Object

Gra

ph

Definition of prefixes

Description of data ndash triples

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF SyntaxTurtle

Subject

Predicate

Object

Slide 48

prefix dcat lthttpwwww3orgTRvocab-dcatgt prefix dct lthttppurlorgdcterms

lt httppublicationseuropaeuresourceauthorityfile-typegt a ltdcatDatasetgt dcttitle ldquoFile types Name Authority Listldquodctpublisher lthttpopen-dataeuropaeuendatapublisherpublgt

lthttpopen-dataeuropaeuendatapublisherpublgta ltdctAgentgt dcttitle ldquoPublications Officerdquo

Gra

ph

See alsohttpwwww3org200912rdf-wspapersws11

Definition of prefixes

Description of data ndash triples

DATASUPPORTOPEN

RDF SyntaxRDFa

Subject

Predicate

Object

Slide 49

lthtmlgt ltheadgt ltheadgt ltbodygt ltdiv resource=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquotypeof= ldquohttpwwww3orgnsdcatDatasetrdquogtltpgt ltspan property= httppurlorgdctermstitle gtFile types Name Authority Listltspangt Publisher ltspan property=httppurlorgdctermsAgentgt Publications Officeltspangtltpgtltdivgtltbodygt

See alsohttpwwww3orgTR2012NOTE-rdfa-primer-20120607

embedding RDF data in HTML

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo

bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo

bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties

bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties

Slide 51

DATASUPPORTOPEN

Examples of classes relationships and properties The Core Person Vocabulary in UML

52

class Healthcare Domain

Core VocabulariesIdentifier

dateOfIssue dateTime [01]

identifier string [11]

identifierType string [01]

issuingAuthority string [01]

issuingAuthorityUri URI [01]

Core VocabulariesPerson

alternativeName string

birthName string

dateOfBirth dateTime

dateOfDeath dateTime

familyName string

fullName string

gender code

givenName string

patronymicName string

Core VocabulariesLocation

geographicIdentifier URI

geographicName string

Core VocabulariesAddress

addressArea string

addressID string

adminUnitL1 string

adminUnitL2 string

fullAddress string

locatorDesignator string

locatorName string

poBox string

postCode string

postName string

thoroughfare string

Core VocabulariesGeometry

lat string

long string

wkt string

xmlGeometry XML

address

identifies

geometry

placeOfDeath

countryOfDeath

placeOfBirth

countryOfBirth

identifier

UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model

Relationships ClassProperties

Class

Class

Class

Class

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples

bull SPARQL Protocol and RDF Query Language

bull One of the three core standards of the Semantic Web along with RDF and OWL

bull Became a W3C standard January 2008

bull SPARQL 11 is a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

bull SELECT Return a table of all X Y etc satisfying the following conditions

bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph

bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

bull INSERT Add triples to the RDF graph

bull DELETE Delete triples from the RDF graph

bull ASK Are there any X Y etc satisfying the following conditions

Slide 55

See alsohttpwwweuclid-projecteumoduleschapter2

httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt

SELECT titleWHERE

dataset rdftype dcatDataset dataset rdftitle title

Structure of a SPARQL Query

Slide 56

Type of

query Variables ie what to search for

RDF triple patterns ie

the conditions that

have to be met

Definition of

prefixes

DATASUPPORTOPEN

SELECT ndash return the name of a dataset with particular URI

Slide 57

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset

WHERE

lthttpauthorityfile-typegt dcttitle dataset

dataset

ldquoFile types Name Authority Listrdquo

Sample data

Query

Result

SELECT - return the name and publisher of a dataset

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset                             publisher
"File types Name Authority List"    "Publications Office"

Slide 58
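The publisher query works because the variable ?publisherURI appears in two triple patterns, which joins them. The sketch below shows that matching process in plain Python, assuming shortened hypothetical identifiers (`ex:file-type`, `ex:publ`) and helper names (`match`, `select`) invented for the illustration. A real SPARQL engine does far more (optimisation, FILTER, OPTIONAL, and so on).

```python
# Sample data as a list of (subject, predicate, object) tuples.
# ex:file-type and ex:publ are hypothetical short names for the URIs on the slide.
graph = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

def match(pattern, triple, binding):
    """Unify one triple pattern with one triple.

    Terms starting with '?' are variables. Returns the extended binding,
    or None if the triple contradicts the pattern or an earlier binding.
    """
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def select(variables, patterns, graph):
    """Evaluate the patterns one after another, keeping consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]

rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    graph,
)
print(rows)
```

Because ?publisherURI is already bound to the publisher resource when the third pattern is evaluated, the dataset's own title cannot be returned as the publisher name.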

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Diagram: domain model with the following classes and attributes]

• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location

Relationships shown in the diagram: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Slide 68

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data.
  A value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.

Slide 71
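The "further processing" mentioned for non-standard date formats can be illustrated with a short Python sketch. It assumes the free-text dates follow the pattern "21 February 2013" with English month names, and the function name `to_xsd_date` is made up for this example.

```python
from datetime import datetime

def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date and return an xsd:date typed literal (Turtle syntax).

    Assumes English month names; %B depends on the active locale.
    """
    day = datetime.strptime(text, fmt).date()
    return f'"{day.isoformat()}"^^xsd:date'

literal = to_xsd_date("21 February 2013")
print("dcterms:created", literal)
```

Every consumer of data published with the ad hoc ex:date format would need a step like this, whereas data that already uses dcterms:created with xsd:date needs none.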

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

Example: the EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 74

Example: the EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Diagram: Amount (Currency, Figure, Type, Year) and Nomenclature (Type, Heading, Introduction, Remark, Conditions), connected by the "has Nomenclature" relationship]

Slide 75
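The benefit of declaring sub-properties can be sketched in a few lines of Python: a consumer that only knows dct:description and dct:subject can still use the data once the super-property statements are derived. The `bud:` prefix, the local names `bud:introduction` and `bud:hasNomenclature`, and the sample resources are assumptions for illustration, not the vocabulary's confirmed identifiers.

```python
# rdfs:subPropertyOf declarations (hypothetical local names).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

def infer_super_properties(triples, sub_property_of):
    """Add (s, super, o) for every (s, sub, o), walking up the hierarchy.

    Assumes the sub-property declarations contain no cycles.
    """
    inferred = set(triples)
    for s, p, o in triples:
        while p in sub_property_of:
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred

data = {
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
}
closure = infer_super_properties(data, SUB_PROPERTY_OF)
```

A generic tool can now query `closure` for dct:description or dct:subject without knowing anything about the more specific terms.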

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
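The capitalisation and camel-case conventions above are mechanical enough to check automatically. The following Python sketch does so with two regular expressions. The function names are invented for this example, and the "singular", "verb" and "noun" rules are not covered because they need human judgement.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # e.g. Concept
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf

def local_name(term):
    """Strip the prefix from a prefixed name such as skos:Concept."""
    return term.split(":", 1)[-1]

def is_valid_class_name(term):
    return bool(UPPER_CAMEL.match(local_name(term)))

def is_valid_property_name(term):
    return bool(LOWER_CAMEL.match(local_name(term)))

checks = {
    "skos:Concept": is_valid_class_name("skos:Concept"),
    "foaf:isPrimaryTopicOf": is_valid_property_name("foaf:isPrimaryTopicOf"),
    "ex:political category": is_valid_class_name("ex:political category"),
}
```

A check like this could run before a vocabulary is published, so that naming slips are caught while the terms can still be changed.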

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use the established conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Example: defining the "amount type" property

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.

[Diagram: the hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
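In RDFS, domain and range work as inference rules: using a property tells a reasoner what type its subject and its value have. The Python sketch below shows this for hasCeiling. The direction (from Political category to Amount), the `bud:` prefix and the class names are assumptions, since the slide only shows the diagram.

```python
# rdfs:domain and rdfs:range declarations (hypothetical identifiers).
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}

def infer_types(triples, domain, range_):
    """Derive rdf:type statements from domain and range declarations."""
    types = set()
    for s, p, o in triples:
        if p in domain:
            types.add((s, "rdf:type", domain[p]))
        if p in range_:
            types.add((o, "rdf:type", range_[p]))
    return types

data = {("ex:category1", "bud:hasCeiling", "ex:amount1")}
types = infer_types(data, DOMAIN, RANGE)
```

This is why domain and range should be declared with care: they do not reject data, they cause every user of the property to be classified.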

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud

• Follow good practices for publishing sets of persistent Uniform Resource Identifiers (URIs), in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

The slide groups these steps into three phases: Analyse, Model, Publish.

Slide 82

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..." - openrefine.org

See also: Open Refine website, http://openrefine.org

Slide 84

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You then determine the intended structure of the RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using Open Refine to model and publish open data: getting started

First:

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Dataset: Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data.

Slide 88

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Slide 89

Step 3: Clean up the data - table harmonisation

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Slide 90

Step 3: Clean up the data - prepare RDF

• Create URI representations for the object values involved
  - via formula
  - via reconciliation

Slide 91

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.

Slide 92

Define a skeleton to transform your spreadsheet data to RDF.

Slide 93

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 94

Step 5: Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]

Slide 95
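What the RDF skeleton does can be approximated in plain Python: mint a URI for each row from a formula, map each column to a property, and serialise the result as Turtle. This is only a sketch of the idea, not how Open Refine or its RDF extension is implemented. The base URI, the `ex:` properties, the column names and the sample values are all invented for illustration; only qb:Observation comes from the RDF Data Cube Vocabulary.

```python
import csv
import io

BASE = "http://example.org/scoreboard/"  # hypothetical base URI

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/scoreboard/def#> .\n"
)

# Made-up sample rows, not real Digital Agenda Scoreboard figures.
SAMPLE_CSV = (
    "country,year,indicator,value\n"
    "BE,2013,broadband,78.5\n"
    "LU,2013,broadband,94.0\n"
)

def observation_uri(row):
    """Create a URI 'via formula' from the identifying columns of a row."""
    return f"<{BASE}obs/{row['country']}-{row['year']}-{row['indicator']}>"

def row_to_turtle(row):
    """Map one spreadsheet row to one qb:Observation."""
    return (
        f"{observation_uri(row)} a qb:Observation ;\n"
        f"    ex:country \"{row['country']}\" ;\n"
        f"    ex:year \"{row['year']}\"^^xsd:gYear ;\n"
        f"    ex:indicator \"{row['indicator']}\" ;\n"
        f"    ex:value \"{row['value']}\"^^xsd:decimal .\n"
    )

def csv_to_turtle(text):
    rows = csv.DictReader(io.StringIO(text))
    return PREFIXES + "\n" + "\n".join(row_to_turtle(r) for r in rows)

turtle = csv_to_turtle(SAMPLE_CSV)
print(turtle)
```

A full Data Cube export would also describe the dataset and its structure (dimensions and measures). The sketch only covers the row-to-observation mapping.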

Production pipelines

From desk to automated pipeline

[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

Slide 96

Thank you for your attention

... and now YOUR questions?

Slide 97


Further reading

• EC ISA, Process and methodology for developing semantic agreements
  https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas
  https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

• EUCLID - Course 1: Introduction and Application Scenarios
  http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data
  http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme
  http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group
  http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
  http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
  http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas
  http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
  http://workingontologist.org

Slide 101

Be part of our team

Find us on, join us on, follow us, contact us:

• http://www.opendatasupport.eu
• Open Data Support: http://www.slideshare.net/OpenDataSupport
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu

Slide 102

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Slide 103

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 38: Llinked open data training for EU institutions

Learning Module 2

Introduction to RDF & SPARQL

Slide 38

Introduction to RDF and SPARQL

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Find more on training.opendatasupport.eu

Slide 39

Learning objectives

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40

Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

• Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
• Description: attributes, features and relations of the resources
• Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

Slide 43

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple?

Every piece of information expressed in RDF is represented as a triple:

• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related

Example: name of a dataset

Subject:   http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object:    "File types Name Authority List"

Slide 45
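As an illustration of how one such triple is written down, the Python sketch below serialises the example as a line of N-Triples, one of the simplest RDF syntaxes. It assumes that "has a title" corresponds to the Dublin Core property http://purl.org/dc/terms/title. The function name is made up, and escaping is limited to backslashes and double quotes.

```python
def to_ntriples(subject, predicate, obj, obj_is_literal=False):
    """Serialise one triple as an N-Triples line.

    Subjects and predicates are URIs. The object is either a URI
    or a plain string literal.
    """
    if obj_is_literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."

line = to_ntriples(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Name Authority List",
    obj_is_literal=True,
)
print(line)
```

The only structural difference between a resource object and a literal object is the delimiter: angle brackets for URIs, double quotes for literals.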

RDF Syntax: RDF/XML

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

The xmlns attributes are the definition of prefixes. The elements inside rdf:RDF are the description of the data as triples: the rdf:about value is the subject, each child element is a predicate, and its content or rdf:resource value is the object. Together the triples form the graph.

Slide 46

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

[Diagram: RDF graph with nodes and arcs labelled Subject, Predicate and Object]

Slide 47

RDF Syntax: Turtle

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

The @prefix lines are the definition of prefixes. The statements that follow are the description of the data as triples (subject, predicate, object), which together form the graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

Slide 48

RDF Syntax: RDFa (embedding RDF data in HTML)

<html>
  <head> </head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

The resource attribute gives the subject, each property attribute a predicate, and the element content the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

Slide 49

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

[UML class diagram "Healthcare Domain", showing the following Core Vocabularies classes and their properties]

• Identifier: dateOfIssue: dateTime [0..1], identifier: string [1..1], identifierType: string [0..1], issuingAuthority: string [0..1], issuingAuthorityUri: URI [0..1]
• Person: alternativeName: string, birthName: string, dateOfBirth: dateTime, dateOfDeath: dateTime, familyName: string, fullName: string, gender: code, givenName: string, patronymicName: string
• Location: geographicIdentifier: URI, geographicName: string
• Address: addressArea: string, addressID: string, adminUnitL1: string, adminUnitL2: string, fullAddress: string, locatorDesignator: string, locatorName: string, poBox: string, postCode: string, postName: string, thoroughfare: string
• Geometry: lat: string, long: string, wkt: string, xmlGeometry: XML

Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Slide 52

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples

bull SPARQL Protocol and RDF Query Language

bull One of the three core standards of the Semantic Web along with RDF and OWL

bull Became a W3C standard January 2008

bull SPARQL 11 is a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

bull SELECT Return a table of all X Y etc satisfying the following conditions

bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph

bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

bull INSERT Add triples to the RDF graph

bull DELETE Delete triples from the RDF graph

bull ASK Are there any X Y etc satisfying the following conditions

Slide 55

See alsohttpwwweuclid-projecteumoduleschapter2

httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt

SELECT titleWHERE

dataset rdftype dcatDataset dataset rdftitle title

Structure of a SPARQL Query

Slide 56

Type of

query Variables ie what to search for

RDF triple patterns ie

the conditions that

have to be met

Definition of

prefixes

DATASUPPORTOPEN

SELECT ndash return the name of a dataset with particular URI

Slide 57

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset

WHERE

lthttpauthorityfile-typegt dcttitle dataset

dataset

ldquoFile types Name Authority Listrdquo

Sample data

Query

Result

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset publisher

WHEREhttpauthorityfile-type dctpublisher publisherURI

httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher

dataset publisher

ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

Sample data

Query

Result

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

bull RDF is a general way to express data intended for publishing on the Web

bull RDF data is expressed in triples subject predicate object

bull Different syntaxes exist for expressing data in RDF

bull SPARQL is a standardised language to query graph data expressed as RDF

bull SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

bull Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

bull An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties

Where new terms are required create them following commonly agreed best practice

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Slide 67

1

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

2

3

4

5

6

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

1

hasCeiling

hasPoliticalcategory

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeading

EU Programme

CodeType

Political category

CodeDescription

Corporate body

CodeTypeLocation

IntroductionRemarkConditions

AcronymLegal base periodLegal base typeLegal base status

hasCorporate body

has

Nomenclature

has

EU Programme

DATASUPPORTOPEN

General purpose vocabularies DCMI RDFS

To name things rdfslabel foafname skosprefLabel

To describe people FOAF vCard Core Person Vocabulary

To describe projects DOAP ADMSSW

To describe interoperability assets ADMS

To describe registered organisations Registered Organisation Vocabulary

To describe addresses vCard Core Location Vocabulary

To describe public services Core Public Service Vocabulary

To describe datasets DCAT DCAT Application Profile VoID

Reuse existing terms and vocabularies

Slide 69

2

Step 2: Reuse existing terms and vocabularies: well-known vocabularies

Slide 70

DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: For describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies: advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data.

For example, a value of dcterms:created given as a typed date, such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
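The interoperability point can be illustrated with a small sketch. It assumes Python and its standard library only; the function names are hypothetical and ex:date is the made-up property from the slide. A standard xsd:date lexical form parses with one generic call, while the free-text form needs a parser written for that one layout (and for English month names).

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """Parse the lexical form of an xsd:date without a timezone (YYYY-MM-DD)."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Parse one specific free-text layout such as '21 February 2013'.

    Assumption: month names are English and the interpreter runs in an
    English or C locale, because %B is locale-dependent.
    """
    return datetime.strptime(text, "%d %B %Y").date()


standard_value = "2013-02-21"      # value of dcterms:created, typed as xsd:date
custom_value = "21 February 2013"  # value of the hypothetical ex:date

created = parse_xsd_date(standard_value)
converted = parse_free_text_date(custom_value)

# Both values denote the same day, but only the first needed no custom logic.
print(created == converted)
```

Every consumer of the custom format would have to repeat the second parser, which is the "further processing" the slide warns about.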

Step 2: Reuse existing terms and vocabularies

Slide 72

You can find reusable RDF vocabularies on:

http://joinup.ec.europa.eu
http://lov.okfn.org

Step 3: Creation of sub-classes and sub-properties

Slide 73

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, you allow systems that understand the super-property or super-class to interpret the data even if the more specific terms are unknown to them.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: the Nomenclature class with its attributes Type, Heading, Introduction, Remark, Conditions.]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions).]
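Why sub-properties help generic consumers can be sketched as follows. This is a minimal illustration in plain Python, not the EU Budget vocabulary itself: the bud: prefix, the local names bud:introduction and bud:hasNomenclature, and the ex: resources are assumptions made for the example.

```python
# Assumed sub-property declarations (rdfs:subPropertyOf), written as a mapping
# from the specific property to its more generic super-property.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def generalise(triples, sub_property_of):
    """Return the input triples plus those entailed by rdfs:subPropertyOf."""
    entailed = set(triples)
    for subject, predicate, obj in triples:
        seen = {predicate}
        parent = sub_property_of.get(predicate)
        # Walk up the hierarchy; 'seen' guards against accidental cycles.
        while parent is not None and parent not in seen:
            entailed.add((subject, parent, obj))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return entailed


data = {
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
}

entailed = generalise(data, SUB_PROPERTY_OF)

# A system that only knows Dublin Core can still find a description.
descriptions = [o for (s, p, o) in entailed if p == "dct:description"]
print(descriptions)
```

A consumer that has never seen the specific terms can query for dct:description or dct:subject and still obtain useful results.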

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 76

Classes begin with a capital letter and are always singular, e.g. skos:Concept.
Properties begin with a lower case letter, e.g. rdfs:label.
Object properties should be verbs, e.g. org:hasSite.
Data type properties should be nouns, e.g. dcterms:description.
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
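The capitalisation and camel-case rules lend themselves to a mechanical check. The sketch below uses hypothetical helper names and assumes terms are written as prefix:localName. It cannot judge whether a term is singular, a verb or a noun; those rules still need a human reviewer.

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties (letters and digits only).
UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    return bool(LOWER_CAMEL.match(local_name(term)))


for term in ["skos:Concept", "ex:political_category"]:
    print(term, "class name ok:", is_valid_class_name(term))

for term in ["rdfs:label", "foaf:isPrimaryTopicOf", "ex:Has-Site"]:
    print(term, "property name ok:", is_valid_property_name(term))
```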

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 77

If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Figure: the Amount class with its attributes Currency, Figure, Type, Year.]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 78

If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Figure: the Amount class with its attributes Currency, Figure, Type, Year.]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 79

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used: any resource that has the property is taken to be an instance of those classes.

[Figure: the hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description).]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
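The effect of a domain and range declaration can be sketched in a few lines of Python. The direction of hasCeiling is an assumption made for this example (a political category has an amount as its ceiling); the slide does not state it, and the bud: and ex: names are hypothetical.

```python
# Assumed declarations: rdfs:domain and rdfs:range for one property.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in DOMAIN:
            # The domain types the subject of the property.
            inferred.add((subject, "rdf:type", DOMAIN[predicate]))
        if predicate in RANGE:
            # The range types the value (object) of the property.
            inferred.add((obj, "rdf:type", RANGE[predicate]))
    return inferred


data = [("ex:heading1", "bud:hasCeiling", "ex:amount42")]
types = infer_types(data)
for statement in sorted(types):
    print(statement)
```

Note that in RDFS domain and range are used to infer types, not to reject data: a mistyped subject is silently classified rather than flagged as an error.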

Step 5: Publish within a highly stable environment designed to be persistent

Slide 80

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms#
o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Slide 81

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another” - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

Slide 85

The Open Refine RDF extension allows you to easily import data in different formats, such as:

- CSV
- Excel (.xls and .xlsx)
- JSON
- XML
- RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

Slide 86

First:

- Install Open Refine from https://github.com/OpenRefine
- Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Slide 87

Dataset used: Digital Agenda Scoreboard

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

Step 3: Clean up the data - table harmonisation

Slide 90

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Step 3: Clean up the data - prepare RDF

Slide 91

• Create a URI representation for the object values involved
  • via formula
  • via reconciliation
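The "via formula" route amounts to deriving a URI from each cell value with a fixed rule. The sketch below shows the idea in plain Python rather than in Open Refine's own expression language; the base URI and the function name are hypothetical.

```python
from urllib.parse import quote

# Hypothetical base URI; in practice use a namespace you control.
BASE_URI = "http://example.org/resource/"


def mint_uri(kind: str, value: str) -> str:
    """Turn a cell value into a URI: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(value.strip().lower().split())
    return f"{BASE_URI}{kind}/{quote(slug, safe='-')}"


for cell in [" Czech Republic ", "Österreich"]:
    print(repr(cell), "->", mint_uri("country", cell))
```

The same rule must be applied everywhere a value occurs so that identical values always yield identical URIs. Reconciliation takes the other route and looks the value up in an existing dataset so that an already published URI can be reused.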

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF.

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 94

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton.]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Figure: export of the data in Turtle.]
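What the skeleton and export steps produce can be approximated by hand. The sketch below is not the output of the RDF extension; it only illustrates how one spreadsheet row might become one qb:Observation in Turtle. The qb: namespace is that of the RDF Data Cube Vocabulary; the ex: properties, the base URI and the sample rows (including their values) are invented for the example.

```python
PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"  # hypothetical property namespace
)

# Invented sample rows standing in for the cleaned-up spreadsheet.
ROWS = [
    {"indicator": "sample_indicator", "country": "BE", "year": "2013", "value": "12.5"},
    {"indicator": "sample_indicator", "country": "LU", "year": "2013", "value": "14.0"},
]


def row_to_turtle(row, base_uri):
    """Serialise one table row as a qb:Observation in Turtle."""
    indicator = row["indicator"]
    country = row["country"]
    year = row["year"]
    value = row["value"]
    subject = f"<{base_uri}observation/{indicator}/{country}/{year}>"
    lines = [
        subject,
        "    a qb:Observation ;",
        f'    ex:indicator "{indicator}" ;',
        f'    ex:country "{country}" ;',
        f'    ex:year "{year}"^^xsd:gYear ;',
        f'    ex:value "{value}"^^xsd:decimal .',
    ]
    return "\n".join(lines)


def export_turtle(rows, base_uri):
    """Prefix declarations followed by one block of triples per row."""
    body = "\n\n".join(row_to_turtle(row, base_uri) for row in rows)
    return PREFIXES + "\n" + body + "\n"


turtle = export_turtle(ROWS, "http://example.org/data/")
print(turtle)
```

A real Data Cube would also declare the dataset, its structure definition and proper dimension and measure properties; they are left out to keep the sketch short.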

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar.]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

References

Slide 98

• 5-star Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slides 99-101

EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL. Bob DuCharme
http://www.learningsparql.com

Linked Data Cookbook. W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler
http://workingontologist.org

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: http://www.opendatasupport.eu, contact@opendatasupport.eu

Presentation metadata

Slide 103

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 39: Llinked open data training for EU institutions

Introduction to RDF and SPARQL

Slide 39

This module contains:

• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF

Find more on training.opendatasupport.eu

Learning objectives

Slide 40

By the end of this training module you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Slide 42

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was later generalised to cover knowledge of all kinds

Example: RDF description of an organisation

Slide 43

Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:org="http://www.w3.org/ns/org#"
        xmlns:locn="http://www.w3.org/ns/locn#">

      <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
        <rdfs:label>Publications Office</rdfs:label>
        <org:hasSite rdf:resource="http://example.com/site/1234"/>
      </org:Organization>

      <locn:Address rdf:about="http://example.com/site/1234">
        <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
      </locn:Address>

    </rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
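The example triple can be written down as a tiny data structure. This is a sketch in plain Python with hypothetical names (Triple, to_ntriples); it assumes "has a title" corresponds to the Dublin Core property dct:title, as in the syntax examples of this module, and it prints the triple in the line-based N-Triples notation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # URI of another resource, or a quoted literal


DCT = "http://purl.org/dc/terms/"

triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT + "title",
    obj='"File types Name Authority List"',
)


def to_ntriples(t: Triple) -> str:
    """Write a triple in N-Triples style: URIs in angle brackets, literals quoted."""
    obj = t.obj if t.obj.startswith('"') else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


print(to_ntriples(triple))
```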

RDF Syntax: RDF/XML

Slide 46

Definition of prefixes:

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcat="http://www.w3.org/ns/dcat#"
        xmlns:dct="http://purl.org/dc/terms/">

Description of data (triples):

      <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
        <dct:title>File types Name Authority List</dct:title>
        <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
      </dcat:Dataset>

      <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
        <dct:title>Publications Office</dct:title>
      </dct:Agent>

    </rdf:RDF>

In each description, the resource named in rdf:about is the subject, each child element is a predicate, and the element content or rdf:resource value is the object. Together, the triples form a graph.

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph in which subjects and objects are drawn as nodes and predicates as labelled arrows between them.]

RDF Syntax: Turtle

Slide 48

Definition of prefixes:

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct: <http://purl.org/dc/terms/> .

Description of data (triples):

    <http://publications.europa.eu/resource/authority/file-type>
        a dcat:Dataset ;
        dct:title "File types Name Authority List" ;
        dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://open-data.europa.eu/en/data/publisher/publ>
        a dct:Agent ;
        dct:title "Publications Office" .

Each block starts with its subject, followed by predicate and object pairs separated by semicolons. Together, the triples form a graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
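Prefixes are only an abbreviation mechanism: a prefixed name stands for the namespace URI followed by the local name. A minimal sketch of that expansion in plain Python follows; the function name is hypothetical and the two namespace URIs are those of DCAT and Dublin Core Terms.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'dct:title' into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


print(expand("dcat:Dataset"))
print(expand("dct:title"))
```

This is why two documents that declare different prefix labels for the same namespace still describe the same properties.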

RDF Syntax: RDFa (embedding RDF data in HTML)

Slide 49

    <html>
      <head> ... </head>
      <body>
        <div resource="http://publications.europa.eu/resource/authority/file-type"
             typeof="http://www.w3.org/ns/dcat#Dataset">
          <p>
            <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
            Publisher:
            <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
          </p>
        </div>
      </body>
    </html>

The resource attribute gives the subject, each property attribute gives a predicate, and the text inside the element is the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

RDF Vocabulary

Slide 51

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[Figure: UML class diagram of the ISA Core Vocabularies, annotated to show which elements are classes, properties and relationships.]

Classes and their properties:

Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]
Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string
Location: geographicIdentifier: URI; geographicName: string
Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string
Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

Slide 54

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Types of SPARQL queries

Slide 55

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description).
• INSERT: Add triples to the RDF graph (part of SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (part of SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions ...?

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

    SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }

SELECT - return the name of a dataset with a particular URI

Slide 57

Sample data (URIs abbreviated with "..."):

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://.../publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query:

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset
    WHERE {
      <http://.../authority/file-type> dct:title ?dataset .
    }

Result:

    dataset
    "File types Name Authority List"

SELECT - return the name and publisher of a dataset

Slide 58

Sample data (URIs abbreviated with "..."):

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://.../publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query:

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }

Result:

    dataset                             publisher
    "File types Name Authority List"    "Publications Office"
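How the triple patterns of this query are matched against the sample data can be sketched in plain Python. This is a toy matcher written for illustration, not how a SPARQL engine is implemented; ex:file-type and ex:publ are stand-ins for the abbreviated URIs on the slide.

```python
# Sample data as (subject, predicate, object) tuples.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", '"File types Name Authority List"'),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", '"Publications Office"'),
]


def match_pattern(pattern, triple, binding):
    """Try to match one triple pattern; return the extended binding or None."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if pattern_term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return extended


def select(variables, patterns, data):
    """Evaluate a basic graph pattern and project the requested variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [tuple(solution[v] for v in variables) for solution in solutions]


PATTERNS = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = select(["?dataset", "?publisher"], PATTERNS, DATA)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset to its publisher: the third pattern only matches triples whose subject equals the value bound by the first pattern.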

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

Slide 62

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

Slide 64

This module is about:

• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Learning objectives

Slide 65

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

Where new terms are required create them following commonly agreed best practice

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Slide 67


bull Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data ndash prepare RDF

Slide 91

3

bull Create URI representation for the involved object values

bull via formula

bull via reconsiliation

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

bull 5 Open Data http5stardatainfo

bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure

bull An organization ontology W3C httpwwww3orgTRvocab-org W3C

bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment

bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies

bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf

bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1

bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml

Slide 98

bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook

bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet

bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2

bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata

bull Open Refine httpsgithubcomOpenRefine

bull RDF Extension httprefinederiie

bull Resource Description Framework W3C httpwwww3orgRDF

bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng

bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements

EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1 Introduction and Application Scenarios

httpwwweuclid-projecteumodulescourse1

EUCLID - Course 2 Querying Linked Data

httpwwweuclid-projecteumodulescourse2

Learning SPARQL Bob DuCharme

httpwwwlearningsparqlcom

Linked Data Cookbook W3C Government Linked Data Working Group

httpwwww3org2011gldwikiLinked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer

httplinkeddatabookcomeditions10

Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck

httpwwwsemantic-webatLOD-TheEssentialspdf

Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas

httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454

Semantic Web for the working ontologist Dean Allemang Jim Hendler

httpworkingontologistorg

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data SupporthttpwwwslidesharenetOpenDataSupport

httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI

OpenDataSupport contactopendatasupporteu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)

copy 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative

2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice


Learning objectives

By the end of this training module, you should have a clear understanding of:

• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query

Slide 40


Resource Description Framework

An introduction to RDF

Slide 41

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products.

Description: attributes, features, and relations of the resources.

Framework: model, languages and syntaxes for these descriptions.

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

Example: RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>


RDF structure

Triples graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI.
• Predicate – a URI-identified, reused specification of the relationship.
• Object – a resource or literal to which the subject is related.

Example: name of a dataset

Subject: <http://publications.europa.eu/resource/authority/file-type>
Predicate: has a title
Object: "File types Name Authority List"
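As a rough illustration of the triple structure, the dataset-name example can be held in a three-field record. This is only a sketch: the `Triple` type and the choice of `http://purl.org/dc/terms/title` for "has a title" are assumptions made here for illustration (the later syntax slides use `dct:title` for the same statement).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str    # URI of the resource being described
    predicate: str  # URI naming the relationship
    obj: str        # a URI or a literal value


# The "name of a dataset" example, with "has a title" assumed to be dct:title.
title_triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
)

# A triple unpacks into its three parts in a fixed order.
s, p, o = title_triple
```

Holding every statement in the same three-part shape is what lets statements from different sources be merged into one graph.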

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Annotations: the xmlns attributes are the definition of prefixes; the elements below them are the description of the data (triples), which together form the graph. In each triple, the rdf:about resource is the subject, the nested element is the predicate, and its value is the object.

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph whose nodes and arcs are labelled Subject, Predicate and Object]

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Annotations: the @prefix lines are the definition of prefixes; the statements below them are the description of the data (triples), which together form the graph. Each statement names a subject, followed by predicate and object pairs.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
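The role of the prefix definitions can be sketched in a few lines: a prefixed name such as `dct:title` is shorthand for a full URI obtained by concatenating the namespace and the local name. The helper name `expand` and the `PREFIXES` table are illustrative assumptions, not part of any RDF library.

```python
# Prefix table corresponding to the prefix declarations of the example.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'dct:title' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


full_title_uri = expand("dct:title")
full_class_uri = expand("dcat:Dataset")
```

Two documents that declare different prefix labels for the same namespace still describe the same properties, because only the expanded URIs are compared.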

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

Annotations: the resource attribute gives the subject, each property attribute gives a predicate, and the element content gives the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/


How to represent data in RDF

Classes properties and vocabularies

Slide 50

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[Figure: UML class diagram "Healthcare Domain". Each box is a class, the entries inside a box are its properties, and the labelled arcs between boxes are relationships.]

Classes (Core Vocabularies) and their properties:

Identifier
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Person
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Location
- geographicIdentifier: URI
- geographicName: string

Address
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Geometry
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier


Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: add triples to the RDF graph.
• DELETE: delete triples from the RDF graph.
• ASK: are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

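Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Annotations: the PREFIX lines are the definition of prefixes; SELECT is the type of query; ?title is the variable, i.e. what to search for; the lines inside WHERE { } are the RDF triple patterns, i.e. the conditions that have to be met.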

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
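To make the matching of triple patterns concrete, the sketch below evaluates the three patterns of this query over the sample data in plain Python. It is a toy matcher, not a SPARQL engine: the `solve` function is hypothetical, and the short names `ex:file-type` and `ex:publ` are placeholders standing in for the abbreviated URIs of the slide.

```python
# Sample data as (subject, predicate, object) tuples.
# "ex:file-type" and "ex:publ" are placeholder names for the slide's URIs.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def solve(patterns, data=DATA, binding=None):
    """Yield every variable binding that satisfies all triple patterns.

    Terms starting with '?' are variables; any other term must match exactly.
    A variable bound by an earlier pattern constrains the later ones (a join).
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break      # constant term does not match
        else:
            yield from solve(rest, data, candidate)


rows = [
    (b["?dataset"], b["?publisher"])
    for b in solve([
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ])
]
```

The shared variable `?publisherURI` is what links the dataset to the title of its publisher; without it, the third pattern would match every titled resource.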

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module, you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

Slide 67

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model. Classes and their attributes:]

- Amount: Currency, Figure, Type, Year
- Nomenclature: Type, Heading, Introduction, Remark, Conditions
- EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
- Political category: Code, Description
- Corporate body: Code, Type, Location

Relationships shown between the classes: hasCeiling, has Political category, has Corporate body, has Nomenclature, has EU Programme

Step 2: Reuse existing terms and vocabularies

Slide 69

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
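The interoperability point about date formats can be illustrated with a short sketch. It only assumes Python's standard `datetime` module; the two strings are the ones used in the example above.

```python
from datetime import date, datetime

# A value typed as xsd:date uses the ISO 8601 form and parses directly.
iso_value = date.fromisoformat("2013-02-21")

# A free-text date needs a format string that the consumer must know or guess.
# "%d %B %Y" assumes English month names (the default C locale).
free_text_value = datetime.strptime("21 February 2013", "%d %B %Y").date()

# Both routes arrive at the same date, but only the first works without
# knowledge specific to one publisher's conventions.
same_day = iso_value == free_text_value
```

Every consumer of the free-text form has to repeat that extra parsing step, which is the "further processing" the slide warns about.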

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

Slide 72

http://joinup.ec.europa.eu
http://lov.okfn.org

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions) by the "has Nomenclature" relationship]
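The benefit of sub-properties can be sketched as a small inference step: a system that only knows dct:description or dct:subject can still use the data once the implied statements are added. The `bud:` prefix, the term names and the function below are hypothetical stand-ins for the EU Budget vocabulary, and the hierarchy is assumed to contain no cycles.

```python
# Hypothetical sub-property declarations, following the two slide examples.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples, hierarchy=SUB_PROPERTY_OF):
    """Return the triples plus those implied by rdfs:subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        while p in hierarchy:   # walk up the property hierarchy
            p = hierarchy[p]
            inferred.add((s, p, o))
    return inferred


data = [("ex:nomenclature1", "bud:introduction", "Introductory text")]
expanded = with_super_properties(data)
```

A generic tool that looks only for `dct:description` now finds the introductory text, even though it has never heard of the more specific property.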

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
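The capitalisation and camel-case conventions above lend themselves to a mechanical check. The sketch below is one possible check, with hypothetical function names; it cannot tell whether a class name is singular or whether a property is a verb or a noun, which still needs human review.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf


def local_name(term: str) -> str:
    """Return the part of a prefixed name after the prefix, e.g. 'label'."""
    return term.split(":", 1)[-1]


def is_valid_class(term: str) -> bool:
    """True if the local name starts with a capital and has no separators."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property(term: str) -> bool:
    """True if the local name starts in lower case and has no separators."""
    return bool(PROPERTY_NAME.match(local_name(term)))
```

For instance, `is_valid_property("foaf:isPrimaryTopicOf")` holds, while a term written with an underscore, such as the made-up `ex:amount_type`, is rejected.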

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the Amount class with attributes Currency, Figure, Type, Year]

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the Amount class with attributes Currency, Figure, Type, Year]
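The definitions shown on the original slides are not preserved in this transcript, so the following is only a sketch of what RDFS declarations for a class and a property might look like, generated as Turtle text. The `ex:` namespace, the term names `ex:Amount` and `ex:amountType`, and both helper functions are assumptions made for illustration, not the terms of the actual EU Budget vocabulary.

```python
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.com/def#> .\n"  # hypothetical namespace
)


def define_class(term: str, label: str) -> str:
    """Return a Turtle statement declaring an RDFS class with an English label."""
    return f'{term} a rdfs:Class ;\n    rdfs:label "{label}"@en .'


def define_property(term: str, label: str) -> str:
    """Return a Turtle statement declaring an RDF property with an English label."""
    return f'{term} a rdf:Property ;\n    rdfs:label "{label}"@en .'


vocabulary = "\n".join([
    PREFIXES,
    define_class("ex:Amount", "Amount"),
    define_property("ex:amountType", "amount type"),
])
```

Printing `vocabulary` gives a small Turtle document that could serve as a starting point before labels in other languages, comments and further constraints are added.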

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
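In RDFS, domain and range declarations let a system infer the class of a resource from the properties used with it. The sketch below shows that inference in plain Python. The direction of `ex:hasCeiling` (from a political category to an amount) and all `ex:` names are assumptions made for illustration; the transcript does not preserve the direction drawn on the slide.

```python
# Hypothetical declarations: rdfs:domain and rdfs:range for one property.
SCHEMA = {
    "ex:hasCeiling": {"domain": "ex:PoliticalCategory", "range": "ex:Amount"},
}


def infer_types(triples, schema=SCHEMA):
    """Return the rdf:type statements implied by domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        rule = schema.get(p, {})
        if "domain" in rule:   # the subject belongs to the domain class
            inferred.add((s, "rdf:type", rule["domain"]))
        if "range" in rule:    # the object belongs to the range class
            inferred.add((o, "rdf:type", rule["range"]))
    return inferred


types = infer_types([("ex:category1", "ex:hasCeiling", "ex:amount1")])
```

Because these declarations drive inference rather than validation, an overly narrow domain or range does not cause an error; it causes resources to be classified in ways the publisher may not have intended.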

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
  Examples:
  - http://www.w3.org/ns/adms
  - http://purl.org/dc/elements/1.1/

Slide 80

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

• Start with a robust Domain Model developed following a structured process and methodology.
• Research existing terms and their usage and maximise reuse of those terms.
• Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
• Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
• Publish within a highly stable environment designed to be persistent.
• Publicise the RDF vocabulary by registering it with relevant services.

[Figure labels: Analyse, Model, Publish]


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; ..." - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML and
• RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

Slide 89

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Step 3: Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Step 3: Clean up the data – prepare RDF

Slide 91

• Create URI representation for the involved object values
  - via formula
  - via reconciliation
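Creating a URI "via formula" amounts to deriving an identifier from a cell value in a repeatable way. The sketch below shows the idea in Python rather than in Open Refine's own expression language; the base URI, the function name and the normalisation rules (trim, lower-case, hyphenate, percent-encode) are assumptions chosen for illustration.

```python
from urllib.parse import quote

BASE = "http://example.com/id/"  # hypothetical base URI


def mint_uri(kind: str, value: str, base: str = BASE) -> str:
    """Build a URI from a cell value.

    The value is trimmed, lower-cased, its whitespace runs are replaced by
    hyphens, and any remaining unsafe characters are percent-encoded.
    """
    slug = "-".join(value.strip().lower().split())
    return f"{base}{kind}/{quote(slug, safe='-')}"


country_uri = mint_uri("country", " United Kingdom ")
```

Applying the same formula to every row means that repeated values, such as the same country on many rows, all resolve to one URI. Reconciliation goes a step further by matching the value against URIs that already exist elsewhere.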

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

Slide 94

[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Figure: export of the data in Turtle]
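What the export step does can be approximated outside the tool: each row becomes a subject, and each mapped column becomes one statement. The sketch below writes N-Triples, a line-based RDF syntax, instead of Turtle, and it is not how the RDF extension is implemented. The base URI, the predicate URIs, the column names and the sample row are all made up for illustration.

```python
import csv
import io

# Hypothetical skeleton: column name -> predicate URI.
SKELETON = {
    "year": "http://example.com/def/year",
    "value": "http://example.com/def/value",
}
BASE = "http://example.com/obs/"  # hypothetical base URI for row subjects


def export_ntriples(csv_text: str) -> list:
    """Turn each CSV row into one N-Triples line per mapped column."""
    lines = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        subject = f"<{BASE}{i}>"
        for column, predicate in SKELETON.items():
            # Escape backslashes and double quotes inside the literal.
            literal = row[column].replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{predicate}> "{literal}" .')
    return lines


sample = "country,year,value\nBE,2013,0.82\n"
triples = export_ntriples(sample)
```

Columns that are not in the skeleton, such as `country` here, are simply left out, which mirrors the choice made in the mapping step about which data to publish.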

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]


Thank you for your attention

and now YOUR questions

Slide 97

References

• 5 ★ Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata/
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF/
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slide 99

• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data SupporthttpwwwslidesharenetOpenDataSupport

httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI

OpenDataSupport contactopendatasupporteu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)

copy 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative

2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice


DATASUPPORTOPEN

Resource Description Framework

An introduction to RDF

Slide 41

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products.

Description: attributes, features and relations of the resources.

Framework: model, languages and syntaxes for these descriptions.

• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example RDF description of an organisation

Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

Slide 43

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">

  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>

  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>

</rdf:RDF>

DATASUPPORTOPEN

RDF structure

Triples, graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI.
• Predicate – a URI-identified, reused specification of the relationship.
• Object – a resource or literal to which the subject is related.

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
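The triple above can also be written down as a plain data structure. The following is a minimal Python sketch, not part of the original slides: the class name `Triple` and the choice of the Dublin Core `title` URI for "has a title" are assumptions made for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property that relates subject and object
    obj: str        # another URI or, as here, a literal value


# Assumption: "has a title" is expressed with the Dublin Core title property.
DCT_TITLE = "http://purl.org/dc/terms/title"

title_triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj="File types Name Authority List",
)

# A graph is simply a set of such triples.
graph = {title_triple}

print(title_triple.subject)
print(title_triple.predicate)
print(repr(title_triple.obj))
```

Keeping the three positions explicit is the point of the sketch: every RDF syntax shown on the following slides is a different way of writing the same three-part statements.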

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

[Callouts on the slide: the xmlns attributes are the definition of prefixes; the element content is the description of the data as triples (subject, predicate, object), which together form the graph.]

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph of the example, with callouts marking Subject, Predicate and Object.]

DATASUPPORTOPEN

RDF syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

[Callouts on the slide: the @prefix lines are the definition of prefixes; the statements below them are the description of the data as triples (subject, predicate, object), which together form the graph.]

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

DATASUPPORTOPEN

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
<head></head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type"
       typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>

[Callouts on the slide mark the Subject (resource), Predicate (property) and Object (element content).]

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

DATASUPPORTOPEN

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

DATASUPPORTOPEN

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[Figure: UML class diagram (package "Healthcare Domain") with callouts marking the classes, properties and relationships. The recoverable content is listed below.]

Core Vocabularies::Identifier
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Core Vocabularies::Person
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Core Vocabularies::Location
- geographicIdentifier: URI
- geographicName: string

Core Vocabularies::Address
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Core Vocabularies::Geometry
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description).

• INSERT: Add triples to the RDF graph.

• DELETE: Delete triples from the RDF graph.

• ASK: Are there any X, Y, etc. satisfying the following conditions ...?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
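To make the difference between the query types tangible, here is a rough Python sketch that mimics each of them over an in-memory list of triples. It is not SPARQL and uses no RDF library; the sample data, the `ex:` names and all function names are assumptions made for illustration, and a pattern position set to `None` plays the role of a variable.

```python
# Hypothetical sample data; prefixed names are kept as plain strings.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
]


def match(data, s=None, p=None, o=None):
    """Yield the triples that fit the pattern; None acts as a variable."""
    for triple in data:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def select(data, s=None, p=None, o=None):
    """SELECT: return a table (list of rows) of everything that matches."""
    return list(match(data, s, p, o))


def ask(data, s=None, p=None, o=None):
    """ASK: is there at least one match?"""
    return any(True for _ in match(data, s, p, o))


def construct(data, new_predicate, s=None, p=None, o=None):
    """CONSTRUCT: substitute each match into a template to build new triples."""
    return [(ms, new_predicate, mo) for ms, _, mo in match(data, s, p, o)]


def describe(data, resource):
    """DESCRIBE: every statement that mentions the resource."""
    return [t for t in data if resource in (t[0], t[2])]


def insert(data, triple):
    """INSERT: add a triple to the graph (no duplicates)."""
    if triple not in data:
        data.append(triple)


def delete(data, s=None, p=None, o=None):
    """DELETE: remove every triple that fits the pattern."""
    doomed = set(match(data, s, p, o))
    data[:] = [t for t in data if t not in doomed]


graph = list(DATA)
print(select(graph, p="rdf:type", o="dcat:Dataset"))
print(ask(graph, p="dct:publisher"))
print(construct(graph, "rdfs:label", p="dct:title"))
```

SELECT and ASK only read the graph, CONSTRUCT returns a new graph without changing the source, and INSERT and DELETE are the only operations here that modify the data.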

DATASUPPORTOPEN

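Structure of a SPARQL query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Callouts on the slide:
- Definition of prefixes: the PREFIX lines
- Type of query: SELECT
- Variables, i.e. what to search for: ?title
- RDF triple patterns, i.e. the conditions that have to be met: the WHERE block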

DATASUPPORTOPEN

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

DATASUPPORTOPEN

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
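The query on this slide works because the variable ?publisherURI appears in two triple patterns, which joins the dataset to its publisher. The Python sketch below imitates that evaluation over the sample data. It is an illustration only, not how a SPARQL engine is implemented; the short identifiers `<file-type>` and `<publ>` stand in for the abbreviated URIs on the slide, and the function names are assumptions.

```python
SAMPLE = [
    ("<file-type>", "rdf:type", "dcat:Dataset"),
    ("<file-type>", "dct:title", "File types Name Authority List"),
    ("<file-type>", "dct:publisher", "<publ>"),
    ("<publ>", "rdf:type", "dct:Agent"),
    ("<publ>", "dct:title", "Publications Office"),
]


def is_variable(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def extend(pattern, triple, binding):
    """Return `binding` extended so that `pattern` equals `triple`, or None."""
    new_binding = dict(binding)
    for want, got in zip(pattern, triple):
        if is_variable(want):
            # Bind the variable, or check it against its earlier value.
            if new_binding.setdefault(want, got) != got:
                return None
        elif want != got:
            return None
    return new_binding


def evaluate(patterns, data):
    """Join all triple patterns; return one binding per solution."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = extend(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


WHERE = [
    ("<file-type>", "dct:publisher", "?publisherURI"),
    ("<file-type>", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = [(b["?dataset"], b["?publisher"]) for b in evaluate(WHERE, SAMPLE)]
print(rows)
```

The third pattern only matches triples whose subject equals the value already bound to ?publisherURI, which is why the dataset's own title is not returned as a publisher name.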

DATASUPPORTOPEN

SPARQL Example – EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example – EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

• Creating an RDF vocabulary for modelling your data:
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

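(step 1)

[Figure: example domain model. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attribute labels in the figure: Acronym, Legal base period, Legal base type, Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]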

DATASUPPORTOPEN

Reuse existing terms and vocabularies (step 2)

Slide 69

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Reuse existing terms and vocabularies (step 2)

DATASUPPORTOPEN

Advantages of reuse

Reuse existing terms and vocabularies (step 2)

Slide 71

• Reuse greatly aids interoperability of your data.
  A value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.
  Reusing classes and properties from well defined and properly hosted vocabularies saves you from having to replicate that effort.
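The date example on this slide can be illustrated with a few lines of Python. This is only a sketch of the consumer's side, using the two date strings from the slide; the variable names are assumptions. The standard xsd:date form follows ISO 8601 and parses without any publisher-specific knowledge, whereas the free-text form needs a custom pattern that every consumer would have to discover first.

```python
from datetime import date, datetime

standard_value = "2013-02-21"      # lexical form of an xsd:date literal
custom_value = "21 February 2013"  # the free-text format from the example

# ISO 8601 dates parse directly.
parsed_standard = date.fromisoformat(standard_value)

# The free-text date is rejected by the standard parser ...
try:
    date.fromisoformat(custom_value)
    custom_is_iso = True
except ValueError:
    custom_is_iso = False

# ... and only parses once the consumer knows the exact pattern.
# Note: %B matches English month names in the default locale.
parsed_custom = datetime.strptime(custom_value, "%d %B %Y").date()

print(parsed_standard, custom_is_iso, parsed_custom)
```

Both strings describe the same day, but only the first can be consumed without an extra, publisher-specific processing step.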

DATASUPPORTOPEN

You can find reusable RDF vocabularies on:

http://joinup.ec.europa.eu
http://lov.okfn.org

Slide 72

Reuse existing terms and vocabularies (step 2)

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

(step 3)

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Figure: the Nomenclature class with its attributes Type, Heading, Introduction, Remark and Conditions.]
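The benefit of declaring a sub-property can be sketched in Python: a consumer that only knows dct:description still finds values published with the more specific property. This is an illustrative sketch, not the EU Budget vocabulary itself; the prefix `bud:`, the term name `bud:introduction`, the sample data and the function names are assumptions.

```python
# Declared specialisations: sub-property -> super-property (assumed names).
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}

DATA = [
    ("ex:nomenclature-1", "bud:introduction", "Introductory text"),
    ("ex:dataset-1", "dct:description", "A plain description"),
]


def with_super_properties(prop):
    """Yield the property itself and everything it specialises."""
    seen = set()
    while prop is not None and prop not in seen:
        seen.add(prop)
        yield prop
        prop = SUB_PROPERTY_OF.get(prop)


def values_of(data, prop):
    """All (subject, value) pairs stated with prop or one of its sub-properties."""
    return [(s, o) for s, p, o in data if prop in with_super_properties(p)]


print(values_of(DATA, "dct:description"))   # generic query sees both values
print(values_of(DATA, "bud:introduction"))  # specific query sees only one
```

The generic query returns both rows, while the specific one returns only the introduction, which is the interoperability gain described on the previous slide.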

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions).]

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices (step 4)

Classes begin with a capital letter and are always singular, e.g. skos:Concept.

Properties begin with a lower case letter, e.g. rdfs:label.

Object properties should be verbs, e.g. org:hasSite.

Data type properties should be nouns, e.g. dcterms:description.

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
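Some of these naming conventions can be checked mechanically. The Python sketch below tests only the capitalisation and camel-case rules; whether a name is singular, a verb or a noun still needs human judgement. The function names and the counter-examples `ex:amount_type` and `ex:amount` are assumptions for illustration.

```python
import re

# Classes: leading capital, then letters/digits only (e.g. Concept).
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
# Properties: leading lower-case letter, camel case after that (e.g. hasSite).
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term, kind):
    """Check a term against the capitalisation rule for its kind."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))


for term, kind in [
    ("skos:Concept", "class"),
    ("rdfs:label", "property"),
    ("foaf:isPrimaryTopicOf", "property"),
    ("ex:amount_type", "property"),  # underscore instead of camel case
    ("ex:amount", "class"),          # class without a capital letter
]:
    print(term, kind, follows_convention(term, kind))
```

A check like this could run before a vocabulary is published, so that naming slips are caught while terms can still be renamed.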

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Figure: the Amount class with its attributes Currency, Figure, Type and Year.]

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Figure: the Amount class with its attributes Currency, Figure, Type and Year.]

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices (step 4)

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

[Figure: the hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year).]

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
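In RDFS, domain and range are read as statements from which types can be inferred, rather than as validation rules. The Python sketch below shows that reading for the hasCeiling example. The direction of the relationship (a political category has an amount as its ceiling), the `bud:` and `ex:` names and the function name are all assumptions, since the figure does not survive in this transcript.

```python
# Assumed declarations for the hasCeiling property.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}  # class of the subject
RANGE = {"bud:hasCeiling": "bud:Amount"}              # class of the value

DATA = [("ex:heading-1", "bud:hasCeiling", "ex:amount-1")]


def infer_types(data):
    """Derive rdf:type statements from domain and range declarations."""
    inferred = set()
    for s, p, o in data:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))
    return inferred


for triple in sorted(infer_types(DATA)):
    print(triple)
```

Because any use of the property triggers these inferences, a domain or range that is too narrow produces wrong types for data that reuses the property elsewhere, which is one reason the slide says to "consider" defining them.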

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/

Slide 80

(step 5)

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

[Figure: the steps are grouped under three phases: Analyse, Model, Publish.]

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ...” – openrefine.org

See also: Open Refine website, http://openrefine.org

DATASUPPORTOPEN

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

- CSV
- Excel (.xls and .xlsx)
- JSON
- XML
- RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie/

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

DATASUPPORTOPEN

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data – table harmonisation (step 3)

Slide 90

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data – prepare RDF (step 3)

Slide 91

• Create URI representation for the involved object values
  - via formula
  - via reconciliation
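The two ways of creating URIs for cell values can be sketched outside Open Refine as well. The Python below is an illustration of the idea only, not the Open Refine expression language or its reconciliation service: the base URI, the lookup table and the function names are assumptions.

```python
from urllib.parse import quote

BASE_URI = "http://example.com/id/"  # hypothetical base URI

# Hypothetical table of already known resources, used for "reconciliation".
KNOWN_URIS = {"Luxembourg": "http://example.org/authority/country/LUX"}


def mint_uri(kind, label):
    """'Via formula': derive a URI from the cell value itself."""
    slug = "-".join(label.strip().lower().split())
    return f"{BASE_URI}{quote(kind, safe='')}/{quote(slug, safe='')}"


def reconcile(label):
    """'Via reconciliation': reuse the URI of a matching known resource."""
    return KNOWN_URIS.get(label.strip())


def uri_for(kind, label):
    """Prefer an existing URI; fall back to minting a new one."""
    return reconcile(label) or mint_uri(kind, label)


print(uri_for("country", "Luxembourg"))
print(uri_for("country", " United Kingdom "))
```

Reconciliation links your data to resources that others already use, while the formula route guarantees that every value gets some URI, so the two are typically combined in this order.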

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes & properties (model your data) (step 4)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

Slide 94

[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton.]

DATASUPPORTOPEN

Export your data to RDF/XML or Turtle

Slide 95

5

Export of the data in Turtle
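What the skeleton and the export do together can be approximated in a few lines of Python: each spreadsheet row is substituted into a template and written out as Turtle. This is a rough sketch under stated assumptions, not the output of the Open Refine RDF extension. The sample rows, the base URI and the `ex:` properties are made up; only the `qb:` namespace is that of the RDF Data Cube Vocabulary, and a real Data Cube dataset would also need a dataset and structure definition.

```python
import csv
import io

# Hypothetical rows standing in for the cleaned spreadsheet.
CSV_TEXT = """country,year,value
BE,2013,0.78
LU,2013,0.91
"""

BASE = "http://example.com/scoreboard/"  # hypothetical base URI

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.com/scoreboard/def#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)

# The "skeleton": one observation per row, columns mapped to properties.
# Assumption: cell values contain no characters that need escaping in Turtle.
SKELETON = (
    "<{base}obs/{country}-{year}>\n"
    "    a qb:Observation ;\n"
    '    ex:country "{country}" ;\n'
    '    ex:year "{year}"^^xsd:gYear ;\n'
    '    ex:value "{value}"^^xsd:decimal .\n'
)


def export_turtle(csv_text):
    """Apply the skeleton to every row and return one Turtle document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    body = "\n".join(SKELETON.format(base=BASE, **row) for row in rows)
    return PREFIXES + "\n" + body


turtle = export_turtle(CSV_TEXT)
print(turtle)
```

Setting the base URI once and deriving each subject URI from the row's key columns mirrors the "set the base URI for the data" option mentioned on the mapping slide.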

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar.]

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

Slide 98

• 5 ★ Open Data. http://5stardata.info/
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie/
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme
http://www.learningsparql.com/

Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org/

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Contact us: contact@opendatasupport.eu

Follow us: @OpenDataSupport

Find us on / Join us on:
Open Data Support, http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support, http://goo.gl/y9ZZI

DATASUPPORTOPEN

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 42: Llinked open data training for EU institutions

DATASUPPORTOPEN

RDF in the stack of Semantic Web technologies

Resource Everything that can have a unique identifier (URI) eg pages places people organisations products

Description attributes features and relations of the resources

Framework model languages and syntaxes for these descriptions

bull Published as a W3C recommendation in 1999

bull RDF was originally introduced as a data model for metadata

bull RDF was generalised to cover knowledge of all kinds

Slide 42

DATASUPPORTOPEN

Example RDF description of an organisation

Publications Office 2 rue Mercier 2985 Luxembourg LUXEMBOURG

Slide 43

ltrdfRDFxmlnsrdfs=ldquohttpwwww3org200001rdf-schemardquoxmlnsorg=ldquohttpwwww3orgnsorgrdquoxmlnslocn=ldquohttpwwww3orgnslocnrdquo gt

ltorgOrganization rdfabout=ldquohttppublicationseuropaeuresourceauthoritycorporate-bodyPUBLrdquogt

ltrdfslabelgt ldquoPublications Officerdquolt rdfslabelgtltorghasSite rdfresource=ldquohttpexamplecomsite1234rdquogt

ltorgOrganizationgt

ltlocnAddress rdfabout=ldquohttpexamplecomsite1234rdquogtltlocnfullAddressgtrdquo2 rue Mercier 2985 Luxembourg LUXEMBOURGrdquoltlocnfullAddressgt

ltlocnAddressgt

ltrdfRDFgt

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple

bull Subject ndash a resource which is identified with a URI

bull Predicate ndash a URI-identified reused specification of the relationship

bull Object ndash a resource or literal to which the subject is related

httppublicationseuropaeuresourceauthorityfile-type has a title ldquoFile types Name Authority Listrdquo

Subject Predicate Object

Example name of a dataset

DATASUPPORTOPEN

RDF syntax: RDF/XML

Slide 46

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data – triples (together they form the graph):

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

In each description, the rdf:about resource is the subject, a nested element such as dct:title is the predicate, and the element content or rdf:resource value is the object.

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph of the example, with subject, predicate and object labelled]

RDF syntax: Turtle

Slide 48

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data – triples (together they form the graph):

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Each statement starts with the subject, followed by predicate and object pairs separated by semicolons.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF syntax: RDFa (embedding RDF data in HTML)

Slide 49

<html>
<head> </head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type"
       typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>

The resource attribute gives the subject, each property attribute gives a predicate, and the text inside the span is the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[Figure: UML class diagram of the Core Vocabularies, with classes, class properties and relationships labelled]

Classes and their properties:

• Identifier: dateOfIssue (dateTime [0..1]), identifier (string [1..1]), identifierType (string [0..1]), issuingAuthority (string [0..1]), issuingAuthorityUri (URI [0..1])

• Person: alternativeName, birthName, familyName, fullName, givenName, patronymicName (string); dateOfBirth, dateOfDeath (dateTime); gender (code)

• Location: geographicIdentifier (URI), geographicName (string)

• Address: addressArea, addressID, adminUnitL1, adminUnitL2, fullAddress, locatorDesignator, locatorName, poBox, postCode, postName, thoroughfare (all string)

• Geometry: lat, long, wkt (string); xmlGeometry (XML)

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Introduction to SPARQL

The RDF Query Language

Slide 53


About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: Add triples to the RDF graph.

• DELETE: Delete triples from the RDF graph.

• ASK: Are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
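The difference between the SELECT, ASK and CONSTRUCT forms can be sketched in a few lines of Python. This is a toy matcher over an in-memory list of triples, not a SPARQL engine: the `ex:` prefixed names, the sample data and the function names are assumptions made for illustration, and only a single triple pattern is supported.

```python
# Triples are (subject, predicate, object) tuples of strings.
# In a pattern, a term starting with "?" is a variable.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match(pattern, triple):
    """Return variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if bindings.setdefault(term, value) != value:
                return None      # same variable bound to two different values
        elif term != value:
            return None          # constant term does not match
    return bindings


def select(pattern, data):
    """SELECT: a table (list of rows) of all bindings satisfying the pattern."""
    return [b for t in data if (b := match(pattern, t)) is not None]


def ask(pattern, data):
    """ASK: is there at least one solution?"""
    return any(match(pattern, t) is not None for t in data)


def construct(template, pattern, data):
    """CONSTRUCT: substitute each solution into a template to build new triples."""
    return [tuple(b.get(term, term) for term in template) for b in select(pattern, data)]


titles = select(("?s", "dct:title", "?title"), DATA)
has_dataset = ask(("?s", "rdf:type", "dcat:Dataset"), DATA)
labels = construct(("?s", "rdfs:label", "?title"), ("?s", "dct:title", "?title"), DATA)
```

SELECT returns bindings, ASK collapses them to a yes/no answer, and CONSTRUCT turns them back into triples; the source data is left unchanged in all three cases.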

Structure of a SPARQL Query

Slide 56

Definition of prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query (SELECT) and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}

Result:

?dataset: "File types Name Authority List"

SELECT – return the name and publisher of a dataset

Slide 58

Sample data:

<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

?dataset: "File types Name Authority List"
?publisher: "Publications Office"
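The query with three triple patterns works because the shared variable ?publisherURI joins the dataset to its publisher. A rough Python sketch of that evaluation follows; it is an illustration under stated assumptions rather than how a real SPARQL engine is implemented. The `ex:` names stand in for the abbreviated URIs of the sample data, and `extend` and `solve` are hypothetical helper names.

```python
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def extend(bindings, pattern, triple):
    """Try to extend existing bindings so that the pattern matches the triple."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None      # conflicts with a value bound by an earlier pattern
        elif term != value:
            return None
    return new


def solve(patterns, data):
    """Evaluate the patterns in order; shared variables act as the join."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            new
            for bindings in solutions
            for triple in data
            if (new := extend(bindings, pattern, triple)) is not None
        ]
    return solutions


QUERY = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = [(s["?dataset"], s["?publisher"]) for s in solve(QUERY, DATA)]
print(rows)
```

The third pattern only matches triples whose subject equals the value already bound to ?publisherURI, which is why the dataset's own title is not returned as a publisher name.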

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:

  - How to reuse existing vocabularies to model your data

  - How to create new classes and properties in RDF

  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model of the EU Budget]

Classes and their attributes:

• Amount: Currency, Figure, Type, Year

• Nomenclature: Type, Heading, Introduction, Remark, Conditions

• EU Programme: Code, Type

• Political category: Code, Description

• Corporate body: Code, Type, Location

Other attributes shown in the diagram: Acronym, Legal base period, Legal base type, Legal base status

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS

• To name things: rdfs:label, foaf:name, skos:prefLabel

• To describe people: FOAF, vCard, Core Person Vocabulary

• To describe projects: DOAP, ADMS.SW

• To describe interoperability assets: ADMS

• To describe registered organisations: Registered Organisation Vocabulary

• To describe addresses: vCard, Core Location Vocabulary

• To describe public services: Core Public Service Vocabulary

• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Step 2: Reuse existing terms and vocabularies – well-known vocabularies

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe

• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

• DOAP: vocabulary for describing projects

• ADMS: vocabulary for describing interoperability assets

• Dublin Core: defines general metadata attributes

• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register

• Organization Ontology: for describing the structure of organizations

• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies – advantages of reuse

• Reuse greatly aids interoperability of your data.

A value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.

Slide 71
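The interoperability point about dates can be made concrete with a small Python sketch. It assumes literals are held as (lexical form, datatype) pairs; `parse_created` and `parse_custom` are hypothetical helper names, and ex:date is the made-up property from the text above.

```python
from datetime import date, datetime

# A typed literal as a (lexical form, datatype) pair.
standard = ("2013-02-21", "xsd:date")      # value of dcterms:created
custom = ("21 February 2013", None)        # value of the hypothetical ex:date


def parse_created(literal):
    """Parse an xsd:date lexical form; no publisher-specific knowledge is needed."""
    lexical, datatype = literal
    if datatype != "xsd:date":
        raise ValueError("not an xsd:date literal")
    return date.fromisoformat(lexical)


def parse_custom(literal):
    """The extra step every consumer of ex:date would have to write.

    Assumes English month names and the default C locale.
    """
    lexical, _ = literal
    return datetime.strptime(lexical, "%d %B %Y").date()
```

Both functions yield the same date here, but only the first works for any publisher that follows the shared convention; the second has to be rewritten for each home-grown format.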

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• http://joinup.ec.europa.eu

• http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
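What a consumer gains from these sub-property declarations can be sketched as follows. The `bud:` prefix, the local names `introduction` and `hasNomenclature`, and the instance data are assumptions for illustration; the slides only name the properties in prose.

```python
# Assumption: "bud:" stands for the EU Budget vocabulary namespace.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

# Made-up instance data.
DATA = [
    ("ex:nomenclature-1", "bud:introduction", "Introductory text of the heading"),
    ("ex:amount-1", "bud:hasNomenclature", "ex:nomenclature-1"),
]


def super_properties(prop):
    """Yield every property that prop specialises, directly or transitively."""
    seen = set()
    while prop in SUB_PROPERTY_OF and prop not in seen:
        seen.add(prop)
        prop = SUB_PROPERTY_OF[prop]
        yield prop


def infer(data):
    """Add the triples entailed by rdfs:subPropertyOf."""
    inferred = set(data)
    for s, p, o in data:
        for parent in super_properties(p):
            inferred.add((s, parent, o))
    return inferred


result = infer(DATA)
```

A client that only knows Dublin Core can now find the description and subject of the budget resources without understanding the more specific terms.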

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.

• Properties begin with a lower case letter, e.g. rdfs:label.

• Object properties should be verbs, e.g. org:hasSite.

• Data type properties should be nouns, e.g. dcterms:description.

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
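The capitalisation and camel-case rules lend themselves to an automated check, sketched below in Python. The function names are hypothetical, and the check covers only what a regular expression can see: whether a class name is singular, or whether a property is a verb or a noun, still needs human review.

```python
import re

# Upper camel case for classes, lower camel case for properties; letters and digits only.
CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")
PROPERTY_NAME = re.compile(r"[a-z][A-Za-z0-9]*")


def local_name(term):
    """Return the part after the prefix of a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term, kind):
    """Check capitalisation and camel case; kind is 'class' or 'property'."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return pattern.fullmatch(local_name(term)) is not None
```

For example, `follows_convention("foaf:isPrimaryTopicOf", "property")` passes, while a term with an underscore such as the made-up `ex:budget_amount` is rejected.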

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Figure: Amount class with attributes Currency, Figure, Type, Year]

Slide 77

Example: defining the "amount type" property

[Figure: Amount class with attributes Currency, Figure, Type, Year]

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.

• A domain states on which classes a given property can be used.

[Figure: hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
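Under RDFS semantics, domain and range declarations let a consumer derive the types of the resources a property connects. The Python sketch below illustrates this for hasCeiling; the direction of the relationship (political category to amount), the `bud:` names and the instance data are assumptions, not taken from the published vocabulary.

```python
# Assumption: hasCeiling links a political category to an amount.
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}

# Made-up instance data.
DATA = [("ex:category-1", "bud:hasCeiling", "ex:amount-1")]


def infer_types(data, schema):
    """Derive rdf:type triples from rdfs:domain and rdfs:range declarations."""
    types = set()
    for s, p, o in data:
        declaration = schema.get(p, {})
        if "domain" in declaration:
            types.add((s, "rdf:type", declaration["domain"]))   # subject belongs to the domain class
        if "range" in declaration:
            types.add((o, "rdf:type", declaration["range"]))    # object belongs to the range class
    return types


types = infer_types(DATA, SCHEMA)
```

Because these declarations produce inferences rather than validation errors, an overly narrow domain or range can assign unintended types to data, which is one reason to define them with care.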

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

Slide 80

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

[Figure labels: Analyse, Model, Publish]

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83


What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; …" (openrefine.org)

See also: Open Refine website, http://openrefine.org/

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie/

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

• Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet

• Select relevant tabs

• Create the project

Slide 89

Step 3: Clean up the data – table harmonisation

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Slide 90

Step 3: Clean up the data – prepare RDF

• Create URI representations for the involved object values:

  - via formula

  - via reconciliation

Slide 91
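Creating URIs "via formula" means deriving them from the cell values by a fixed rule. Open Refine has its own expression language for this; the Python sketch below only illustrates the idea, with a made-up base URI and hypothetical helper names.

```python
import re
from urllib.parse import quote

# Assumption: a made-up base URI; a real dataset would use its own stable namespace.
BASE = "http://example.org/id/"


def slug(value):
    """Lower-case a cell value and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")


def mint_uri(kind, value):
    """Build a URI such as http://example.org/id/country/united-kingdom."""
    return BASE + quote(kind) + "/" + quote(slug(value))
```

A rule like this is deterministic, so the same cell value always yields the same URI. Reconciliation, the alternative named above, instead looks the value up against URIs that already exist elsewhere. Note that this simple slug drops non-ASCII letters, which a production rule would need to handle.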

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF.

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

[Figure: graphical interface to copy/paste an existing RDF skeleton]

[Figure: graphical interface to edit an RDF skeleton]

Slide 94

Step 5: Export your data to RDF/XML or Turtle

[Figure: export of the data in Turtle]

Slide 95
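The skeleton-then-export idea can be imitated outside Open Refine with a few lines of Python. This sketch is not the extension's implementation: the spreadsheet extract, column names and values are made up, and the mapping to qb:Observation with SDMX dimension and measure properties is one plausible Data Cube modelling, assumed here for illustration.

```python
import csv
import io

# Assumption: a tiny made-up extract; column names and values are illustrative.
SPREADSHEET = """country,year,value
BE,2012,75.0
LU,2012,68.5
"""

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""

# The skeleton: one qb:Observation per row, with placeholders for the cells.
SKELETON = """<{base}obs/{country}-{year}>
    a qb:Observation ;
    sdmx-dimension:refArea <{base}country/{country}> ;
    sdmx-dimension:refPeriod "{year}"^^xsd:gYear ;
    sdmx-measure:obsValue "{value}"^^xsd:decimal .
"""


def rows_to_turtle(text, base):
    """Fill the skeleton once per spreadsheet row and return a Turtle document."""
    rows = csv.DictReader(io.StringIO(text))
    return PREFIXES + "\n" + "\n".join(SKELETON.format(base=base, **row) for row in rows)


turtle = rows_to_turtle(SPREADSHEET, "http://example.org/")
print(turtle)
```

The sketch does no escaping or validation of cell values, which is why the clean-up step comes before the mapping step in the workflow.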

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along flexibility and volume axes: OpenRefine, UnifiedViews, Cellar]

Thank you for your attention

...and now YOUR questions?

Slide 97


Further reading

Slide 99

• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 100

• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1

• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2

• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com

• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 101

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/

• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf

• Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org/

Be part of our team

Slide 102

• Website: http://www.opendatasupport.eu

• SlideShare: http://www.slideshare.net/OpenDataSupport

• Open Data Support: http://goo.gl/y9ZZI

• Follow us: @OpenDataSupport

• Contact us: contact@opendatasupport.eu

Presentation metadata

Slide 103

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 43: Llinked open data training for EU institutions

DATASUPPORTOPEN

Example RDF description of an organisation

Publications Office 2 rue Mercier 2985 Luxembourg LUXEMBOURG

Slide 43

ltrdfRDFxmlnsrdfs=ldquohttpwwww3org200001rdf-schemardquoxmlnsorg=ldquohttpwwww3orgnsorgrdquoxmlnslocn=ldquohttpwwww3orgnslocnrdquo gt

ltorgOrganization rdfabout=ldquohttppublicationseuropaeuresourceauthoritycorporate-bodyPUBLrdquogt

ltrdfslabelgt ldquoPublications Officerdquolt rdfslabelgtltorghasSite rdfresource=ldquohttpexamplecomsite1234rdquogt

ltorgOrganizationgt

ltlocnAddress rdfabout=ldquohttpexamplecomsite1234rdquogtltlocnfullAddressgtrdquo2 rue Mercier 2985 Luxembourg LUXEMBOURGrdquoltlocnfullAddressgt

ltlocnAddressgt

ltrdfRDFgt

DATASUPPORTOPEN

RDF structure

Triples graphs and syntaxes

Slide 44

DATASUPPORTOPEN

What is a triple

Slide 45

Every piece of information expressed in RDF is represented as a triple

bull Subject ndash a resource which is identified with a URI

bull Predicate ndash a URI-identified reused specification of the relationship

bull Object ndash a resource or literal to which the subject is related

httppublicationseuropaeuresourceauthorityfile-type has a title ldquoFile types Name Authority Listrdquo

Subject Predicate Object

Example name of a dataset

DATASUPPORTOPEN

RDF SyntaxRDFXML

Slide 46

ltrdfRDF

xmlnsdcat=ldquohttpwwww3orgTRvocab-dcatldquoxmlnsdct=ldquohttppurlorgdctermsrdquo

ltdcatDataset rdfabout=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquogtltdcttitlegt ldquoFile types Named Authority Listrdquolt dcttitlegtltdctpublisher rdfresource=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogt

ltdcatDatasetgt

ltdctAgent rdfabout=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogtltdcttitlegtrdquoPublications Officerdquoltdcttitlegt

ltdctPublishergt

ltrdfRDFgt

Subject

Predicate

Object

Gra

ph

Definition of prefixes

Description of data ndash triples

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF SyntaxTurtle

Subject

Predicate

Object

Slide 48

prefix dcat lthttpwwww3orgTRvocab-dcatgt prefix dct lthttppurlorgdcterms

lt httppublicationseuropaeuresourceauthorityfile-typegt a ltdcatDatasetgt dcttitle ldquoFile types Name Authority Listldquodctpublisher lthttpopen-dataeuropaeuendatapublisherpublgt

lthttpopen-dataeuropaeuendatapublisherpublgta ltdctAgentgt dcttitle ldquoPublications Officerdquo

Gra

ph

See alsohttpwwww3org200912rdf-wspapersws11

Definition of prefixes

Description of data ndash triples

DATASUPPORTOPEN

RDF SyntaxRDFa

Subject

Predicate

Object

Slide 49

lthtmlgt ltheadgt ltheadgt ltbodygt ltdiv resource=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquotypeof= ldquohttpwwww3orgnsdcatDatasetrdquogtltpgt ltspan property= httppurlorgdctermstitle gtFile types Name Authority Listltspangt Publisher ltspan property=httppurlorgdctermsAgentgt Publications Officeltspangtltpgtltdivgtltbodygt

See alsohttpwwww3orgTR2012NOTE-rdfa-primer-20120607

embedding RDF data in HTML

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo

bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo

bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties

bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties

Slide 51

DATASUPPORTOPEN

Examples of classes relationships and properties The Core Person Vocabulary in UML

52

class Healthcare Domain

Core VocabulariesIdentifier

dateOfIssue dateTime [01]

identifier string [11]

identifierType string [01]

issuingAuthority string [01]

issuingAuthorityUri URI [01]

Core VocabulariesPerson

alternativeName string

birthName string

dateOfBirth dateTime

dateOfDeath dateTime

familyName string

fullName string

gender code

givenName string

patronymicName string

Core VocabulariesLocation

geographicIdentifier URI

geographicName string

Core VocabulariesAddress

addressArea string

addressID string

adminUnitL1 string

adminUnitL2 string

fullAddress string

locatorDesignator string

locatorName string

poBox string

postCode string

postName string

thoroughfare string

Core VocabulariesGeometry

lat string

long string

wkt string

xmlGeometry XML

address

identifies

geometry

placeOfDeath

countryOfDeath

placeOfBirth

countryOfBirth

identifier

UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model

Relationships ClassProperties

Class

Class

Class

Class

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples

bull SPARQL Protocol and RDF Query Language

bull One of the three core standards of the Semantic Web along with RDF and OWL

bull Became a W3C standard January 2008

bull SPARQL 11 is a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

bull SELECT Return a table of all X Y etc satisfying the following conditions

bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph

bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

bull INSERT Add triples to the RDF graph

bull DELETE Delete triples from the RDF graph

bull ASK Are there any X Y etc satisfying the following conditions

Slide 55

See alsohttpwwweuclid-projecteumoduleschapter2

httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt

SELECT titleWHERE

dataset rdftype dcatDataset dataset rdftitle title

Structure of a SPARQL Query

Slide 56

Type of

query Variables ie what to search for

RDF triple patterns ie

the conditions that

have to be met

Definition of

prefixes

DATASUPPORTOPEN

SELECT ndash return the name of a dataset with particular URI

Slide 57

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset

WHERE

lthttpauthorityfile-typegt dcttitle dataset

dataset

ldquoFile types Name Authority Listrdquo

Sample data

Query

Result

SELECT - return the name and publisher of a dataset

Slide 58

Sample data

<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset .
<http://publications.europa.eu/resource/authority/file-type> dct:title "File types Name Authority List" .
<http://publications.europa.eu/resource/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:publisher ?publisherURI .
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

?dataset | ?publisher
"File types Name Authority List" | "Publications Office"
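The query on this slide uses three triple patterns that share the variable ?publisherURI. As a rough sketch of how such a pattern group can be evaluated, the plain-Python snippet below matches each pattern in turn and keeps only bindings that agree on shared variables. It is a toy, not a SPARQL engine; the helper name `evaluate` is an assumption, and prefixed names are kept as plain strings.

```python
# Toy evaluation of a basic graph pattern; not a real SPARQL engine.
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

DATA = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]

def evaluate(patterns, data):
    """Return all variable bindings that satisfy every triple pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in data:
                candidate = dict(binding)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # A variable must keep the value it was bound to earlier.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions

patterns = [
    (FILE_TYPE, "dct:publisher", "?publisherURI"),
    (FILE_TYPE, "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]
rows = [(b["?dataset"], b["?publisher"]) for b in evaluate(patterns, DATA)]
print(rows)
```

The third pattern only matches the publisher's title because ?publisherURI was already bound by the first pattern; without that shared variable it would also match the dataset's own title.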

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module, you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model diagram]

Classes and their attributes:
- Amount: Currency, Figure, Type, Year
- Nomenclature: Type, Heading, Introduction, Remark, Conditions
- EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
- Political category: Code, Description
- Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Step 2: Reuse existing terms and vocabularies

Slide 69

- General-purpose vocabularies: DCMI, RDFS
- To name things: rdfs:label, foaf:name, skos:prefLabel
- To describe people: FOAF, vCard, Core Person Vocabulary
- To describe projects: DOAP, ADMS.SW
- To describe interoperability assets: ADMS
- To describe registered organisations: Registered Organisation Vocabulary
- To describe addresses: vCard, Core Location Vocabulary
- To describe public services: Core Public Service Vocabulary
- To describe datasets: DCAT, DCAT Application Profile, VoID

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

- DCAT-AP: Vocabulary for describing datasets in Europe
- Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
- DOAP: Vocabulary for describing projects
- ADMS: Vocabulary for describing interoperability assets
- Dublin Core: Defines general metadata attributes
- Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
- Organization Ontology: Ontology for describing the structure of organizations
- Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
- Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
- schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.
  It shows that it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
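The "further processing" mentioned for non-standard date formats could look roughly like the following sketch, which turns a free-text date into an xsd:date typed literal. The function name `to_xsd_date` is hypothetical, and the snippet assumes English month names (Python's default locale).

```python
from datetime import datetime

def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date and return it as an xsd:date typed literal."""
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^xsd:date'

print(to_xsd_date("21 February 2013"))
```

Every consumer of the non-standard format would need a conversion like this, which is the cost that reusing dcterms:created with xsd:date avoids.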

Step 2: Reuse existing terms and vocabularies

Slide 72

You can find reusable RDF vocabularies on:

- Joinup: http://joinup.ec.europa.eu
- Linked Open Vocabularies: http://lov.okfn.org

Step 3: Creation of sub-classes and sub-properties

Slide 73

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: the Nomenclature class with its attributes Type, Heading, Introduction, Remark and Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
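The benefit of declaring sub-properties can be sketched in a few lines of plain Python: a consumer that only knows dct:description or dct:subject can still read the data once the sub-property links are followed. The prefixed names `bud:introduction` and `bud:hasNomenclature`, the `ex:` resources and the sample text are assumptions for illustration, and the hierarchy is assumed to be free of cycles.

```python
# Hypothetical prefixed names; "bud:" stands for the EU Budget vocabulary.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

DATA = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text of the heading"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
]

def with_super_properties(triples, sub_property_of):
    """Add the triples that follow from rdfs:subPropertyOf declarations."""
    inferred = set(triples)
    for s, p, o in triples:
        # Walk up the property hierarchy, in case super-properties are chained.
        while p in sub_property_of:
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred

graph = with_super_properties(DATA, SUB_PROPERTY_OF)
descriptions = [o for s, p, o in graph if p == "dct:description"]
print(descriptions)
```

A generic tool that lists dct:description values now finds the introduction text without knowing anything about the budget-specific term.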

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 76

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower-case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
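The lexical part of these conventions can be checked mechanically. The sketch below uses hypothetical helpers (`follows_convention`, `to_camel_case`) and looks only at capitalisation and camel case; whether a name is singular, a verb or a noun still needs human review.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")      # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")   # e.g. isPrimaryTopicOf

def follows_convention(local_name, kind):
    """Check the local part of a term against the capitalisation rules."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name))

def to_camel_case(words, kind):
    """Join words into UpperCamelCase (classes) or lowerCamelCase (properties)."""
    parts = [w.capitalize() for w in words.split()]
    if kind == "property" and parts:
        parts[0] = parts[0].lower()
    return "".join(parts)

print(to_camel_case("political category", "class"))
print(to_camel_case("has ceiling", "property"))
print(follows_convention("has_site", "property"))
```

Note that `str.capitalize` lowercases the rest of each word, so acronyms such as "EU" would need special handling in a real tool.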

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 77

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Figure: the Amount class with its attributes Currency, Figure, Type and Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 78

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Figure: the Amount class with its attributes Currency, Figure, Type and Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
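The slide images with the actual definitions are not part of this transcript. As a stand-in, the sketch below shows what RDFS definitions of a class and a property might look like when written out as Turtle. The namespace URI ending in `#`, the term name `amountType`, the labels, the comment and the `xsd:string` range are all assumptions, not the published EU Budget vocabulary.

```python
# Sketch: emit RDFS definitions as Turtle text.
# Namespace and term names are assumptions made for this example.
BUD = "http://data.europa.eu/bud#"  # hypothetical namespace URI

def define_class(local_name, label, comment):
    """Return a Turtle definition of an RDFS class."""
    return (
        f"bud:{local_name} a rdfs:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )

def define_property(local_name, label, domain, range_):
    """Return a Turtle definition of an RDF property with domain and range."""
    return (
        f"bud:{local_name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain bud:{domain} ;\n"
        f"    rdfs:range {range_} .\n"
    )

header = (
    f"@prefix bud: <{BUD}> .\n"
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
)
turtle = (
    header
    + define_class("Amount", "Amount", "An amount of money in the budget.")
    + "\n"
    + define_property("amountType", "amount type", "Amount", "xsd:string")
)
print(turtle)
```

The class name follows the capitalised, singular convention and the property name uses lower camel case, in line with the naming rules given earlier in this step.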

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 79

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.

[Figure: the hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
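In RDFS, domain and range declarations are used for inference rather than validation: a reasoner that sees a property in use concludes what types the subject and object have. The sketch below illustrates this in plain Python. The direction of `hasCeiling` (from political category to amount), the `bud:` names and the `ex:` resources are assumptions read off the diagram labels.

```python
# Hypothetical terms modelled on the diagram: hasCeiling is assumed to link
# a political category (subject) to an amount (object).
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}

def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))
    return types

data = [("ex:heading1", "bud:hasCeiling", "ex:amount2014")]
for triple in sorted(infer_types(data)):
    print(triple)
```

This is why an overly narrow domain or range can lead to surprising conclusions: using the property on some other kind of resource does not raise an error, it silently classifies that resource.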

Step 5: Publish within a highly stable environment designed to be persistent

Slide 80

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Slide 81

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

[Figure: the six steps grouped into three phases: Analyse, Model, Publish]


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

Slide 85

The Open Refine RDF extension allows you to easily import data in different formats, such as:

- CSV
- Excel (.xls and .xlsx)
- JSON
- XML
- RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

Slide 86

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Slide 88

Download the tabular data

Step 2: Create a project and upload it in Open Refine

Slide 89

- Upload the spreadsheet
- Select relevant tabs
- Create the project

Step 3: Clean up the data - table harmonisation

Slide 90

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Step 3: Clean up the data - prepare RDF

Slide 91

• Create URI representations for the involved object values
  - via formula
  - via reconciliation
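Creating a URI "via formula" means deriving it deterministically from a cell value. Inside Open Refine this would be written as an expression in the tool itself; the sketch below shows the same idea in plain Python. The base URI `http://example.org/id/` is a placeholder and `mint_uri` is a hypothetical helper, neither taken from the slides.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/"  # placeholder base URI, not from the slides

def mint_uri(kind, value, base=BASE):
    """Build a URI from a cell value: trim, lowercase, hyphenate, percent-encode."""
    slug = re.sub(r"\s+", "-", value.strip().lower())
    return f"{base}{kind}/{quote(slug, safe='-')}"

print(mint_uri("country", " United Kingdom "))
print(mint_uri("indicator", "Households with broadband (%)"))
```

Because the same input always yields the same URI, rows that mention the same country end up pointing at the same resource. Reconciliation goes one step further and looks the value up in an existing dataset, so that an already published URI is reused instead of a newly minted one.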

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 94

You can map the data to the ontology using a simple graphical interface, to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Screenshot: export of the data in Turtle]
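Since the screenshots are not available in this transcript, the following sketch gives an impression of what the mapping and export steps produce: each spreadsheet row becomes one observation in Turtle. It is plain Python, not the Open Refine RDF extension. The `qb:` and `sdmx-` terms come from the RDF Data Cube vocabulary and its SDMX companion vocabularies; the `ex:` namespace, the column names and the figures are made up.

```python
import csv
import io

# Made-up sample rows standing in for a cleaned spreadsheet export.
CSV_TEXT = """country,year,value
BE,2013,78.5
FR,2013,76.1
"""

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .\n"
    "@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/scoreboard/> .\n"
)

def rows_to_turtle(csv_text):
    """Turn each CSV row into one qb:Observation written in Turtle."""
    blocks = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        country, year, value = row["country"], row["year"], row["value"]
        blocks.append(
            f"ex:obs-{country}-{year} a qb:Observation ;\n"
            f"    qb:dataSet ex:dataset ;\n"
            f"    sdmx-dimension:refArea ex:{country} ;\n"
            f'    sdmx-dimension:refPeriod "{year}"^^xsd:gYear ;\n'
            f'    sdmx-measure:obsValue "{value}"^^xsd:decimal .\n'
        )
    return "\n".join(blocks)

turtle = rows_to_turtle(CSV_TEXT)
print(turtle)
```

The f-string template plays the role of the RDF skeleton: it is defined once and applied to every row, which is what makes the approach repeatable when the spreadsheet is updated.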

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]


Thank you for your attention

and now YOUR questions

Slide 97


Further reading

Slide 99

- EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
- EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

Slide 100

- EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
- EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
- Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
- Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Further reading

Slide 101

- Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
- Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
- Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
- Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org


This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 44: Llinked open data training for EU institutions

RDF structure

Triples, graphs and syntaxes

Slide 44

What is a triple?

Slide 45

Every piece of information expressed in RDF is represented as a triple:

• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related

Example: name of a dataset

<http://publications.europa.eu/resource/authority/file-type> has a title "File types Name Authority List"

Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
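As a small illustration, the example triple can be held in a three-field record and written out in N-Triples, the simplest line-based RDF syntax. This is a plain-Python sketch; `Triple` and `to_ntriples` are hypothetical names, the predicate "has a title" is rendered with the Dublin Core title URI, and only backslashes and double quotes are escaped in literals.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

def to_ntriples(triple, object_is_literal):
    """Serialise one triple as an N-Triples line (simplified escaping)."""
    if object_is_literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."

title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Name Authority List",
)
print(to_ntriples(title, object_is_literal=True))
```

The subject and predicate are always written as URIs in angle brackets, while the object is either another URI or a quoted literal, which mirrors the definition of the three parts above.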

RDF Syntax: RDF/XML

Slide 46

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

• xmlns attributes: definition of prefixes
• Element content: description of data - triples, which together form the graph
• rdf:about value: subject
• Child element name (e.g. dct:title): predicate
• Element content or rdf:resource value: object

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

Slide 47

[Figure: RDF graph with nodes labelled Subject and Object, connected by arcs labelled Predicate]

RDF Syntax: Turtle

Slide 48

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

• @prefix lines: definition of prefixes
• Remaining statements: description of data - triples (subject, predicate, object), which together form the graph

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

RDF Syntax: RDFa

Slide 49

Embedding RDF data in HTML:

<html>
<head> </head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type"
       typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>

• resource attribute: subject
• property attribute: predicate
• Element content: object

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

Slide 52

[UML class diagram: "Healthcare Domain"]

Class Core Vocabularies::Identifier
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Class Core Vocabularies::Location
- geographicIdentifier: URI
- geographicName: string

Class Core Vocabularies::Address
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Class Core Vocabularies::Geometry
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.


Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

bull SELECT Return a table of all X Y etc satisfying the following conditions

bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph

bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

bull INSERT Add triples to the RDF graph

bull DELETE Delete triples from the RDF graph

bull ASK Are there any X Y etc satisfying the following conditions

Slide 55

See alsohttpwwweuclid-projecteumoduleschapter2

httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt

SELECT titleWHERE

dataset rdftype dcatDataset dataset rdftitle title

Structure of a SPARQL Query

Slide 56

Type of

query Variables ie what to search for

RDF triple patterns ie

the conditions that

have to be met

Definition of

prefixes

DATASUPPORTOPEN

SELECT ndash return the name of a dataset with particular URI

Slide 57

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset

WHERE

lthttpauthorityfile-typegt dcttitle dataset

dataset

ldquoFile types Name Authority Listrdquo

Sample data

Query

Result

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset publisher

WHEREhttpauthorityfile-type dctpublisher publisherURI

httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher

dataset publisher

ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

Sample data

Query

Result

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

bull RDF is a general way to express data intended for publishing on the Web

bull RDF data is expressed in triples subject predicate object

bull Different syntaxes exist for expressing data in RDF

bull SPARQL is a standardised language to query graph data expressed as RDF

bull SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

bull Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

bull An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties

Where new terms are required create them following commonly agreed best practice

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Slide 67

1

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

2

3

4

5

6

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

1

hasCeiling

hasPoliticalcategory

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeading

EU Programme

CodeType

Political category

CodeDescription

Corporate body

CodeTypeLocation

IntroductionRemarkConditions

AcronymLegal base periodLegal base typeLegal base status

hasCorporate body

has

Nomenclature

has

EU Programme

DATASUPPORTOPEN

General purpose vocabularies DCMI RDFS

To name things rdfslabel foafname skosprefLabel

To describe people FOAF vCard Core Person Vocabulary

To describe projects DOAP ADMSSW

To describe interoperability assets ADMS

To describe registered organisations Registered Organisation Vocabulary

To describe addresses vCard Core Location Vocabulary

To describe public services Core Public Service Vocabulary

To describe datasets DCAT DCAT Application Profile VoID

Reuse existing terms and vocabularies

Slide 69

2

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP Vocabulary for describing datasets in Europe

Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth

DOAP Vocabulary for describing projects

ADMS Vocabulary for describing interoperability assets

Dublin Core Defines general metadata attributes

Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register

Organization Ontology for describing the structure of organizations

Core Location VocabularyVocabulary capturing the fundamental characteristics of a location

Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration

schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft

See alsohttpwwww3orgwikiTaskForcesCommunityProj

ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies

2

DATASUPPORTOPEN

bull Reuse greatly aids interoperability of your data

Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses

bull Reuse adds credibility to your schema

It shows it has been published with care and professionalism again this promotes its reuse

bull Reuse is easier and cheaper

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

httpjoinupeceuropaeu httplovokfnorg

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription

Nomenclature

TypeHeadingIntroductionRemarkConditions

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Diagram: Amount (Currency, Figure, Type, Year) and Nomenclature (Type, Heading, Introduction, Remark, Conditions), connected by the relationship "has Nomenclature"]

Slide 75
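How a system that only understands the super-property can still interpret the data is sketched below in Python. Triples are plain tuples of prefixed names; the bud: and ex: prefixes and the sample resource are hypothetical, and the loop implements only the rdfs:subPropertyOf entailment rule, not a full RDFS reasoner.

```python
RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"


def infer_super_properties(triples):
    """Return the input triples plus those entailed by rdfs:subPropertyOf."""
    # Map each property to its directly declared super-properties.
    parents = {}
    for s, p, o in triples:
        if p == RDFS_SUBPROPERTY_OF:
            parents.setdefault(s, set()).add(o)

    inferred = set(triples)
    changed = True
    # Repeat until no new triple appears, so chains of sub-properties work too.
    while changed:
        changed = False
        for s, p, o in list(inferred):
            for parent in parents.get(p, ()):
                candidate = (s, parent, o)
                if candidate not in inferred:
                    inferred.add(candidate)
                    changed = True
    return inferred


# Hypothetical data: bud:introduction is declared as a sub-property of dct:description.
data = [
    ("bud:introduction", RDFS_SUBPROPERTY_OF, "dct:description"),
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
]
result = infer_super_properties(data)
print(("ex:nomenclature1", "dct:description", "Introductory text") in result)
```

A consumer that knows nothing about bud:introduction can query for dct:description and still find the text.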

Step 4: Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept.

Properties begin with a lower-case letter, e.g. rdfs:label.

Object properties should be verbs, e.g. org:hasSite.

Data type properties should be nouns, e.g. dcterms:description.

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
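The capitalisation and camel-case conventions above can be checked mechanically. The Python sketch below is an illustration only: the helper names are hypothetical, and it cannot tell whether a term is singular, a verb or a noun, which still needs human review.

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties.
UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    return bool(LOWER_CAMEL.match(local_name(term)))


print(is_valid_class_name("skos:Concept"))               # class convention
print(is_valid_property_name("foaf:isPrimaryTopicOf"))   # property convention
print(is_valid_property_name("ex:amount_type"))          # underscore breaks camel case
```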

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

[Diagram: the hasCeiling relationship linking Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
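As a rough sketch of what such definitions look like in RDFS, the Python helpers below print Turtle for a class and for a property with a domain and a range. The ex: namespace, the helper names and the direction of hasCeiling (from Political category to Amount) are assumptions made for illustration; they are not taken from the EU Budget vocabulary itself.

```python
PREFIXES = (
    "@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex:   <http://example.org/budget#> .\n"  # hypothetical namespace
)


def class_definition(name: str, label: str) -> str:
    """Turtle for an RDFS class with an English label."""
    return f'ex:{name} a rdfs:Class ;\n    rdfs:label "{label}"@en .\n'


def property_definition(name: str, label: str, domain: str, range_: str) -> str:
    """Turtle for a property with an explicit domain and range."""
    return (
        f"ex:{name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain ex:{domain} ;\n"
        f"    rdfs:range ex:{range_} .\n"
    )


turtle = "\n".join([
    PREFIXES,
    class_definition("Amount", "Amount"),
    class_definition("PoliticalCategory", "Political category"),
    # Assumed direction: a political category has a ceiling amount.
    property_definition("hasCeiling", "has ceiling", "PoliticalCategory", "Amount"),
])
print(turtle)
```

The class name is capitalised and singular and the property name is a lower camel case verb phrase, in line with the naming conventions of step 4.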

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model, developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Phases: Analyse, Model, Publish

Slide 82

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org

Slide 84

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet

• Select relevant tabs

• Create the project

Slide 89

Step 3: Clean up the data – table harmonisation

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Slide 90

Step 3: Clean up the data – prepare RDF

• Create URI representations for the involved object values

• via formula

• via reconciliation

Slide 91
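Creating URIs "via formula" means deriving each URI from the cell value with a deterministic rule. The Python sketch below illustrates the idea outside Open Refine; the base URI and the slug rule are assumptions, and this is not the expression language used inside the tool.

```python
import re


def mint_uri(base_uri: str, value: str) -> str:
    """Derive a URI from a cell value: lower-case it and replace
    every run of non-alphanumeric characters with a single hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")
    return base_uri.rstrip("/") + "/" + slug


# Hypothetical base URI for country resources.
uri = mint_uri("http://example.org/id/country/", " United Kingdom ")
print(uri)
```

Reconciliation, the alternative listed above, looks the value up in an existing dataset and reuses the URI found there instead of minting a new one.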

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create an RDF skeleton or edit an existing one.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 94

Step 5: Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]

Slide 95
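The mapping and export steps can be approximated by hand to show what the RDF skeleton does: each spreadsheet row becomes one qb:Observation. In this Python sketch the column names, the ex: namespace and the ex: properties are hypothetical; only qb:Observation and qb:dataSet come from the RDF Data Cube Vocabulary.

```python
import csv
import io

PREFIXES = (
    "@prefix qb:  <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex:  <http://example.org/scoreboard/> .\n"  # hypothetical base URI
)


def rows_to_turtle(csv_text: str) -> str:
    """Turn rows with columns country, year, value into Turtle observations."""
    blocks = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        country, year, value = row["country"], row["year"], row["value"]
        subject = f"ex:obs-{country.lower()}-{year}"
        blocks.append(
            f"{subject} a qb:Observation ;\n"
            f"    qb:dataSet ex:dataset ;\n"
            f"    ex:country ex:{country} ;\n"
            f'    ex:year "{year}"^^xsd:gYear ;\n'
            f'    ex:value "{value}"^^xsd:decimal .\n'
        )
    return "\n".join(blocks)


# Made-up sample rows, only to exercise the function.
SAMPLE = "country,year,value\nBE,2013,80.5\nFR,2013,78.2\n"
turtle = rows_to_turtle(SAMPLE)
print(turtle)
```

A real Data Cube export would also describe the dataset structure (dimensions and measures); this sketch covers only the observations.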

Production pipelines

From desk to automated pipeline

[Diagram: OpenRefine, UnifiedViews and Cellar positioned along the axes "flexibility" and "volume"]

Slide 96

Thank you for your attention

...and now YOUR questions?

Slide 97


Further reading

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

Further reading

EUCLID, Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID, Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

Be part of our team

Find us, contact us, join us and follow us:

• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport

• http://www.opendatasupport.eu

• Open Data Support: http://goo.gl/y9ZZI

• OpenDataSupport

• contact@opendatasupport.eu

Slide 102

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Slide 103

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 45: Llinked open data training for EU institutions

What is a triple?

Every piece of information expressed in RDF is represented as a triple:

• Subject – a resource, which is identified with a URI.

• Predicate – a URI-identified, reused specification of the relationship.

• Object – a resource or literal to which the subject is related.

Example: name of a dataset

Subject: http://publications.europa.eu/resource/authority/file-type

Predicate: has a title

Object: "File types Name Authority List"

Slide 45
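To make the subject, predicate and object structure concrete, here is a minimal Python sketch that models the example triple and prints it in N-Triples form. The Triple type and the to_ntriples helper are hypothetical names; the predicate uses the Dublin Core title property, as in the syntax examples of this module.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the resource being described
    predicate: str                # URI of the relationship
    obj: str                      # URI or literal value
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as an N-Triples line.

    Simplification: literals are quoted as-is, without escaping.
    """
    obj = f'"{t.obj}"' if t.obj_is_literal else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
    obj_is_literal=True,
)
line = to_ntriples(title)
print(line)
```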

RDF Syntax: RDF/XML

Definition of prefixes:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

Description of data – triples:

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

In each description, the rdf:about URI is the subject, each child element is a predicate, and the element content or rdf:resource URI is the object. Together the triples form the graph.

Slide 46

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

[Figure: RDF graph with subject, predicate and object labels]

Slide 47

RDF Syntax: Turtle

Definition of prefixes:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

Description of data – triples:

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Each statement reads subject, predicate, object; together the statements form the graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

Slide 48

RDF Syntax: RDFa

Embedding RDF data in HTML:

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

The resource attribute gives the subject, each property attribute gives a predicate, and the element text gives the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

Slide 49

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

[UML class diagram "Healthcare Domain"; the classes, their properties and the relationships recoverable from it are listed below]

Class Core Vocabularies::Identifier
    dateOfIssue: dateTime [0..1]
    identifier: string [1..1]
    identifierType: string [0..1]
    issuingAuthority: string [0..1]
    issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
    alternativeName: string
    birthName: string
    dateOfBirth: dateTime
    dateOfDeath: dateTime
    familyName: string
    fullName: string
    gender: code
    givenName: string
    patronymicName: string

Class Core Vocabularies::Location
    geographicIdentifier: URI
    geographicName: string

Class Core Vocabularies::Address
    addressArea: string
    addressID: string
    adminUnitL1: string
    adminUnitL2: string
    fullAddress: string
    locatorDesignator: string
    locatorName: string
    poBox: string
    postCode: string
    postName: string
    thoroughfare: string

Class Core Vocabularies::Geometry
    lat: string
    long: string
    wkt: string
    xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus making the meaning of the data model easier to understand.

Slide 52
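The difference between properties (plain values) and relationships (links to other classes) can also be shown in code. The Python dataclasses below are a loose sketch of a small part of the diagram: the snake_case field names, the reduced attribute set and the sample values are choices made for illustration, not part of the Core Vocabularies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Identifier:
    identifier: str                          # [1..1]: mandatory
    identifier_type: Optional[str] = None    # [0..1]: optional
    issuing_authority: Optional[str] = None  # [0..1]: optional


@dataclass
class Person:
    # Properties: simple values attached to the class.
    full_name: Optional[str] = None
    given_name: Optional[str] = None
    family_name: Optional[str] = None
    # Relationship "identifier": a link to instances of another class.
    identifiers: List[Identifier] = field(default_factory=list)


# Made-up sample instance.
person = Person(
    full_name="Jane Doe",
    identifiers=[Identifier(identifier="123", identifier_type="passport")],
)
print(person.identifiers[0].identifier)
```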

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language.

• It is one of the three core standards of the Semantic Web, along with RDF and OWL.

• It became a W3C standard in January 2008.

• SPARQL 1.1 has been a W3C Recommendation since March 2013.

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: Add triples to the RDF graph (an update operation from SPARQL 1.1 Update).

• DELETE: Delete triples from the RDF graph (an update operation from SPARQL 1.1 Update).

• ASK: Are there any X, Y, etc. satisfying the following conditions?

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Slide 55

Structure of a SPARQL Query

Definition of prefixes:

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Slide 56

SELECT – return the name of a dataset with a particular URI

Sample data:

<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset .
<http://publications.europa.eu/resource/authority/file-type> dct:title "File types Name Authority List" .
<http://publications.europa.eu/resource/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

Slide 57

SELECT – return the name and publisher of a dataset

Sample data:

<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset .
<http://publications.europa.eu/resource/authority/file-type> dct:title "File types Name Authority List" .
<http://publications.europa.eu/resource/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:publisher ?publisherURI .
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"

Slide 58
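To show how the triple patterns of a SELECT query are matched against the sample data, here is a small pattern matcher in plain Python. It is a teaching sketch, not a SPARQL engine: the function names are hypothetical, terms are plain strings, and only basic graph patterns with "?" variables are supported.

```python
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

# The sample data from the slide as (subject, predicate, object) tuples.
TRIPLES = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(variables, patterns, triples):
    """Return one tuple of values per solution, like a SELECT result table."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    TRIPLES,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset to the title of its publisher, exactly as in the query above.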

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:

  - how to reuse existing vocabularies to model your data;

  - how to create new classes and properties in RDF;

  - how and where to publish your RDF vocabulary so that it can be reused by others.

• An example of how tabular data can be published as Linked Open Data using Open Refine.

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Domain model diagram; the classes, attributes and relationships recoverable from it are listed below]

Amount: Currency, Figure, Type, Year

Nomenclature: Type, Heading, Introduction, Remark, Conditions

EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

Political category: Code, Description

Corporate body: Code, Type, Location

Relationships: hasCeiling, has Political category, has Corporate body, has Nomenclature, has EU Programme

Slide 68

Step 2: Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies

Step 2: Reuse existing terms and vocabularies

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: Ontology for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70


httpworkingontologistorg

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data SupporthttpwwwslidesharenetOpenDataSupport

httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI

OpenDataSupport contactopendatasupporteu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)

copy 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative

2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice

RDF Syntax: RDF/XML

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">

  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>

  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>

</rdf:RDF>

Definition of prefixes: the xmlns attributes on the rdf:RDF element.

Description of data – triples: the dcat:Dataset and dct:Agent elements. Each triple consists of a subject, a predicate and an object; together the triples form the graph.

Slide 46

Visual representation (RDF graph) of the triples from the RDF/XML syntax example

[Figure: RDF graph of the example triples, with callouts marking Subject, Predicate and Object]

Slide 47

RDF Syntax: Turtle

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Definition of prefixes: the two @prefix lines.

Description of data – triples: the two blocks that follow. Each triple consists of a subject, a predicate and an object; together the triples form the graph.

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

Slide 48

RDF Syntax: RDFa

Embedding RDF data in HTML:

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

The resource attribute gives the subject, each property attribute gives a predicate, and the element content gives the object.

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/

Slide 49

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

[UML class diagram "class Healthcare Domain", with callouts marking the classes, their properties and the relationships between them]

Class Core Vocabularies::Identifier
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string

Class Core Vocabularies::Location
- geographicIdentifier: URI
- geographicName: string

Class Core Vocabularies::Address
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string

Class Core Vocabularies::Geometry
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML

Relationships shown between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Slide 52

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)

• INSERT: Add triples to the RDF graph.

• DELETE: Delete triples from the RDF graph.

• ASK: Are there any X, Y, etc. satisfying the following conditions ...

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Slide 55
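To make the difference between the read-only query forms more concrete, the sketch below mimics SELECT, ASK and CONSTRUCT over a handful of in-memory triples. It is plain Python, not a SPARQL engine: the helper names (match, select, ask, construct) and the ex: identifiers are invented for this illustration, and only a single triple pattern is supported.

```python
# Triples are (subject, predicate, object) tuples.
# In a pattern, a term starting with "?" is a variable.
TRIPLES = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
}

def match(pattern, triples):
    """Yield one variable binding (a dict) per triple that fits the pattern."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must take the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding

def select(variables, pattern, triples):
    """SELECT: a table with one column per requested variable."""
    return sorted(tuple(b[v] for v in variables) for b in match(pattern, triples))

def ask(pattern, triples):
    """ASK: is there at least one match?"""
    return any(True for _ in match(pattern, triples))

def construct(template, pattern, triples):
    """CONSTRUCT: substitute each binding into the template to build a new graph."""
    return {tuple(b.get(t, t) for t in template) for b in match(pattern, triples)}

print(select(["?t"], ("?s", "dct:title", "?t"), TRIPLES))
print(ask(("?s", "rdf:type", "dct:Agent"), TRIPLES))
print(construct(("?s", "rdfs:label", "?t"), ("?s", "dct:title", "?t"), TRIPLES))
```

SELECT returns rows, ASK returns a boolean, and CONSTRUCT returns triples, which is why only the last one can feed directly into another RDF graph.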

Structure of a SPARQL Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

Definition of prefixes: the PREFIX lines.

Type of query: SELECT.

Variables, i.e. what to search for: ?title.

RDF triple patterns, i.e. the conditions that have to be met: the lines inside WHERE { ... }.

Slide 56

SELECT – return the name of a dataset with a particular URI

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

Slide 57

SELECT – return the name and publisher of a dataset

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"

Slide 58
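The query above joins three triple patterns through the shared variable publisherURI. The following Python sketch shows one way such a join can be evaluated: each pattern extends the bindings produced by the previous one. It illustrates the idea only and is not how a production SPARQL engine works; the solve helper is hypothetical, and the URIs that are abbreviated on the slide are replaced by made-up ex: identifiers.

```python
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

def solve(patterns, triples, binding=None):
    """Yield every binding that satisfies all patterns (a nested-loop join)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if extended.setdefault(term, value) != value:
                    break  # variable is already bound to a different value
            elif term != value:
                break      # constant term does not match
        else:
            # The whole pattern matched: continue with the remaining patterns.
            yield from solve(rest, triples, extended)

QUERY = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = [(b["?dataset"], b["?publisher"]) for b in solve(QUERY, DATA)]
print(rows)
```

Because publisherURI is bound by the first pattern, the third pattern can only match the title of the publisher, not the title of the dataset.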

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

- How to reuse existing vocabularies to model your data

- How to create new classes and properties in RDF

- How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using OpenRefine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

1. Start with a robust Domain Model

[Figure: domain model of the EU Budget vocabulary]

Classes and attributes shown in the diagram:

- Amount: Currency, Figure, Type, Year

- Nomenclature: Type, Heading, Introduction, Remark, Conditions

- EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

- Political category: Code, Description

- Corporate body: Code, Type, Location

Relationships shown in the diagram: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Slide 68

2. Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

2. Reuse existing terms and vocabularies

Well-known vocabularies

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: Ontology for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

2. Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data.

Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
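The "further processing" mentioned in the interoperability point can be illustrated with a short Python sketch. The lexical form of an xsd:date, such as 2013-02-21, parses directly, while a free-text date only parses if the consumer knows (or guesses) the publisher's format. The two values are the ones used on the slide; the format string is an assumption about how that hypothetical publisher writes dates.

```python
from datetime import date, datetime

# Value published with dcterms:created and typed as xsd:date:
# parses directly, no knowledge of the publisher needed.
created = date.fromisoformat("2013-02-21")

# Value published with a custom property and a free-text date:
# the consumer has to know the format, and %B (month name) is locale dependent.
raw = "21 February 2013"
parsed = datetime.strptime(raw, "%d %B %Y").date()

print(created == parsed)
```

Both values denote the same day, but only the first can be consumed without publisher-specific code.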

2. Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

http://joinup.ec.europa.eu

http://lov.okfn.org

Slide 72

3. Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

3. Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 74

3. Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

Slide 75
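The benefit of sub-properties can be sketched in a few lines of Python: a consumer that only knows dct:description or dct:subject can still read data published with the more specific properties, provided it follows the sub-property links. The bud: prefix, the names bud:introduction and bud:hasNomenclature, and the sample resources are assumptions made for this illustration; they are not taken from the published EU Budget vocabulary.

```python
# Hypothetical rdfs:subPropertyOf statements (child -> parent).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

# Hypothetical data that uses only the specific properties.
DATA = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
]

def super_properties(prop):
    """Return prop followed by all of its ancestors."""
    chain = [prop]
    while chain[-1] in SUB_PROPERTY_OF:
        chain.append(SUB_PROPERTY_OF[chain[-1]])
    return chain

def infer(triples):
    """Add one triple per super-property (the RDFS sub-property rule)."""
    inferred = set(triples)
    for s, p, o in triples:
        for parent in super_properties(p)[1:]:
            inferred.add((s, parent, o))
    return inferred

# A generic consumer that has never heard of bud:introduction:
descriptions = [o for s, p, o in infer(DATA) if p == "dct:description"]
print(descriptions)
```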

4. Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept.

Properties begin with a lower case letter, e.g. rdfs:label.

Object properties should be verbs, e.g. org:hasSite.

Data type properties should be nouns, e.g. dcterms:description.

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
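The purely lexical parts of these conventions (initial capital for classes, initial lower case for properties, camel case without separators) can be checked mechanically. The sketch below is one possible check in Python; the function names are invented for this example, and it cannot tell whether a term is singular, a verb or a noun.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf

def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]

def looks_like_class(term):
    """True if the local name starts with a capital and has no separators."""
    return bool(CLASS_NAME.match(local_name(term)))

def looks_like_property(term):
    """True if the local name starts in lower case and has no separators."""
    return bool(PROPERTY_NAME.match(local_name(term)))

for term in ["skos:Concept", "rdfs:label", "org:hasSite", "ex:amount_type"]:
    print(term, looks_like_class(term), looks_like_property(term))
```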

4. Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Figure: the Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

4. Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Figure: the Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

4. Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

[Figure: the hasCeiling property connecting the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
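In RDFS, domain and range statements let a reasoner infer the classes of a property's subject and object rather than reject data that does not fit. The Python sketch below applies those two inference rules to a single triple. The bud: names, the ex: resources and the direction of hasCeiling (from Political category to Amount) are assumptions made for this illustration, since the slide text does not state them.

```python
# Hypothetical schema: rdfs:domain and rdfs:range of one property.
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}

def infer_types(triples, schema):
    """Apply the RDFS domain and range rules; return the inferred rdf:type triples."""
    types = set()
    for s, p, o in triples:
        rule = schema.get(p)
        if rule is None:
            continue  # no domain or range declared for this property
        types.add((s, "rdf:type", rule["domain"]))  # subject is in the domain
        types.add((o, "rdf:type", rule["range"]))   # object is in the range
    return types

data = [("ex:heading1", "bud:hasCeiling", "ex:amount2014")]
for triple in sorted(infer_types(data, SCHEMA)):
    print(triple)
```

Neither resource is typed in the input, yet both end up with a class, which is what makes declared domains and ranges useful to consumers of the data.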

5. Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

6. Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Phases shown alongside the steps: Analyse, Model, Publish

Slide 82

Example

Using OpenRefine for RDF to publish tabular data as Linked Data

Slide 83

What is OpenRefine?

"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" - openrefine.org

See also: OpenRefine website, http://openrefine.org

Slide 84

What is the OpenRefine RDF extension?

The OpenRefine RDF extension allows you to easily import data in different formats, such as:

- CSV

- Excel (.xls and .xlsx)

- JSON

- XML

- RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar – OpenRefine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using OpenRefine to model and publish open data: Getting started

First:

1. Install OpenRefine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in OpenRefine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary

Dataset used in the example: Digital Agenda Scoreboard

Slide 87

1. Describe your data in a spreadsheet

Download the tabular data

Slide 88

2. Create a project and upload it in OpenRefine

- Upload the spreadsheet

- Select relevant tabs

- Create the project

Slide 89

3. Clean up the data – table harmonisation

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Slide 90

3. Clean up the data – prepare RDF

• Create URI representation for the involved object values

- via formula

- via reconciliation

Slide 91
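Creating URIs "via formula" amounts to deriving a stable identifier from each cell value. In OpenRefine this is done with a cell transformation expression; the same idea is sketched below in Python. The base URI and the slug rules (lower case, ASCII only, hyphens between words) are assumptions chosen for this illustration, not rules prescribed by the training material.

```python
import re
import unicodedata

BASE = "http://example.org/id/country/"  # hypothetical base URI

def mint_uri(value, base=BASE):
    """Turn a cell value into a lower-case, ASCII, hyphen-separated URI."""
    # Split accented characters into letter + accent, then drop the accents.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of other characters by one hyphen and trim the ends.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return base + slug

print(mint_uri("Czech Republic"))
print(mint_uri("  C\u00f4te d'Ivoire "))
```

A formula of this kind is deterministic, so the same cell value always yields the same URI. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there.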

4. Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 92

4. Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93

4. Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.

You can set the base URI for the data.

[Screenshot: graphical interface to copy/paste an existing RDF skeleton]

[Screenshot: graphical interface to edit an RDF skeleton]

Slide 94

5. Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]

Slide 95
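To give an idea of what such an export contains, the Python sketch below writes a row of tabular data as Turtle by hand. It is not the code used by the OpenRefine RDF extension. The qb: namespace is that of the RDF Data Cube Vocabulary; the ex: namespace, the base URI, the property names and the sample row are hypothetical, and the escaping covers only backslashes and double quotes.

```python
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "ex": "http://example.org/def/",         # hypothetical vocabulary namespace
}
BASE = "http://example.org/id/observation/"  # hypothetical base URI

def literal(value):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_turtle(rows):
    """Serialise each row as one qb:Observation with two properties."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    for row in rows:
        lines += [
            "",
            f"<{BASE}{row['id']}>",
            "    a qb:Observation ;",
            f"    ex:country {literal(row['country'])} ;",
            f"    ex:value {row['value']} .",
        ]
    return "\n".join(lines)

rows = [{"id": "1", "country": "Belgium", "value": 72.5}]
print(to_turtle(rows))
```

Each spreadsheet row becomes one subject, and each mapped column becomes one predicate-object pair, which mirrors the skeleton defined in the mapping step.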

Production pipelines

From desk to automated pipeline

[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

Slide 96

Thank you for your attention

... and now YOUR questions?

Slide 97

References

• 5 ★ Open Data. http://5stardata.info

• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata/

• OpenRefine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework, W3C. http://www.w3.org/RDF/

• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query/

Slide 98

Further reading

EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

Further reading

EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme. http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler. http://workingontologist.org

Slide 101

Be part of our team...

Find us on, join us on, follow us, contact us:

Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport

Open Data Support website: http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

@OpenDataSupport

contact@opendatasupport.eu

Slide 102

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Slide 103

Page 47: Llinked open data training for EU institutions

DATASUPPORTOPEN

Visual representation (RDF graph) of the triples from the RDFXML syntax example

Slide 47

Subject

Predicate

Object

DATASUPPORTOPEN

RDF SyntaxTurtle

Subject

Predicate

Object

Slide 48

prefix dcat lthttpwwww3orgTRvocab-dcatgt prefix dct lthttppurlorgdcterms

lt httppublicationseuropaeuresourceauthorityfile-typegt a ltdcatDatasetgt dcttitle ldquoFile types Name Authority Listldquodctpublisher lthttpopen-dataeuropaeuendatapublisherpublgt

lthttpopen-dataeuropaeuendatapublisherpublgta ltdctAgentgt dcttitle ldquoPublications Officerdquo

Gra

ph

See alsohttpwwww3org200912rdf-wspapersws11

Definition of prefixes

Description of data ndash triples

DATASUPPORTOPEN

RDF SyntaxRDFa

Subject

Predicate

Object

Slide 49

lthtmlgt ltheadgt ltheadgt ltbodygt ltdiv resource=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquotypeof= ldquohttpwwww3orgnsdcatDatasetrdquogtltpgt ltspan property= httppurlorgdctermstitle gtFile types Name Authority Listltspangt Publisher ltspan property=httppurlorgdctermsAgentgt Publications Officeltspangtltpgtltdivgtltbodygt

See alsohttpwwww3orgTR2012NOTE-rdfa-primer-20120607

embedding RDF data in HTML

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo

bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo

bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties

bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties

Slide 51

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

[UML class diagram of the Core Vocabularies (diagram name: Healthcare Domain). The classes, attributes and relationship names shown in the diagram are listed below.]

• Identifier: dateOfIssue (dateTime) [0..1], identifier (string) [1..1], identifierType (string) [0..1], issuingAuthority (string) [0..1], issuingAuthorityUri (URI) [0..1]

• Person: alternativeName (string), birthName (string), dateOfBirth (dateTime), dateOfDeath (dateTime), familyName (string), fullName (string), gender (code), givenName (string), patronymicName (string)

• Location: geographicIdentifier (URI), geographicName (string)

• Address: addressArea, addressID, adminUnitL1, adminUnitL2, fullAddress, locatorDesignator, locatorName, poBox, postCode, postName, thoroughfare (all string)

• Geometry: lat (string), long (string), wkt (string), xmlGeometry (XML)

• Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Slide 52


Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: Add triples to the RDF graph (part of SPARQL 1.1 Update).

• DELETE: Delete triples from the RDF graph (part of SPARQL 1.1 Update).

• ASK: Are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
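To make the difference between the query forms more concrete, here is a minimal sketch in plain Python. It is not a SPARQL engine; it only mimics the shape of what SELECT, ASK and CONSTRUCT would return for a single triple pattern. The sample graph and the `ex:` names (including `ex:hasName`) are made up for illustration.

```python
# Sample graph as (subject, predicate, object) tuples; all names are illustrative.
graph = {
    ("ex:ds1", "dct:title", "File types Name Authority List"),
    ("ex:ds1", "rdf:type", "dcat:Dataset"),
    ("ex:ds2", "rdf:type", "dcat:Dataset"),
}

# Triple pattern: ?dataset dct:title ?title
matches = sorted((s, o) for (s, p, o) in graph if p == "dct:title")

# SELECT: a table of variable bindings, one row per match.
select_result = matches

# ASK: a single yes/no answer, true if at least one match exists.
ask_result = len(matches) > 0

# CONSTRUCT: a new graph, built by filling a template with each match.
construct_result = {(s, "ex:hasName", o) for (s, o) in matches}
```

The same pattern drives all three forms; only the way the matches are returned differs.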

Structure of a SPARQL Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

• PREFIX lines: definition of prefixes

• SELECT: type of query

• ?title: variables, i.e. what to search for

• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met

Slide 56

SELECT - return the name of a dataset with a particular URI

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result

dataset
"File types Name Authority List"

Slide 57

SELECT - return the name and publisher of a dataset

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

dataset | publisher
"File types Name Authority List" | "Publications Office"

Slide 58
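The query on this slide uses three triple patterns that are tied together by the shared variable ?publisherURI. The sketch below imitates that join in plain Python, assuming the sample data is held as tuples and the URIs are shortened to the hypothetical names `ex:file-type` and `ex:publ`. It illustrates the matching logic only and is not how a SPARQL engine is implemented.

```python
# Sample data from the slide, with URIs shortened to illustrative names.
data = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?o)."""
    return [o for (s, p, o) in data if s == subject and p == predicate]


# One result row per combination of bindings that satisfies all three patterns.
rows = [
    (dataset, publisher)
    for publisher_uri in objects("ex:file-type", "dct:publisher")  # pattern 1
    for dataset in objects("ex:file-type", "dct:title")            # pattern 2
    for publisher in objects(publisher_uri, "dct:title")           # pattern 3, joined on ?publisherURI
]
```

The value bound to the publisher URI in the first pattern is reused as the subject of the third pattern, which is what lets the query reach the publisher's own title.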

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

  - How to reuse existing vocabularies to model your data

  - How to create new classes and properties in RDF

  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Diagram: example domain model. The recoverable labels are listed below.]

• Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location)

• Further attributes shown: Acronym, Legal base period, Legal base type, Legal base status

• Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Slide 68

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS

• To name things: rdfs:label, foaf:name, skos:prefLabel

• To describe people: FOAF, vCard, Core Person Vocabulary

• To describe projects: DOAP, ADMS.SW

• To describe interoperability assets: ADMS

• To describe registered organisations: Registered Organisation Vocabulary

• To describe addresses: vCard, Core Location Vocabulary

• To describe public services: Core Public Service Vocabulary

• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies (step 2: reuse existing terms and vocabularies)

• DCAT-AP: Vocabulary for describing datasets in Europe

• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

• DOAP: Vocabulary for describing projects

• ADMS: Vocabulary for describing interoperability assets

• Dublin Core: Defines general metadata attributes

• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

• Organization Ontology: for describing the structure of organizations

• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

Advantages of reuse (step 2: reuse existing terms and vocabularies)

• Reuse greatly aids interoperability of your data.

Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
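The date example on this slide can be made tangible with a short sketch. It assumes Python's standard `datetime` module stands in for "many machines": the lexical form of an xsd:date is an ISO 8601 date, which generic tooling parses without extra information, whereas the free-text form needs a format string agreed out of band (and, for the month name, an English locale).

```python
from datetime import date, datetime

# The value of "2013-02-21"^^xsd:date is an ISO 8601 date:
# it can be parsed directly, with no knowledge of the publisher's conventions.
standard = date.fromisoformat("2013-02-21")

# A custom form such as ex:date "21 February 2013" needs further processing:
# the consumer has to know the exact format (and %B depends on the locale).
custom = datetime.strptime("21 February 2013", "%d %B %Y").date()
```

Both values end up as the same date, but only the first one gets there without publisher-specific handling.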

You can find reusable RDF vocabularies on (step 2: reuse existing terms and vocabularies):

• http://joinup.ec.europa.eu

• http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73
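The benefit of sub-property relationships can be sketched in a few lines of plain Python. This is a toy version of RDFS-style inference, not a reasoner: the prefixed names `bud:introduction` and `ex:item1` are shorthand invented for this example, and the hierarchy is assumed to contain no cycles.

```python
# Declared hierarchy: specific property -> more generic property.
sub_property_of = {"bud:introduction": "dct:description"}

# Data published with the specific term only.
data = [("ex:item1", "bud:introduction", "Opening text of the budget line")]


def with_super_properties(triples, hierarchy):
    """Return the triples plus the same statements restated with every super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        while p in hierarchy:  # walk up the hierarchy (assumed acyclic)
            p = hierarchy[p]
            inferred.add((s, p, o))
    return inferred


expanded = with_super_properties(data, sub_property_of)
```

A consumer that only knows `dct:description` can still find the text in `expanded`, even though the publisher used the more specific term.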

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 74

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.

[Diagram: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]

Slide 75

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.

• Properties begin with a lower case letter, e.g. rdfs:label.

• Object properties should be verbs, e.g. org:hasSite.

• Data type properties should be nouns, e.g. dcterms:description.

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
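Some of these naming conventions can be checked mechanically. The following sketch, with hypothetical helper names, tests only the parts that are purely lexical (initial letter case and camel case without separators). Whether a class name is singular, or a property name is a verb or a noun, still needs human review.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf


def local_name(term):
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def looks_like_class(term):
    """Capital first letter, no underscores, hyphens or spaces."""
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term):
    """Lower case first letter, camel case for further words."""
    return bool(PROPERTY_NAME.match(local_name(term)))
```

For instance, `looks_like_class("skos:Concept")` and `looks_like_property("foaf:isPrimaryTopicOf")` both hold, while a name with an underscore such as `ex:has_site` is rejected.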

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.

• A domain states on which classes a given property can be used.

[Diagram: the hasCeiling relationship between class Amount (Currency, Figure, Type, Year) and class Political category (Code, Description)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
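In RDFS, domain and range are used for inference rather than validation: from a statement that uses a property, a system may conclude the types of its subject and object. The sketch below illustrates this in plain Python. The schema entry is an assumption made for this example only; in particular, the direction of hasCeiling (from a political category to an amount) is a guess based on the diagram, and all `ex:` names are hypothetical.

```python
# Hypothetical schema entry: the direction of hasCeiling is assumed, not confirmed.
schema = {
    "ex:hasCeiling": {"domain": "ex:PoliticalCategory", "range": "ex:Amount"},
}

facts = [("ex:heading1", "ex:hasCeiling", "ex:amount2014")]


def infer_types(triples, schema):
    """RDFS-style inference: the domain types the subject, the range types the object."""
    inferred = set()
    for s, p, o in triples:
        if p in schema:
            inferred.add((s, "rdf:type", schema[p]["domain"]))
            inferred.add((o, "rdf:type", schema[p]["range"]))
    return inferred


types = infer_types(facts, schema)
```

A practical consequence is that an overly narrow domain or range leads to wrong type conclusions, which is one reason to define them with care.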

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Phases: Analyse, Model, Publish

Slide 82


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ...” - openrefine.org

See also: Open Refine website, http://openrefine.org

Slide 84

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

• Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet

• Select relevant tabs

• Create the project

Slide 89

Step 3: Clean up the data - table harmonisation

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Slide 90

Step 3: Clean up the data - prepare RDF

• Create URI representation for the involved object values

  - via formula

  - via reconciliation

Slide 91
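Creating a URI "via formula" means deriving the identifier from the cell value by a fixed rule. The sketch below shows one such rule in plain Python rather than in Open Refine's own expression language. The base URI and the function name `mint_uri` are placeholders invented for this example, and the rule is deliberately simple: any run of characters other than a-z and 0-9 (including accented letters) is replaced by a hyphen.

```python
import re

BASE = "http://example.org/id/"  # placeholder base URI, not taken from the slides


def mint_uri(kind, label):
    """Build a URI from a cell value: lower-cased, with hyphens between words."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")
    return f"{BASE}{kind}/{slug}"
```

For example, `mint_uri("country", " United Kingdom ")` gives `http://example.org/id/country/united-kingdom`. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there instead of minting a new one.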

Step 4: Map your data to appropriate RDF classes & properties (model your data)

• Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

• Define a skeleton to transform your spreadsheet data to RDF

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 94

Step 5: Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]

Slide 95
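What the mapping and export steps amount to can be sketched without Open Refine: each table row becomes one resource, and each column becomes a statement about it. The sketch below uses only Python's standard library and writes N-Triples, the simplest RDF syntax, instead of Turtle. The column names, the sample values and the `http://example.org/` namespace are invented for illustration; a real export would use the terms of the target vocabulary, and literal values would need proper escaping and datatypes.

```python
import csv
import io

# A tiny stand-in for the cleaned spreadsheet (values are made up).
table = io.StringIO("country,year,value\nBE,2013,0.81\nFR,2013,0.78\n")

BASE = "http://example.org/"  # placeholder namespace


def row_to_ntriples(row):
    """Serialise one table row as N-Triples lines describing one observation."""
    obs = f"<{BASE}obs/{row['country']}-{row['year']}>"
    return [
        f"{obs} <{BASE}def/country> <{BASE}country/{row['country']}> .",
        f"{obs} <{BASE}def/year> \"{row['year']}\" .",
        f"{obs} <{BASE}def/value> \"{row['value']}\" .",
    ]


lines = [line for row in csv.DictReader(table) for line in row_to_ntriples(row)]
```

Each row yields three statements, so the two sample rows produce six lines. The RDF skeleton in Open Refine plays the role of `row_to_ntriples` here: it is the template applied to every row.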

Production pipelines

From desk to automated pipeline

[Diagram: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]

Slide 96


Thank you for your attention

and now YOUR questions

Slide 97


Further reading

• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

Further reading

• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1

• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2

• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com

• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/

• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf

• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org

Slide 101


Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Slide 103

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 48: Llinked open data training for EU institutions

RDF Syntax: Turtle

Definition of prefixes

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

Description of data - triples (subject, predicate, object), which together form the graph

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11

Slide 48

DATASUPPORTOPEN

RDF SyntaxRDFa

Subject

Predicate

Object

Slide 49

lthtmlgt ltheadgt ltheadgt ltbodygt ltdiv resource=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquotypeof= ldquohttpwwww3orgnsdcatDatasetrdquogtltpgt ltspan property= httppurlorgdctermstitle gtFile types Name Authority Listltspangt Publisher ltspan property=httppurlorgdctermsAgentgt Publications Officeltspangtltpgtltdivgtltbodygt

See alsohttpwwww3orgTR2012NOTE-rdfa-primer-20120607

embedding RDF data in HTML

DATASUPPORTOPEN

How to represent data in RDF

Classes properties and vocabularies

Slide 50

DATASUPPORTOPEN

RDF Vocabulary

ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo

bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo

bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties

bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties

Slide 51

DATASUPPORTOPEN

Examples of classes relationships and properties The Core Person Vocabulary in UML

52

class Healthcare Domain

Core VocabulariesIdentifier

dateOfIssue dateTime [01]

identifier string [11]

identifierType string [01]

issuingAuthority string [01]

issuingAuthorityUri URI [01]

Core VocabulariesPerson

alternativeName string

birthName string

dateOfBirth dateTime

dateOfDeath dateTime

familyName string

fullName string

gender code

givenName string

patronymicName string

Core VocabulariesLocation

geographicIdentifier URI

geographicName string

Core VocabulariesAddress

addressArea string

addressID string

adminUnitL1 string

adminUnitL2 string

fullAddress string

locatorDesignator string

locatorName string

poBox string

postCode string

postName string

thoroughfare string

Core VocabulariesGeometry

lat string

long string

wkt string

xmlGeometry XML

address

identifies

geometry

placeOfDeath

countryOfDeath

placeOfBirth

countryOfBirth

identifier

UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model

Relationships ClassProperties

Class

Class

Class

Class

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples

bull SPARQL Protocol and RDF Query Language

bull One of the three core standards of the Semantic Web along with RDF and OWL

bull Became a W3C standard January 2008

bull SPARQL 11 is a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

bull SELECT Return a table of all X Y etc satisfying the following conditions

bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph

bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

bull INSERT Add triples to the RDF graph

bull DELETE Delete triples from the RDF graph

bull ASK Are there any X Y etc satisfying the following conditions

Slide 55

See alsohttpwwweuclid-projecteumoduleschapter2

httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt

SELECT titleWHERE

dataset rdftype dcatDataset dataset rdftitle title

Structure of a SPARQL Query

Slide 56

Type of

query Variables ie what to search for

RDF triple patterns ie

the conditions that

have to be met

Definition of

prefixes

DATASUPPORTOPEN

SELECT ndash return the name of a dataset with particular URI

Slide 57

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset

WHERE

lthttpauthorityfile-typegt dcttitle dataset

dataset

ldquoFile types Name Authority Listrdquo

Sample data

Query

Result

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset publisher

WHEREhttpauthorityfile-type dctpublisher publisherURI

httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher

dataset publisher

ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

Sample data

Query

Result

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

bull RDF is a general way to express data intended for publishing on the Web

bull RDF data is expressed in triples subject predicate object

bull Different syntaxes exist for expressing data in RDF

bull SPARQL is a standardised language to query graph data expressed as RDF

bull SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

bull Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

bull An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties

Where new terms are required create them following commonly agreed best practice

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Slide 67

1

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

2

3

4

5

6

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

1

hasCeiling

hasPoliticalcategory

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeading

EU Programme

CodeType

Political category

CodeDescription

Corporate body

CodeTypeLocation

IntroductionRemarkConditions

AcronymLegal base periodLegal base typeLegal base status

hasCorporate body

has

Nomenclature

has

EU Programme

DATASUPPORTOPEN

General purpose vocabularies DCMI RDFS

To name things rdfslabel foafname skosprefLabel

To describe people FOAF vCard Core Person Vocabulary

To describe projects DOAP ADMSSW

To describe interoperability assets ADMS

To describe registered organisations Registered Organisation Vocabulary

To describe addresses vCard Core Location Vocabulary

To describe public services Core Public Service Vocabulary

To describe datasets DCAT DCAT Application Profile VoID

Reuse existing terms and vocabularies

Slide 69

2

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP Vocabulary for describing datasets in Europe

Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth

DOAP Vocabulary for describing projects

ADMS Vocabulary for describing interoperability assets

Dublin Core Defines general metadata attributes

Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register

Organization Ontology for describing the structure of organizations

Core Location VocabularyVocabulary capturing the fundamental characteristics of a location

Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration

schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft

See alsohttpwwww3orgwikiTaskForcesCommunityProj

ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies

2

DATASUPPORTOPEN

Slide 71

Reuse existing terms and vocabularies (step 2): advantages of reuse

• Reuse greatly aids interoperability of your data.
  Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
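The interoperability argument can be made concrete with a small sketch. It is only an illustration: the two helper functions are hypothetical names, and the free-text layout ("21 February 2013") is just one of many a publisher might choose, which is exactly the problem.

```python
from datetime import date, datetime

# Lexical form of a typed literal such as "2013-02-21"^^xsd:date
typed_value = "2013-02-21"
# Free-text value, as a custom property like ex:date might carry it
free_text_value = "21 February 2013"


def parse_xsd_date(lexical: str) -> date:
    """Parse an xsd:date lexical form without timezone (YYYY-MM-DD)."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Parse ONE assumed free-text layout; other layouts need other code.

    %B matches English month names only under an English or C locale.
    """
    return datetime.strptime(text, "%d %B %Y").date()


# Both values denote the same day, but only the typed one is
# machine-readable without knowing the publisher's private convention.
same_day = parse_xsd_date(typed_value) == parse_free_text_date(free_text_value)
```

Every consumer of the free-text variant has to discover and hard-code the layout, whereas the typed date follows a shared standard.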

Slide 72

Reuse existing terms and vocabularies (step 2)

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org

Slide 73

Creation of sub-classes and sub-properties (step 3)

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 74

Creation of sub-classes and sub-properties (step 3)

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 75

Creation of sub-classes and sub-properties (step 3)

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Diagram: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
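The benefit of sub-properties for consumers that only know the generic term can be sketched as follows. This is a simplified illustration of rdfs:subPropertyOf entailment over plain tuples, not a full RDFS reasoner. The prefixed names bud:introduction and bud:hasNomenclature and the sample triple are assumptions made for the example.

```python
# Hypothetical prefixed names; "bud:" stands for the EU Budget vocabulary
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples, sub_property_of):
    """Return the input triples plus those entailed by rdfs:subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = set()  # guards against cycles in the property hierarchy
        while p in sub_property_of and p not in seen:
            seen.add(p)
            p = sub_property_of[p]  # move one level up the hierarchy
            inferred.add((s, p, o))
    return inferred


# Invented sample data for illustration
data = {("bud:nomenclature/1", "bud:introduction", "Introductory text")}
entailed = infer_super_properties(data, SUB_PROPERTY_OF)
```

A client that has never heard of bud:introduction can still find the text through dct:description.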

Slide 76

Where new terms are required, create them following commonly agreed best practices (step 4)

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower-case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
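The mechanical part of these conventions (capitalisation and camel case) can be checked automatically. The sketch below is one possible check with hypothetical helper names. Whether a class name is singular, or a property is a verb or a noun, still needs human review.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # UpperCamelCase
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # lowerCamelCase


def local_name(term: str) -> str:
    """Return the part after the prefix of a prefixed name like skos:Concept."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    """True if the local name starts with a capital and uses camel case."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    """True if the local name starts in lower case and uses camel case."""
    return bool(PROPERTY_NAME.match(local_name(term)))
```

For instance, `is_valid_property_name("foaf:isPrimaryTopicOf")` holds, while a name with underscores such as `ex:amount_type` is rejected by both checks.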

Slide 77

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use the standard conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Where new terms are required, create them following commonly agreed best practices (step 4)

If there is no suitable authoritative reusable vocabulary for describing your data, use the standard conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
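The slide examples themselves were images and are not part of this transcript. As a stand-in, the following sketch shows what an RDFS definition of a class and a property might look like when written out in Turtle. The term names bud:Amount and bud:amountType, the comments, and the helper function are all assumptions for illustration, not the actual EU Budget vocabulary.

```python
def define_term(term: str, rdf_type: str, label: str, comment: str) -> str:
    """Return a Turtle description of one vocabulary term.

    Assumes `label` and `comment` contain no double quotes or backslashes,
    because this sketch does not escape string literals.
    """
    return (
        f"{term} a {rdf_type} ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


# Hypothetical definitions of the "Amount" class and "amount type" property
amount_class = define_term(
    "bud:Amount", "rdfs:Class", "Amount", "An amount of money in the budget."
)
amount_type_property = define_term(
    "bud:amountType", "rdf:Property", "amount type", "The type of an amount."
)
```

The generated text still needs @prefix declarations for bud:, rdf: and rdfs: before it is a complete Turtle document.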

Slide 79

Where new terms are required, create them following commonly agreed best practices (step 4)

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used, i.e. that any resource having the property is an instance of one or more classes.

[Diagram: classes Amount (Currency, Figure, Type, Year) and Political category (Code, Description) connected by the hasCeiling property]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
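In RDFS, domain and range are not validation constraints but statements from which types can be inferred. The sketch below illustrates that inference over plain tuples. The prefixed names, the sample triple and, in particular, the direction of hasCeiling (from political category to amount) are assumptions, since the transcript does not preserve the arrow of the diagram.

```python
# Hypothetical declarations; the direction of bud:hasCeiling is assumed
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples, domain, range_):
    """Return the rdf:type triples entailed by rdfs:domain and rdfs:range."""
    types = set()
    for s, p, o in triples:
        if p in domain:
            # The subject of the property is an instance of the domain class
            types.add((s, "rdf:type", domain[p]))
        if p in range_:
            # The value of the property is an instance of the range class
            types.add((o, "rdf:type", range_[p]))
    return types


# Invented sample data for illustration
data = [("bud:category/1", "bud:hasCeiling", "bud:amount/42")]
inferred = infer_types(data, DOMAIN, RANGE)
```

Because of this behaviour, an overly narrow domain or range can lead to wrong conclusions about data that reuses the property, which is one reason to define them with care.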

Slide 80

Publish within a highly stable environment designed to be persistent (step 5)

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud

• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 81

Publicise the RDF vocabulary by registering it with relevant services (step 6)

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 82

Conclusions

1. Start with a robust domain model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

The steps are grouped into three phases: Analyse, Model, Publish.

Slide 83

Example: using OpenRefine for RDF to publish tabular data as Linked Data

Slide 84

What is OpenRefine?

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ..." - openrefine.org

See also: OpenRefine website, http://openrefine.org

Slide 85

What is the OpenRefine RDF extension?

The OpenRefine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar - OpenRefine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 86

Using OpenRefine to model and publish open data: getting started

First:

1. Install OpenRefine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in OpenRefine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 87

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Dataset: Digital Agenda Scoreboard

Slide 88

Describe your data in a spreadsheet (step 1)

Download the tabular data.

Slide 89

Create a project and upload it in OpenRefine (step 2)

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Slide 90

Clean up the data - table harmonisation (step 3)

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Slide 91

Clean up the data - prepare RDF (step 3)

• Create URI representations for the object values involved:
  - via formula
  - via reconciliation
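Creating URIs "via formula" means deriving them from cell values with a deterministic rule. In OpenRefine this would be written as a cell transformation expression. The Python sketch below shows the same idea outside the tool. The base URI, the function name and the slug rule (lower case, hyphens for spaces, percent-encoding for the rest) are assumptions chosen for illustration.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/resource/"  # placeholder namespace


def mint_uri(kind: str, value: str, base: str = BASE_URI) -> str:
    """Build a URI for a cell value, e.g. a country name in a 'country' column."""
    # Normalise whitespace and case so the same value always gives the same URI
    slug = "-".join(value.strip().lower().split())
    # Percent-encode anything that is not safe in a URI path segment
    return f"{base}{kind}/{quote(slug, safe='-')}"


country_uri = mint_uri("country", " United Kingdom ")
indicator_uri = mint_uri("indicator", "Broadband (fixed)")
```

Reconciliation is the alternative route: instead of minting a new URI, the cell value is matched against an existing dataset so that a URI which is already in use can be reused.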

Slide 92

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.

Slide 93

Map your data to appropriate RDF classes & properties (model your data) (step 4)

Define a skeleton to transform your spreadsheet data to RDF.

Slide 94

Map your data to appropriate RDF classes & properties (model your data) (step 4)

• You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
• You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 95

Export your data to RDF/XML or Turtle (step 5)

[Screenshot: export of the data in Turtle]
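To give an idea of what the skeleton and the export amount to, the following sketch turns a few spreadsheet rows into Turtle by hand. Only qb:Observation is taken from the RDF Data Cube Vocabulary. The ex: properties, the base URI, the column names and the sample figures are invented for illustration and are not real Digital Agenda Scoreboard data.

```python
import csv
import io

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"
)

# Invented sample rows, standing in for the cleaned-up spreadsheet
SAMPLE_CSV = "country,year,value\nBE,2013,78.5\nFR,2013,81.2\n"


def row_to_observation(row, base="http://example.org/obs/"):
    """Render one row as a qb:Observation; this plays the role of the skeleton."""
    country, year, value = row["country"], row["year"], row["value"]
    return (
        f"<{base}{country}-{year}> a qb:Observation ;\n"
        f'    ex:country "{country}" ;\n'
        f'    ex:year "{year}"^^xsd:gYear ;\n'
        f'    ex:value "{value}"^^xsd:decimal .\n'
    )


def csv_to_turtle(text: str) -> str:
    """Convert CSV text with country, year and value columns to Turtle."""
    rows = csv.DictReader(io.StringIO(text))
    return PREFIXES + "\n" + "\n".join(row_to_observation(r) for r in rows)


turtle = csv_to_turtle(SAMPLE_CSV)
```

A complete Data Cube export would also link each observation to its dataset with qb:dataSet and describe the dimensions and measures in a data structure definition. Both are left out here to keep the sketch short.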

Slide 96

Production pipelines: from desk to automated pipeline

[Diagram: the tools OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

Slide 97

Thank you for your attention ... and now YOUR questions?


Slide 99

Further reading

• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 100

Further reading

• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 101

Further reading

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org

Slide 102

Be part of our team: find us, contact us, join us, follow us

• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• Website: http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu

Slide 103

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 49: Llinked open data training for EU institutions

Slide 49

RDF syntax: RDFa (embedding RDF data in HTML)

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

[Callouts: Subject (the resource attribute), Predicate (the property attributes), Object (the element content)]

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
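To show how the subject, predicate and object are recovered from such markup, the sketch below reads a very small subset of RDFa with Python's standard HTML parser. It only understands the resource, typeof and property attributes with plain text content and no nested elements. It is an illustration of the idea, not a conforming RDFa processor, and the class name is hypothetical.

```python
from html.parser import HTMLParser


class SimpleRDFaParser(HTMLParser):
    """Collect (subject, predicate, object) triples from a tiny RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None           # set by the latest resource attribute
        self.current_property = None  # predicate of the element being read
        self.buffer = []              # text collected for the current object

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "resource" in attrs:
            self.subject = attrs["resource"]
            if "typeof" in attrs:
                self.triples.append((self.subject, "rdf:type", attrs["typeof"]))
        if "property" in attrs:
            self.current_property = attrs["property"]
            self.buffer = []

    def handle_data(self, data):
        if self.current_property is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        # The first closing tag ends the literal (no nested elements supported)
        if self.current_property is not None:
            text = "".join(self.buffer).strip()
            self.triples.append((self.subject, self.current_property, text))
            self.current_property = None


SAMPLE = (
    '<div resource="http://publications.europa.eu/resource/authority/file-type" '
    'typeof="http://www.w3.org/ns/dcat#Dataset">'
    '<p><span property="http://purl.org/dc/terms/title">'
    "File types Name Authority List</span></p></div>"
)

parser = SimpleRDFaParser()
parser.feed(SAMPLE)
parser.close()
triples = parser.triples
```

The result holds two triples for the same subject: one giving its type and one giving its title.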

Slide 50

How to represent data in RDF: classes, properties and vocabularies

Slide 51

RDF vocabulary

"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."

• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.

Slide 52

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

UML: the Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

[UML class diagram "Healthcare Domain", with callouts marking the classes, properties and relationships]

Classes and their properties:

• Core Vocabularies::Identifier
  - dateOfIssue: dateTime [0..1]
  - identifier: string [1..1]
  - identifierType: string [0..1]
  - issuingAuthority: string [0..1]
  - issuingAuthorityUri: URI [0..1]

• Core Vocabularies::Person
  - alternativeName: string
  - birthName: string
  - dateOfBirth: dateTime
  - dateOfDeath: dateTime
  - familyName: string
  - fullName: string
  - gender: code
  - givenName: string
  - patronymicName: string

• Core Vocabularies::Location
  - geographicIdentifier: URI
  - geographicName: string

• Core Vocabularies::Address
  - addressArea: string
  - addressID: string
  - adminUnitL1: string
  - adminUnitL2: string
  - fullAddress: string
  - locatorDesignator: string
  - locatorName: string
  - poBox: string
  - postCode: string
  - postName: string
  - thoroughfare: string

• Core Vocabularies::Geometry
  - lat: string
  - long: string
  - wkt: string
  - xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

Slide 53

Introduction to SPARQL: the RDF query language

Slide 54

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language.
• It is one of the three core standards of the Semantic Web, along with RDF and OWL.
• It became a W3C standard in January 2008.
• SPARQL 1.1 has been a W3C Recommendation since March 2013.

Slide 55

Types of SPARQL queries

• SELECT: return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: add triples to the RDF graph (an update operation, part of SPARQL 1.1 Update).
• DELETE: delete triples from the RDF graph (an update operation, part of SPARQL 1.1 Update).
• ASK: are there any X, Y, etc. satisfying the following conditions?

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Slide 56

Structure of a SPARQL query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

[Callouts: definition of prefixes (the PREFIX lines); type of query (SELECT); variables, i.e. what to search for (?title); RDF triple patterns, i.e. the conditions that have to be met (the WHERE block)]

Slide 57

SELECT - return the name of a dataset with a particular URI

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

Slide 58

SELECT - return the name and publisher of a dataset

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset                            publisher
"File types Name Authority List"   "Publications Office"
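How a SELECT query arrives at this result can be illustrated with a toy evaluator. The sketch below matches triple patterns against an in-memory list of triples and joins them on shared variables such as ?publisherURI. It is a teaching aid under strong simplifications (no prefixes, filters or datatypes), not how a real SPARQL engine is implemented, and the example.org URIs are placeholders for the abbreviated URIs on the slide.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its existing binding
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match the triple
            return None
    return new


def select(variables, patterns, triples):
    """Evaluate the patterns one by one, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


DATASET = "http://example.org/authority/file-type"  # placeholder URI
PUBLISHER = "http://example.org/publisher/publ"     # placeholder URI

TRIPLES = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]

rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    TRIPLES,
)
```

The third pattern only succeeds for triples whose subject equals the value already bound to ?publisherURI, which is what links the dataset to the title of its publisher.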

Slide 59

SPARQL example - EU ODP (1)

Slide 60

SPARQL example - EU ODP (2)

Slide 61

SPARQL example - EU ODP (2)

Slide 62

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 63

Learning Module 3: Workshop for Publishing Open Linked EU Data

Slide 64

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine

Slide 65

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 66

Creating an RDF vocabulary: how to reuse other vocabularies, define your own terms, publish and promote your vocabulary

Slide 67

6 steps for creating an RDF vocabulary

1. Start with a robust domain model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 68

Start with a robust domain model (step 1)

[Domain model diagram]

Classes and their attributes:

• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type
• Political category: Code, Description
• Corporate body: Code, Type, Location

Additional attributes shown in the diagram: Acronym, Legal base period, Legal base type, Legal base status

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme


httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI

OpenDataSupport contactopendatasupporteu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)

copy 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative

2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice

Page 50: Llinked open data training for EU institutions

DATASUPPORTOPEN

How to represent data in RDF

Classes, properties and vocabularies

Slide 50

RDF Vocabulary

“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”

• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.

• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.

• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.

Slide 51
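The distinction between properties and relationships can be sketched in a few lines of Python. This is only an illustration, assuming a made-up `http://example.org/` namespace and made-up terms (`Organisation`, `legalName`, `publishes`); the `URI` and `Literal` helper classes are hypothetical and not part of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


EX = "http://example.org/"  # hypothetical namespace, not from the slides
RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

org = URI(EX + "org/1")
doc = URI(EX + "doc/1")

# Each statement is a (subject, predicate, object) triple.
triples = [
    (org, RDF_TYPE, URI(EX + "Organisation")),                      # class membership
    (org, URI(EX + "legalName"), Literal("Example Organisation")),  # property
    (org, URI(EX + "publishes"), doc),                              # relationship
]


def property_kind(triple):
    """Classify a triple by the kind of value its predicate points to."""
    _subject, _predicate, obj = triple
    return "datatype property" if isinstance(obj, Literal) else "object property"


# Skip the rdf:type triple, which states class membership.
for triple in triples[1:]:
    print(triple[1].value, "->", property_kind(triple))
```

A predicate whose value is a literal behaves as a datatype property, while one whose value is another resource behaves as an object property.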

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

[Figure: UML class diagram of the ISA Core Vocabularies, annotated to show which elements are classes, properties and relationships]

Classes and their properties:

• Identifier: dateOfIssue (dateTime [0..1]), identifier (string [1..1]), identifierType (string [0..1]), issuingAuthority (string [0..1]), issuingAuthorityUri (URI [0..1])

• Person: alternativeName, birthName, familyName, fullName, givenName, patronymicName (string); dateOfBirth, dateOfDeath (dateTime); gender (code)

• Location: geographicIdentifier (URI), geographicName (string)

• Address: addressArea, addressID, adminUnitL1, adminUnitL2, fullAddress, locatorDesignator, locatorName, poBox, postCode, postName, thoroughfare (string)

• Geometry: lat, long, wkt (string); xmlGeometry (XML)

Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Slide 52

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL

• Became a W3C standard in January 2008

• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

• INSERT: Add triples to the RDF graph (a SPARQL 1.1 Update operation)

• DELETE: Delete triples from the RDF graph (a SPARQL 1.1 Update operation)

• ASK: Are there any X, Y, etc. satisfying the following conditions?

See also:

http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Slide 55

Structure of a SPARQL Query

Definition of prefixes:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>

Type of query and variables, i.e. what to search for:

    SELECT ?title

RDF triple patterns, i.e. the conditions that have to be met:

    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }

Slide 56

SELECT – return the name of a dataset with a particular URI

Sample data (URIs abbreviated as on the slide):

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query:

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset
    WHERE {
      <http://.../authority/file-type> dct:title ?dataset .
    }

Result:

    ?dataset
    "File types Name Authority List"

Slide 57

SELECT – return the name and publisher of a dataset

Sample data (URIs abbreviated as on the slide):

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query:

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }

Result:

    ?dataset                            ?publisher
    "File types Name Authority List"    "Publications Office"

Slide 58
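To make the evaluation of such a query more concrete, the following is a minimal sketch of how triple patterns are matched and joined. It is not how a real SPARQL engine is implemented; it assumes triples stored as plain string tuples, variables written as strings starting with `?`, and two made-up `http://example.org/` URIs standing in for the abbreviated URIs of the sample data.

```python
DATASET = "http://example.org/authority/file-type"  # hypothetical URI
PUBLISHER = "http://example.org/publisher/publ"     # hypothetical URI

graph = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(variables, patterns, triples):
    """Evaluate the patterns one by one, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(binding[v] for v in variables) for binding in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    graph,
)
print(rows)
```

The shared variable `?publisherURI` is what joins the dataset to its publisher: the third pattern only matches triples whose subject equals the value bound by the first pattern.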

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

- How to reuse existing vocabularies to model your data

- How to create new classes and properties in RDF

- How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Figure: domain model of the EU Budget. Classes shown: Amount (Currency, Figure, Type, Year), Nomenclature (Type, Heading, Introduction, Remark, Conditions), EU Programme (Code, Type), Political category (Code, Description), Corporate body (Code, Type, Location). Further attributes shown: Acronym, Legal base period, Legal base type, Legal base status. Relationships shown: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]

Slide 68

Step 2: Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: For describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data

Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows that it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
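The interoperability argument about dates can be illustrated with a short Python sketch. It assumes that a consumer uses a standard ISO 8601 date parser for `xsd:date` values without a timezone; the helper names `parse_xsd_date` and `can_parse` are made up for this example.

```python
from datetime import date


def parse_xsd_date(lexical: str) -> date:
    """Parse the lexical form of an xsd:date without timezone, e.g. '2013-02-21'."""
    return date.fromisoformat(lexical)


def can_parse(lexical: str) -> bool:
    """Report whether a value is processable without any extra work."""
    try:
        parse_xsd_date(lexical)
        return True
    except ValueError:
        return False


# The typed date is processable as is.
print(parse_xsd_date("2013-02-21").year)

# The free-text date needs a custom, language-dependent parsing step first.
print(can_parse("21 February 2013"))
```

Every consumer of the free-text variant would have to write and maintain its own conversion, which is the further processing the slide warns about.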

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

http://joinup.ec.europa.eu

http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Figure: Nomenclature class with the properties Type, Heading, Introduction, Remark and Conditions]

Slide 74

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject.

[Figure: Amount class (Currency, Figure, Type, Year) linked to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions) through the has Nomenclature relationship]

Slide 75
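The benefit of sub-properties can be sketched as follows: a system that only knows `dct:description` and `dct:subject` can still use data expressed with the more specific terms once the sub-property links are applied. This is a simplified illustration of `rdfs:subPropertyOf` inference, not a full RDFS reasoner; the `bud:` prefix, the local names `introduction` and `hasNomenclature`, and the `ex:` resources are assumptions, since the slides do not give the exact URIs.

```python
# Sub-property declarations, as a mapping from property to super-property.
# Assumed to contain no cycles.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def super_properties(prop):
    """Follow the sub-property chain upwards and return all super-properties."""
    chain = []
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        chain.append(prop)
    return chain


def infer(triples):
    """Add, for each triple, the same statement with every super-property."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        for parent in super_properties(predicate):
            inferred.add((subject, parent, obj))
    return inferred


data = {
    ("ex:line1", "bud:introduction", "Introductory text of a budget line"),
    ("ex:amount1", "bud:hasNomenclature", "ex:line1"),
}

for triple in sorted(infer(data)):
    print(triple)
```

After inference the graph also contains the `dct:description` and `dct:subject` statements, so generic Dublin Core tools can interpret the budget data without knowing the EU Budget vocabulary.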

Step 4: Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
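The naming conventions above lend themselves to a small automated check. The sketch below is one possible way to do this, assuming local names are restricted to ASCII letters and digits and that labels are non-empty; the function names are made up, and rules such as "singular" or "verb versus noun" are not checked because they need human judgement.

```python
import re

CLASS_PATTERN = re.compile(r"[A-Z][A-Za-z0-9]*")     # e.g. Concept
PROPERTY_PATTERN = re.compile(r"[a-z][A-Za-z0-9]*")  # e.g. hasSite


def camel_case(label: str, is_class: bool) -> str:
    """Turn a human-readable label into a camel-case local name."""
    words = label.lower().split()
    first = words[0].capitalize() if is_class else words[0]
    return first + "".join(word.capitalize() for word in words[1:])


def follows_convention(local_name: str, is_class: bool) -> bool:
    """Check the capitalisation rule for a class or property local name."""
    pattern = CLASS_PATTERN if is_class else PROPERTY_PATTERN
    return pattern.fullmatch(local_name) is not None


print(camel_case("political category", is_class=True))
print(camel_case("is primary topic of", is_class=False))
print(follows_convention("has_site", is_class=False))
```

Such a check could be run over a draft vocabulary before publication to catch inconsistent term names early.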

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Figure: Amount class with the properties Currency, Figure, Type and Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

Example: defining the “amount type” property

[Figure: Amount class with the properties Currency, Figure, Type and Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

[Figure: Political category class (Code, Description) linked to the Amount class (Currency, Figure, Type, Year) through the hasCeiling relationship]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
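As an illustration of what an RDFS definition with domain and range could look like, the sketch below generates Turtle text for two classes and one property. The `ex:` namespace `http://example.org/budget#`, the term names and the direction of `hasCeiling` (from Political category to Amount) are assumptions made for this example; the actual EU Budget vocabulary may define them differently.

```python
PREFIXES = "\n".join([
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix ex: <http://example.org/budget#> .",  # hypothetical namespace
])


def class_definition(term: str, label: str) -> str:
    """Return a Turtle snippet declaring an RDFS class."""
    return f'{term} a rdfs:Class ;\n    rdfs:label "{label}"@en .'


def property_definition(term: str, label: str, domain: str, range_: str) -> str:
    """Return a Turtle snippet declaring a property with domain and range."""
    return (
        f"{term} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain {domain} ;\n"
        f"    rdfs:range {range_} ."
    )


vocabulary = "\n\n".join([
    PREFIXES,
    class_definition("ex:Amount", "Amount"),
    class_definition("ex:PoliticalCategory", "Political category"),
    property_definition(
        "ex:hasCeiling", "has ceiling", "ex:PoliticalCategory", "ex:Amount"
    ),
])
print(vocabulary)
```

Here the domain says that `ex:hasCeiling` is used on political categories, and the range says that its values are amounts.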

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms#

o http://purl.org/dc/elements/1.1/

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

[Figure labels: Analyse, Model, Publish]

Slide 82

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another...” (openrefine.org)

See also: Open Refine website, http://openrefine.org

Slide 84

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

CSV

Excel (xls and xlsx)

JSON

XML and

RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using Open Refine to model and publish open data: getting started

Install Open Refine from https://github.com/OpenRefine

Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

Upload the spreadsheet

Select relevant tabs

Create the project

Slide 89

Step 3: Clean up the data – table harmonisation

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Slide 90

Step 3: Clean up the data – prepare RDF

• Create URI representation for the involved object values

- via formula

- via reconciliation

Slide 91
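Creating URIs "via formula" means deriving them from cell values with a fixed rule. In Open Refine this is done with an expression on the column; the Python sketch below only illustrates the idea and is not Open Refine code. The base URI `http://example.org/id/` and the function name `mint_uri` are assumptions.

```python
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI for the dataset


def mint_uri(kind: str, label: str) -> str:
    """Derive a URI from a cell value: trim, lower-case, hyphenate, escape."""
    slug = "-".join(label.strip().lower().split())
    return f"{BASE}{kind}/{quote(slug)}"


print(mint_uri("country", " Czech Republic "))
```

A formula gives predictable URIs but only inside your own namespace. Reconciliation instead looks the value up in an existing reference dataset and reuses the URI found there, which is what creates links to other data.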

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one

You can set the base URI for the data

[Figure: graphical interface to copy/paste an existing RDF skeleton]

[Figure: graphical interface to edit an RDF skeleton]

Slide 94

Step 5: Export your data to RDF/XML or Turtle

[Figure: export of the data in Turtle]

Slide 95
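What the skeleton and export steps do can be approximated in plain Python: each spreadsheet row becomes one observation, and each column is mapped to a property. The sketch below is not the Open Refine RDF extension; the base URI, the column names and the dimension and measure property URIs are made up, and only the `qb:Observation` class comes from the RDF Data Cube Vocabulary. The output is N-Triples, which is also valid Turtle.

```python
import csv
import io

BASE = "http://example.org/scoreboard/"  # hypothetical base URI
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# The "skeleton": which column feeds which (made-up) dimension property.
SKELETON = {
    "country": BASE + "dimension/country",
    "year": BASE + "dimension/year",
}
MEASURE_COLUMN = "value"
MEASURE_PROPERTY = BASE + "measure/value"


def rows_to_ntriples(csv_text: str) -> str:
    """Convert CSV rows to N-Triples; cell values are assumed to need no escaping."""
    lines = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=1):
        subject = f"<{BASE}observation/{number}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{QB}Observation> .")
        for column, prop in SKELETON.items():
            lines.append(f'{subject} <{prop}> "{row[column]}" .')
        lines.append(
            f'{subject} <{MEASURE_PROPERTY}> "{row[MEASURE_COLUMN]}"^^<{XSD_DECIMAL}> .'
        )
    return "\n".join(lines)


sample = "country,year,value\nBE,2013,0.82\n"
print(rows_to_ntriples(sample))
```

In a real mapping the dimension values would normally be URIs rather than plain strings, which is why the clean-up step creates URI representations first.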

Production pipelines

From desk to automated pipeline

[Figure: tools positioned along flexibility and volume axes: OpenRefine, UnifiedViews, Cellar]

Slide 96

Thank you for your attention

and now YOUR questions

Slide 97

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 51: Llinked open data training for EU institutions

DATASUPPORTOPEN

RDF Vocabulary

ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo

bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo

bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties

bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties

Slide 51

DATASUPPORTOPEN

Examples of classes relationships and properties The Core Person Vocabulary in UML

52

class Healthcare Domain

Core VocabulariesIdentifier

dateOfIssue dateTime [01]

identifier string [11]

identifierType string [01]

issuingAuthority string [01]

issuingAuthorityUri URI [01]

Core VocabulariesPerson

alternativeName string

birthName string

dateOfBirth dateTime

dateOfDeath dateTime

familyName string

fullName string

gender code

givenName string

patronymicName string

Core VocabulariesLocation

geographicIdentifier URI

geographicName string

Core VocabulariesAddress

addressArea string

addressID string

adminUnitL1 string

adminUnitL2 string

fullAddress string

locatorDesignator string

locatorName string

poBox string

postCode string

postName string

thoroughfare string

Core VocabulariesGeometry

lat string

long string

wkt string

xmlGeometry XML

address

identifies

geometry

placeOfDeath

countryOfDeath

placeOfBirth

countryOfBirth

identifier

UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model

Relationships ClassProperties

Class

Class

Class

Class

DATASUPPORTOPEN

Introduction to SPARQL

The RDF Query Language

Slide 53

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples

bull SPARQL Protocol and RDF Query Language

bull One of the three core standards of the Semantic Web along with RDF and OWL

bull Became a W3C standard January 2008

bull SPARQL 11 is a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

bull SELECT Return a table of all X Y etc satisfying the following conditions

bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph

bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

bull INSERT Add triples to the RDF graph

bull DELETE Delete triples from the RDF graph

bull ASK Are there any X Y etc satisfying the following conditions

Slide 55

See alsohttpwwweuclid-projecteumoduleschapter2

httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt

SELECT titleWHERE

dataset rdftype dcatDataset dataset rdftitle title

Structure of a SPARQL Query

Slide 56

Type of

query Variables ie what to search for

RDF triple patterns ie

the conditions that

have to be met

Definition of

prefixes

DATASUPPORTOPEN

SELECT ndash return the name of a dataset with particular URI

Slide 57

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset

WHERE

lthttpauthorityfile-typegt dcttitle dataset

dataset

ldquoFile types Name Authority Listrdquo

Sample data

Query

Result

SELECT - return the name and publisher of a dataset

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

dataset | publisher
"File types Name Authority List" | "Publications Office"

Slide 58
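The way triple patterns are matched against data and joined through shared variables can be sketched in a few lines of Python. This is only an illustration of the idea, not a SPARQL engine: the helper names (`match`, `select`) are made up for this sketch, the dataset URI is a hypothetical stand-in for the elided one on the slide, and the publisher is assumed to be identified by the same URI in both places.

```python
# A toy triple-pattern matcher (illustration only, not a SPARQL implementation).
DCT = "http://purl.org/dc/terms/"
DATASET = "http://example.org/authority/file-type"  # hypothetical stand-in URI
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, DCT + "title", "File types Name Authority List"),
    (DATASET, DCT + "publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, DCT + "title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Return `binding` extended so that `pattern` equals `triple`, else None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable: bind it or check the old binding
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:       # a constant: must be identical
            return None
    return extended


def select(variables, patterns, triples):
    """Evaluate the patterns one after another, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, DCT + "publisher", "?publisherURI"),
        (DATASET, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    TRIPLES,
)
print(rows)
```

The third pattern only matches the publisher's title because `?publisherURI` was already bound by the first pattern; that shared variable is what joins the two resources.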

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.

Slide 62

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

  - How to reuse existing vocabularies to model your data

  - How to create new classes and properties in RDF

  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Figure: domain model diagram. Classes and their attributes:

• Amount: Currency, Figure, Type, Year

• Nomenclature: Type, Heading, Introduction, Remark, Conditions

• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

• Political category: Code, Description

• Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]

Slide 68

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS

• To name things: rdfs:label, foaf:name, skos:prefLabel

• To describe people: FOAF, vCard, Core Person Vocabulary

• To describe projects: DOAP, ADMS.SW

• To describe interoperability assets: ADMS

• To describe registered organisations: Registered Organisation Vocabulary

• To describe addresses: vCard, Core Location Vocabulary

• To describe public services: Core Public Service Vocabulary

• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies

Step 2: Reuse existing terms and vocabularies

• DCAT-AP: Vocabulary for describing datasets in Europe

• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

• DOAP: Vocabulary for describing projects

• ADMS: Vocabulary for describing interoperability assets

• Dublin Core: Defines general metadata attributes

• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

• Organization Ontology: for describing the structure of organizations

• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

Advantages of reuse

Step 2: Reuse existing terms and vocabularies

• Reuse greatly aids interoperability of your data.

  Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

  It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper.

  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
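The interoperability point about typed dates can be illustrated with a small Python sketch. It assumes the consumer only knows the standard xsd:date lexical form (YYYY-MM-DD, without a timezone); `parse_xsd_date` is a hypothetical helper name, and `ex:date` is the made-up property from the slide.

```python
from datetime import date


def parse_xsd_date(lexical):
    """Parse the lexical form of an xsd:date (YYYY-MM-DD, no timezone)."""
    return date.fromisoformat(lexical)


typed_value = "2013-02-21"      # lexical form of "2013-02-21"^^xsd:date
free_text = "21 February 2013"  # value of the hypothetical ex:date property

# The standard form parses directly.
created = parse_xsd_date(typed_value)
print(created.year, created.month, created.day)

# The free-text form is rejected and would need custom, per-publisher handling.
try:
    parse_xsd_date(free_text)
    needs_extra_processing = False
except ValueError:
    needs_extra_processing = True
print(needs_extra_processing)
```

A consumer that meets the free-text form has to guess the language, the field order and the month spelling, which is the "further processing" the slide warns about.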

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu

• Linked Open Vocabularies: http://lov.okfn.org

Step 2: Reuse existing terms and vocabularies

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 74

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

Slide 75
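Why a sub-property helps consumers that only know the generic term can be sketched as follows. This is a simplified illustration of rdfs:subPropertyOf inference, assuming each property has at most one super-property. The `bud#introduction` IRI and the example resource are hypothetical; only the fact that introduction is a sub-property of dct:description comes from the slides.

```python
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"
BUD_INTRODUCTION = "http://data.europa.eu/bud#introduction"  # hypothetical IRI

# rdfs:subPropertyOf statements, simplified to one super-property per property.
SUB_PROPERTY_OF = {BUD_INTRODUCTION: DCT_DESCRIPTION}


def super_properties(prop):
    """Yield prop followed by every property reachable via rdfs:subPropertyOf."""
    seen = set()
    while prop is not None and prop not in seen:
        seen.add(prop)
        yield prop
        prop = SUB_PROPERTY_OF.get(prop)


def infer(triples):
    """Add (s, q, o) for every (s, p, o) where p is a sub-property of q."""
    inferred = set(triples)
    for s, p, o in triples:
        for q in super_properties(p):
            inferred.add((s, q, o))
    return inferred


data = {("http://example.org/nomenclature/1", BUD_INTRODUCTION, "Introductory text")}
closed = infer(data)

# A consumer that only understands dct:description still finds the text.
descriptions = sorted(o for s, p, o in closed if p == DCT_DESCRIPTION)
print(descriptions)
```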

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.

• Properties begin with a lower-case letter, e.g. rdfs:label.

• Object properties should be verbs, e.g. org:hasSite.

• Data type properties should be nouns, e.g. dcterms:description.

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
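The casing rules above are mechanical enough to check automatically. The sketch below covers only the capitalisation and camel-case rules; whether a name is singular, a verb or a noun still needs a human reviewer. The function names are illustrative and not part of any existing tool.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # e.g. Concept, PoliticalCategory
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. label, isPrimaryTopicOf


def local_name(term):
    """Return the part after the prefix of a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_class_name(term):
    """Classes start with a capital letter and use camel case."""
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_property_name(term):
    """Properties start with a lower-case letter and use camel case."""
    return bool(LOWER_CAMEL.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:has_site"]:
    print(term, is_class_name(term), is_property_name(term))
```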

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Figure: the Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Figure: the Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.

• A domain states on which classes a given property can be used.

[Figure: the hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
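The definitions on these slides were shown as images, so the following is only a sketch of what such RDFS definitions could look like, generated as Turtle text from Python. Several things are assumptions: the `bud:` namespace form (with a trailing `#`), the local names, and the direction of hasCeiling (here from Political category to Amount). The RDFS terms themselves (rdfs:Class, rdf:Property, rdfs:label, rdfs:domain, rdfs:range) are standard.

```python
BUD = "http://data.europa.eu/bud#"  # assumed namespace form for the example

PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    f"@prefix bud: <{BUD}> .\n"
)


def class_definition(local, label):
    """Return a Turtle snippet declaring an RDFS class with an English label."""
    return (
        f"bud:{local} a rdfs:Class ;\n"
        f'    rdfs:label "{label}"@en .\n'
    )


def property_definition(local, label, domain, range_):
    """Return a Turtle snippet declaring a property with its domain and range."""
    return (
        f"bud:{local} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain {domain} ;\n"
        f"    rdfs:range {range_} .\n"
    )


turtle = "\n".join([
    PREFIXES,
    class_definition("Amount", "Amount"),
    class_definition("PoliticalCategory", "Political category"),
    # Assumed direction: a political category has a ceiling amount.
    property_definition("hasCeiling", "has ceiling",
                        "bud:PoliticalCategory", "bud:Amount"),
])
print(turtle)
```

Declaring the domain and range lets a consumer infer that anything with a hasCeiling value is a political category and that the value is an amount, so both should be stated only when that inference is intended.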

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

  Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

  Examples:

  o http://www.w3.org/ns/adms#

  o http://purl.org/dc/elements/1.1/

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Phases shown alongside the steps: Analyse, Model, Publish

Slide 82

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another ...” - openRefine.org

See also: Open Refine website, http://openrefine.org

Slide 84

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML, and

• RDF/XML

And then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet

• Select relevant tabs

• Create the project

Slide 89

Step 3: Clean up the data - table harmonisation

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Slide 90

Step 3: Clean up the data - prepare RDF

• Create URI representation for the involved object values

  - via formula

  - via reconciliation

Slide 91
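Creating URIs "via formula" means deriving an identifier from a cell value with a deterministic rule. OpenRefine has its own expression language for this; the Python below is only a stand-in for the idea, with a hypothetical base URI and made-up cell values.

```python
import re
import unicodedata

BASE_URI = "http://example.org/id/"  # hypothetical base URI for the dataset


def slugify(value):
    """Lower-case, strip accents, and replace runs of other characters with '-'."""
    normalized = unicodedata.normalize("NFKD", value)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def mint_uri(kind, value):
    """Build a URI from a resource kind (e.g. 'country') and a cell value."""
    return f"{BASE_URI}{kind}/{slugify(value)}"


print(mint_uri("country", "Czech Republic"))
print(mint_uri("indicator", "Households with broadband access (%)"))
```

Because the rule is deterministic, the same cell value always yields the same URI, so rows that mention the same country end up pointing at the same resource. Reconciliation, the other option on the slide, instead looks the value up in an existing dataset and reuses the URI found there.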

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.

You can set the base URI for the data.

[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 94

Step 5: Export your data to RDF/XML or Turtle

[Figure: export of the data in Turtle]

Slide 95

Production pipelines

From desk to automated pipeline

[Figure: the tools OpenRefine, UnifiedViews and Cellar placed on two axes, flexibility and volume]

Slide 96

Thank you for your attention

... and now YOUR questions?

Slide 97

References

• 5 ★ Open Data: http://5stardata.info

• ADMS Brochure, ISA Programme: https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C: http://www.w3.org/TR/vocab-org

• Case study on how Linked Data is transforming eGovernment, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee: http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook, W3C: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch: http://lod-cloud.net

• Module 2: Querying Linked Data, EUCLID: http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction, The Open Knowledge Foundation: http://okfn.org/opendata

• Open Refine: https://github.com/OpenRefine

• RDF Extension: http://refine.deri.ie

• Resource Description Framework, W3C: http://www.w3.org/RDF

• Semantic Web Stack, W3C: http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C: http://www.w3.org/TR/rdf-sparql-query

Slide 98

Further reading

• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

• EUCLID - Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1

• EUCLID - Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2

• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com

• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0

• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf

• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org

Slide 101

Be part of our team

• Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport

• Join us on: Open Data Support, http://goo.gl/y9ZZI

• Follow us: @OpenDataSupport

• Contact us: http://www.opendatasupport.eu, contact@opendatasupport.eu

Slide 102

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No 30-CE-053096500-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Slide 103

Page 52: Llinked open data training for EU institutions

Examples of classes, relationships and properties: the Core Person Vocabulary in UML

[Figure: UML class diagram "Healthcare Domain", reproduced as text below. Callouts on the slide point out the classes, their properties, and the relationships between classes.]

Class Core Vocabularies::Identifier

  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]

Class Core Vocabularies::Person

  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string

Class Core Vocabularies::Location

  geographicIdentifier: URI
  geographicName: string

Class Core Vocabularies::Address

  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string

Class Core Vocabularies::Geometry

  lat: string
  long: string
  wkt: string
  xmlGeometry: XML

Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier

UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.

Slide 52

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL stands for SPARQL Protocol and RDF Query Language.

• It is one of the three core standards of the Semantic Web, along with RDF and OWL.

• It became a W3C standard in January 2008.

• SPARQL 1.1 has been a W3C Recommendation since March 2013.

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: Add triples to the RDF graph.

• DELETE: Delete triples from the RDF graph.

• ASK: Are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also: http://www.euclid-project.eu/modules/chapter2

https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

Map your data to appropriate RDF classes amp properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

bull 5 Open Data http5stardatainfo

bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure

bull An organization ontology W3C httpwwww3orgTRvocab-org W3C

bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment

bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies

bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf

bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1

bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml

Slide 98

bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook

bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet

bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2

bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata

bull Open Refine httpsgithubcomOpenRefine

bull RDF Extension httprefinederiie

bull Resource Description Framework W3C httpwwww3orgRDF

bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng

bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements

EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1 Introduction and Application Scenarios

httpwwweuclid-projecteumodulescourse1

EUCLID - Course 2 Querying Linked Data

httpwwweuclid-projecteumodulescourse2

Learning SPARQL Bob DuCharme

httpwwwlearningsparqlcom

Linked Data Cookbook W3C Government Linked Data Working Group

httpwwww3org2011gldwikiLinked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer

httplinkeddatabookcomeditions10

Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck

httpwwwsemantic-webatLOD-TheEssentialspdf

Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas

httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454

Semantic Web for the working ontologist Dean Allemang Jim Hendler

httpworkingontologistorg

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data SupporthttpwwwslidesharenetOpenDataSupport

httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI

OpenDataSupport contactopendatasupporteu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)

copy 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative

2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice

Introduction to SPARQL

The RDF Query Language

Slide 53

About SPARQL

SPARQL is the standard language for querying graph data represented as RDF triples.

• The name stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013

Slide 54

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph (a SPARQL 1.1 Update operation).
• DELETE: Delete triples from the RDF graph (a SPARQL 1.1 Update operation).
• ASK: Are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
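To make the difference between the query forms more tangible, here is a minimal sketch in plain Python. It is not a SPARQL engine: it assumes a tiny in-memory set of triples written as tuples, and the helper names `select`, `ask` and `construct` as well as the `ex:` identifiers are hypothetical. It only mimics what SELECT, ASK and CONSTRUCT give back for a single triple pattern.

```python
# Toy illustration of SELECT, ASK and CONSTRUCT over an in-memory triple set.
# This is NOT a SPARQL engine; the data and helper names are made up.

TRIPLES = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
}


def matches(triple, pattern):
    """A pattern position set to None acts as a variable and matches anything."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def select(pattern):
    """SELECT: return a table (sorted list of rows) of the matching triples."""
    return sorted(t for t in TRIPLES if matches(t, pattern))


def ask(pattern):
    """ASK: return True if at least one triple matches."""
    return any(matches(t, pattern) for t in TRIPLES)


def construct(pattern, new_predicate):
    """CONSTRUCT: build a new graph by rewriting the predicate of each match."""
    return {(s, new_predicate, o) for (s, _, o) in select(pattern)}


titles = select((None, "dct:title", None))
has_dataset = ask((None, "rdf:type", "dcat:Dataset"))
labels = construct((None, "dct:title", None), "rdfs:label")
```

SELECT yields rows, ASK yields a yes/no answer, and CONSTRUCT yields triples again, which is why its output can be loaded as a new graph.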

Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met

SELECT - return the name of a dataset with a particular URI

Slide 57

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:

dataset
"File types Name Authority List"

SELECT - return the name and publisher of a dataset

Slide 58

Sample data:

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:

dataset | publisher
"File types Name Authority List" | "Publications Office"
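The query above joins three triple patterns through the shared variable for the publisher URI. As a rough sketch of how such a join can be evaluated, the plain Python below matches the patterns one after another and carries the variable bindings along. It is an illustration only: the `ex:` names stand in for the elided URIs of the sample data, and `match_pattern` and `evaluate` are hypothetical helpers, not part of any SPARQL library.

```python
# Toy evaluation of a basic graph pattern (a list of triple patterns).
# Strings starting with "?" are variables; everything else must match exactly.
# The short "ex:" names stand in for the elided URIs of the sample data.

DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None
    return extended


def evaluate(patterns, data):
    """Join the patterns one by one, carrying variable bindings along."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


query = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]
rows = [(b["?dataset"], b["?publisher"]) for b in evaluate(query, DATA)]
```

Because the third pattern reuses `?publisherURI`, only the title of the publisher found by the first pattern survives the join.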

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62
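As a small illustration of "triples" and "syntaxes", the sketch below writes two triples in N-Triples, one of the simplest RDF syntaxes (one triple per line). It is a simplified example: the `Literal` wrapper, the helper names and the `http://example.org/` namespace are assumptions made for illustration, and only plain string literals and IRIs are handled.

```python
# Serialise (subject, predicate, object) triples as N-Triples lines.
# Objects wrapped in Literal become quoted strings; plain str values are IRIs.
from typing import NamedTuple


class Literal(NamedTuple):
    value: str


def escape(text):
    """Escape the characters that may not appear raw inside a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples):
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            rendered = f'"{escape(obj.value)}"'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


EX = "http://example.org/"  # placeholder namespace
DCT = "http://purl.org/dc/terms/"
triples = [
    (EX + "dataset/1", DCT + "title", Literal('The "budget" dataset')),
    (EX + "dataset/1", DCT + "publisher", EX + "publisher/1"),
]
output = to_ntriples(triples)
```

The same two statements could equally be written in Turtle or RDF/XML; only the notation changes, not the triples.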

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model for the EU budget data. Classes and their attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
Relationships shown: hasCeiling, hasPolitical category, hasCorporate body, has Nomenclature, has EU Programme]

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies

Step 2: Reuse existing terms and vocabularies

Slide 70

• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: Ontology for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Advantages of reuse

Step 2: Reuse existing terms and vocabularies

• Reuse greatly aids interoperability of your data.
  The value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.

Slide 71
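The interoperability point about dates can be tried out with a few lines of Python. This is only a sketch using the standard library: the format string for the free-text date is an assumption that a consumer would have to know in advance (English month names, day-month-year order), which is exactly the extra processing a typed xsd:date value avoids.

```python
from datetime import date, datetime

# A value typed as xsd:date uses the ISO 8601 form YYYY-MM-DD,
# which the standard library parses without any extra hints.
typed_value = "2013-02-21"
created = date.fromisoformat(typed_value)

# A free-text date needs a format agreed out of band. The pattern below
# assumes English month names and day-month-year order.
free_text = "21 February 2013"
parsed = datetime.strptime(free_text, "%d %B %Y").date()

# Without knowing that convention, generic ISO parsing fails.
try:
    date.fromisoformat(free_text)
    needs_extra_processing = False
except ValueError:
    needs_extra_processing = True
```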

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org

Step 2: Reuse existing terms and vocabularies

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
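What a sub-property buys you can be sketched in a few lines of Python. The example assumes the two sub-property relations mentioned on the slides; the `bud:` prefixed names, the sample triples and the function name are illustrative guesses, not the actual identifiers of the EU Budget vocabulary. A consumer that only knows dct:description and dct:subject can still read the data once the super-property statements are derived.

```python
# Toy rdfs:subPropertyOf inference: if (p subPropertyOf q) and (s p o),
# then (s q o) also holds. The "bud:" names are illustrative guesses at
# how the EU Budget vocabulary terms might be abbreviated.

SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

data = {
    ("bud:nomenclature-1", "bud:introduction", "Introductory text"),
    ("bud:amount-1", "bud:hasNomenclature", "bud:nomenclature-1"),
}


def infer_super_properties(triples, sub_property_of):
    """Return the input triples plus those entailed by rdfs:subPropertyOf."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat so that chains of sub-properties are followed
        changed = False
        for s, p, o in list(inferred):
            parent = sub_property_of.get(p)
            if parent and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


closure = infer_super_properties(data, SUB_PROPERTY_OF)
```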

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
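The capitalisation and camel-case conventions are easy to check mechanically. The following Python sketch shows one possible way; the regular expressions and function names are assumptions made for illustration, and the check does not cover the "singular", "verb" or "noun" rules, which need human judgement.

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class(term):
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property(term):
    return bool(PROPERTY_NAME.match(local_name(term)))


def to_camel_case(words, capitalise_first=False):
    """Join words into camel case, e.g. 'is primary topic of' -> 'isPrimaryTopicOf'."""
    parts = words.split()
    head = parts[0].capitalize() if capitalise_first else parts[0].lower()
    return head + "".join(part.capitalize() for part in parts[1:])
```

For example, `to_camel_case("political category", capitalise_first=True)` gives a class-style name, while the default gives a property-style name.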

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Figure: the Amount class with attributes Currency, Figure, Type, Year]

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Figure: the Amount class with attributes Currency, Figure, Type, Year]

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states the class or classes of the resources on which a given property is used.

[Figure: the hasCeiling property relating Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
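In RDFS, domain and range declarations are used to derive the types of the things a property connects; they do not block "wrong" data. A minimal Python sketch of that inference follows. It assumes, purely for illustration, that hasCeiling has Political category as its domain and Amount as its range (the slide figure does not survive in this transcript, so the direction is a guess), and the `bud:` names are hypothetical.

```python
# Toy rdfs:domain / rdfs:range inference. In RDFS these declarations let a
# reasoner derive the types of subjects and objects; they do not reject data.
# Which class is the domain and which the range of hasCeiling is an assumption.

SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}

data = [
    ("bud:category-1", "bud:hasCeiling", "bud:amount-1"),
]


def infer_types(triples, schema):
    """Derive rdf:type statements from the domain and range of each property."""
    types = set()
    for s, p, o in triples:
        declaration = schema.get(p)
        if declaration is None:
            continue  # no domain/range known for this property
        types.add((s, "rdf:type", declaration["domain"]))
        types.add((o, "rdf:type", declaration["range"]))
    return types


typed = infer_types(data, SCHEMA)
```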

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format (design rules) and management.
  Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/

Slide 80

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Phases shown alongside the steps: Analyse, Model, Publish.

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ...” - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

First:

• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Dataset used: Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data.

Slide 88

Step 2: Create a project and upload it in Open Refine

Slide 89

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Step 3: Clean up the data - table harmonisation

Slide 90

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Step 3: Clean up the data - prepare RDF

Slide 91

• Create URI representations for the involved object values
  - via formula
  - via reconciliation
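Creating a URI "via formula" means deriving it from the cell value with a fixed rule. The Python sketch below shows the idea outside Open Refine; it is not Open Refine's own expression language. The base namespace `http://example.org/id/` and the function name `make_uri` are placeholders chosen for illustration.

```python
from urllib.parse import quote

BASE = "http://example.org/id/"  # placeholder; use your own stable namespace


def make_uri(kind, label):
    """Turn a cell value into a URI: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(label.strip().lower().split())
    return f"{BASE}{kind}/{quote(slug, safe='-')}"


country_uri = make_uri("country", "  United Kingdom ")
indicator_uri = make_uri("indicator", "Households w/ broadband")
```

Applying the same rule to every row means that equal cell values always map to the same URI. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there.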

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF.

Step 4: Map your data to appropriate RDF classes & properties (model your data)

• You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
• You can set the base URI for the data.

Slide 94

[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Figure: export of the data in Turtle]
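To show what the skeleton and the export amount to, here is a rough Python sketch that applies a fixed template to every row of a small table and writes Turtle. It is not what Open Refine does internally. The table contents, the `ex:` namespace and the identifier patterns are made up for illustration, while `qb:Observation`, `qb:dataSet`, `sdmx-dimension:refArea`, `sdmx-dimension:refPeriod` and `sdmx-measure:obsValue` are terms commonly used with the RDF Data Cube Vocabulary.

```python
import csv
import io

# Hypothetical extract of a cleaned-up table: one observation per row.
TABLE = """country,year,value
BE,2012,78.0
FR,2012,77.0
"""

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix ex: <http://example.org/scoreboard/> .
"""


def row_to_turtle(row):
    """Apply a fixed 'skeleton' to one spreadsheet row."""
    country = row["country"]
    year = row["year"]
    value = float(row["value"])
    return (
        f"ex:obs-{country}-{year} a qb:Observation ;\n"
        f"    qb:dataSet ex:dataset ;\n"
        f"    sdmx-dimension:refArea ex:country-{country} ;\n"
        f'    sdmx-dimension:refPeriod "{year}" ;\n'
        f"    sdmx-measure:obsValue {value} .\n"
    )


def table_to_turtle(text):
    """Read the CSV text and emit one observation block per row."""
    rows = csv.DictReader(io.StringIO(text))
    return PREFIXES + "\n" + "\n".join(row_to_turtle(row) for row in rows)


turtle = table_to_turtle(TABLE)
```

In a real mapping the period would more likely be a URI or a typed value rather than a plain string; the plain literal keeps the sketch short.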

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned by flexibility and volume: OpenRefine, UnifiedViews, Cellar]

Thank you for your attention

and now YOUR questions

Slide 97

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme
http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on, join us on, follow us:

• Open Data Support: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• OpenDataSupport

Contact us: contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that she has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her, his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 54: Llinked open data training for EU institutions

DATASUPPORTOPEN

About SPARQL

SPARQL is the standard language to query graph data represented as RDF triples

bull SPARQL Protocol and RDF Query Language

bull One of the three core standards of the Semantic Web along with RDF and OWL

bull Became a W3C standard January 2008

bull SPARQL 11 is a W3C Recommendation since March 2013

Slide 54

DATASUPPORTOPEN

Types of SPARQL queries

bull SELECT Return a table of all X Y etc satisfying the following conditions

bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph

bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)

bull INSERT Add triples to the RDF graph

bull DELETE Delete triples from the RDF graph

bull ASK Are there any X Y etc satisfying the following conditions

Slide 55

See alsohttpwwweuclid-projecteumoduleschapter2

httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en

DATASUPPORTOPEN

PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt

SELECT titleWHERE

dataset rdftype dcatDataset dataset rdftitle title

Structure of a SPARQL Query

Slide 56

Type of

query Variables ie what to search for

RDF triple patterns ie

the conditions that

have to be met

Definition of

prefixes

DATASUPPORTOPEN

SELECT ndash return the name of a dataset with particular URI

Slide 57

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset

WHERE

lthttpauthorityfile-typegt dcttitle dataset

dataset

ldquoFile types Name Authority Listrdquo

Sample data

Query

Result

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset publisher

WHEREhttpauthorityfile-type dctpublisher publisherURI

httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher

dataset publisher

ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

Sample data

Query

Result

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

bull RDF is a general way to express data intended for publishing on the Web

bull RDF data is expressed in triples subject predicate object

bull Different syntaxes exist for expressing data in RDF

bull SPARQL is a standardised language to query graph data expressed as RDF

bull SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:

  - How to reuse existing vocabularies to model your data

  - How to create new classes and properties in RDF

  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model, developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model diagram. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships shown: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

Step 2: Reuse existing terms and vocabularies

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data

Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
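
The interoperability argument can be illustrated with a small sketch. It assumes a store that accepts SPARQL 1.1 Update; the ex: namespace and the two report resources are invented for illustration. The first statement reuses dcterms:created with a typed literal, the second uses a home-grown property with free text.

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:      <http://example.org/>   # hypothetical namespace

INSERT DATA {
  # Reused term with a typed value: a consumer can treat it as a date as-is.
  ex:report-a dcterms:created "2013-02-21"^^xsd:date .

  # Home-grown term with free text: a consumer must first learn what
  # ex:date means and then parse the string itself.
  ex:report-b ex:date "21 February 2013" .
}
```

A client that only knows Dublin Core would find the creation date of ex:report-a but would have no way of recognising the one attached to ex:report-b.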

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

http://joinup.ec.europa.eu
http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
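
The two sub-property declarations mentioned for the EU Budget vocabulary can be sketched as a SPARQL 1.1 Update request. The bud: namespace IRI and the local names introduction, hasNomenclature and nomenclature-1 are assumptions made for illustration; the published vocabulary may spell them differently. The second operation writes out the generic statements explicitly, which is one simple way to serve clients that only know the Dublin Core terms when the store does no RDFS reasoning.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX bud:  <http://example.org/bud#>   # hypothetical, not the real namespace

# Declare the two sub-property relationships and one sample statement.
INSERT DATA {
  bud:introduction    rdfs:subPropertyOf dct:description .
  bud:hasNomenclature rdfs:subPropertyOf dct:subject .

  bud:nomenclature-1 bud:introduction "Sample introductory text" .
} ;

# Materialise the generic statements implied by the declarations above.
# This covers one level of sub-properties only, not longer chains.
INSERT { ?s ?super ?o }
WHERE  {
  ?p rdfs:subPropertyOf ?super .
  ?s ?p ?o .
}
```

After the request, the store also holds the sample text as a dct:description of bud:nomenclature-1, so a query written against dct:description finds it.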

Step 4: Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
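
The capitalisation and camel-case rules are mechanical enough to check automatically. The query below is a minimal sketch that tests a handful of invented local names against two regular expressions; the sample names and the exact patterns are assumptions, and a pattern cannot judge whether a class name is singular or whether a property is a verb or a noun.

```sparql
SELECT ?kind ?localName ?followsConvention
WHERE {
  # Inline sample data: the kind of term and its local name.
  VALUES (?kind ?localName) {
    ("class"    "Concept")
    ("class"    "politicalCategories")
    ("property" "hasSite")
    ("property" "Has_site")
  }
  # Classes: upper camel case. Properties: lower camel case.
  BIND (
    IF(?kind = "class",
       REGEX(?localName, "^[A-Z][A-Za-z0-9]*$"),
       REGEX(?localName, "^[a-z][A-Za-z0-9]*$"))
    AS ?followsConvention
  )
}
```

With these inputs, Concept and hasSite pass, while politicalCategories (a class starting in lower case) and Has_site (a property starting in upper case, with an underscore) are flagged as false.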

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the Amount class with attributes Currency, Figure, Type, Year]
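
The slide's worked example is an image that did not survive in this transcript. As a stand-in, here is a minimal sketch of how such a class could be declared with RDFS and OWL, written as a SPARQL 1.1 Update so that it can be loaded into a store. The bud: namespace IRI, the comment text and the rdfs:isDefinedBy target are assumptions made for illustration; they are not taken from the published EU Budget vocabulary.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX bud:  <http://example.org/bud#>   # hypothetical namespace

INSERT DATA {
  # Class name: capitalised and singular, as the naming rules recommend.
  bud:Amount a rdfs:Class, owl:Class ;
      rdfs:label       "Amount"@en ;
      rdfs:comment     "A monetary figure with a currency, a type and a year."@en ;
      rdfs:isDefinedBy <http://example.org/bud> .
}
```

Typing the term as both rdfs:Class and owl:Class is a common choice that keeps it usable by RDFS-only and OWL-aware tools alike.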

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the Amount class with attributes Currency, Figure, Type, Year]
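
A property declaration could look like the sketch below. The local name amountType, the bud: namespace IRI and the comment text are hypothetical. The property is typed only as rdf:Property because the transcript does not say whether its values are literals or resources; owl:DatatypeProperty or owl:ObjectProperty could be added once that is decided.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bud:  <http://example.org/bud#>   # hypothetical namespace

INSERT DATA {
  # Property name: lower camel case, as the naming rules recommend.
  bud:amountType a rdf:Property ;
      rdfs:label   "amount type"@en ;
      rdfs:comment "The kind of amount being reported."@en ;
      rdfs:domain  bud:Amount .
}
```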

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
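
In RDFS, domain and range statements drive inference rather than validation: using a property tells a reasoner what its subject and object are. The sketch below writes those two inferences out as explicit update rules. It assumes that hasCeiling points from a political category to an amount (the transcript does not preserve the direction of the arrow), and the bud: namespace and instance names are invented.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bud:  <http://example.org/bud#>   # hypothetical namespace

INSERT DATA {
  bud:hasCeiling a rdf:Property ;
      rdfs:domain bud:PoliticalCategory ;   # assumed direction
      rdfs:range  bud:Amount .

  # A statement that uses the property without typing either end.
  bud:category-1 bud:hasCeiling bud:amount-1 .
} ;

# Domain rule: the subject of the property is an instance of the domain class.
INSERT { ?s a ?domainClass }
WHERE  { ?p rdfs:domain ?domainClass . ?s ?p ?o . } ;

# Range rule: the object of the property is an instance of the range class.
INSERT { ?o a ?rangeClass }
WHERE  { ?p rdfs:range ?rangeClass . ?s ?p ?o . FILTER (!isLiteral(?o)) }
```

After the request, bud:category-1 is typed as bud:PoliticalCategory and bud:amount-1 as bud:Amount, even though the data never said so directly.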

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

Start with a robust Domain Model, developed following a structured process and methodology

Research existing terms and their usage, and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

[Figure labels: Analyse, Model, Publish]

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ..." - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

- CSV

- Excel (.xls and .xlsx)

- JSON

- XML, and

- RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD 2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

Slide 89

Upload the spreadsheet

Select relevant tabs

Create the project

Step 3: Clean up the data - table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Step 3: Clean up the data - prepare RDF

Slide 91

• Create URI representation for the involved object values

  • via formula

  • via reconciliation
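
Creating a URI "via formula" means deriving it deterministically from the cell values of a row. In Open Refine this is done in the tool's own expression language; the query below is only an analogous sketch in SPARQL, so that the idea can be tried in any SPARQL 1.1 engine. The base URI, the country codes and the indicator label are invented sample values.

```sparql
SELECT ?countryCode ?indicator ?uri
WHERE {
  # Inline sample rows standing in for spreadsheet cells.
  VALUES (?countryCode ?indicator) {
    ("BE" "Households with broadband")
    ("EL" "Households with broadband")
  }
  # Lower-case the values, turn spaces into hyphens, escape anything
  # that is not URI-safe, and append the result to a fixed base URI.
  BIND (
    IRI(CONCAT("http://example.org/observation/",
               LCASE(?countryCode), "/",
               ENCODE_FOR_URI(REPLACE(LCASE(?indicator), " ", "-"))))
    AS ?uri
  )
}
```

The first row yields <http://example.org/observation/be/households-with-broadband>. Because the same input always gives the same URI, re-running the transformation on updated data does not create duplicate identifiers.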

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF
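
A skeleton describes the shape that every spreadsheet row should take in RDF. As a rough illustration of what one row might produce under the RDF Data Cube Vocabulary, here is a single observation written as a SPARQL 1.1 Update. The ex: base URI, the resource names and the values are invented; a complete Data Cube publication would also need a qb:DataSet description and a data structure definition, which are left out here.

```sparql
PREFIX qb:             <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX sdmx-measure:   <http://purl.org/linked-data/sdmx/2009/measure#>
PREFIX xsd:            <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:             <http://example.org/scoreboard/>   # hypothetical base URI

INSERT DATA {
  # One spreadsheet row becomes one qb:Observation.
  ex:obs-be-2013 a qb:Observation ;
      qb:dataSet               ex:dataset ;               # the dataset it belongs to
      sdmx-dimension:refArea   ex:country-BE ;            # "country" column
      sdmx-dimension:refPeriod "2013"^^xsd:gYear ;        # "year" column
      sdmx-measure:obsValue    "0.5"^^xsd:decimal .       # "value" column
}
```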

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

Slide 94

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Screenshot: export of the data in Turtle]

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: OpenRefine, UnifiedViews and Cellar positioned by flexibility and volume]

Thank you for your attention

and now YOUR questions

Slide 97

References

• 5 ★ Open Data. http://5stardata.info

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF/

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Slide 98

Further reading

Slide 99

EC ISA: Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA: Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL. Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook. W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on / Join us on:

Open Data Support: http://www.slideshare.net/OpenDataSupport

Open Data Support: http://goo.gl/y9ZZI

http://www.opendatasupport.eu

Follow us: @OpenDataSupport

Contact us: contact@opendatasupport.eu

Presentation metadata

Slide 103

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 55: Llinked open data training for EU institutions

Types of SPARQL queries

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).

• INSERT: Add triples to the RDF graph.

• DELETE: Delete triples from the RDF graph.

• ASK: Are there any X, Y, etc. satisfying the following conditions?

Slide 55

See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
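
SELECT is illustrated in the slides that follow, so the sketch here shows CONSTRUCT instead. It assumes a dataset that describes dcat:Dataset resources with dct:title, as in the sample data used in this module; the choice of rdfs:label as the output property is arbitrary. Instead of a table, the query returns a new RDF graph built from the template.

```sparql
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Template: one rdfs:label triple per matching dataset.
CONSTRUCT {
  ?dataset rdfs:label ?title .
}
# Conditions: every dataset that has a title.
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}
```

Replacing the CONSTRUCT template with the keyword ASK would turn the same conditions into a yes/no question: is there at least one dataset with a title?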

Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

The query has four parts:

- Definition of prefixes: the PREFIX lines

- Type of query: SELECT

- Variables, i.e. what to search for: ?title

- RDF triple patterns, i.e. the conditions that have to be met: the WHERE block

DATASUPPORTOPEN

SELECT ndash return the name of a dataset with particular URI

Slide 57

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset

WHERE

lthttpauthorityfile-typegt dcttitle dataset

dataset

ldquoFile types Name Authority Listrdquo

Sample data

Query

Result

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt

SELECT dataset publisher

WHEREhttpauthorityfile-type dctpublisher publisherURI

httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher

dataset publisher

ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo

lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt

lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo

Sample data

Query

Result

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

bull RDF is a general way to express data intended for publishing on the Web

bull RDF data is expressed in triples subject predicate object

bull Different syntaxes exist for expressing data in RDF

bull SPARQL is a standardised language to query graph data expressed as RDF

bull SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

bull Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

bull An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties

Where new terms are required create them following commonly agreed best practice

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Slide 67

1

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

2

3

4

5

6

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

1

hasCeiling

hasPoliticalcategory

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeading

EU Programme

CodeType

Political category

CodeDescription

Corporate body

CodeTypeLocation

IntroductionRemarkConditions

AcronymLegal base periodLegal base typeLegal base status

hasCorporate body

has

Nomenclature

has

EU Programme

DATASUPPORTOPEN

General purpose vocabularies DCMI RDFS

To name things rdfslabel foafname skosprefLabel

To describe people FOAF vCard Core Person Vocabulary

To describe projects DOAP ADMSSW

To describe interoperability assets ADMS

To describe registered organisations Registered Organisation Vocabulary

To describe addresses vCard Core Location Vocabulary

To describe public services Core Public Service Vocabulary

To describe datasets DCAT DCAT Application Profile VoID

Reuse existing terms and vocabularies

Slide 69

2

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP Vocabulary for describing datasets in Europe

Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth

DOAP Vocabulary for describing projects

ADMS Vocabulary for describing interoperability assets

Dublin Core Defines general metadata attributes

Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register

Organization Ontology for describing the structure of organizations

Core Location VocabularyVocabulary capturing the fundamental characteristics of a location

Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration

schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft

See alsohttpwwww3orgwikiTaskForcesCommunityProj

ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies

2

DATASUPPORTOPEN

bull Reuse greatly aids interoperability of your data

Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses

bull Reuse adds credibility to your schema

It shows it has been published with care and professionalism again this promotes its reuse

bull Reuse is easier and cheaper

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

httpjoinupeceuropaeu httplovokfnorg

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription

Nomenclature

TypeHeadingIntroductionRemarkConditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular eg skosConcept

Properties begin with a lower case letter eg rdfslabel

Object properties should be verbs eg orghasSite

Data type properties should be nouns eg dctermsdescription

Use camel case if a term has more than one word eg foafisPrimaryTopicOf

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoAmountrdquo class

Slide 77

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoamount typerdquo property

Slide 78

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties consider to define their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used

Slide 79

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

hasCeiling

Amount

CurrencyFigureTypeYear

Political category

CodeDescription

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

bull Choose a stable namespace for your RDF vocabulary

Example httpdataeuropaeubud

bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management

Examples

o httpwwww3orgnsadms

o httppurlorgdcelements11

Slide 80

5

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg

See alsoOpen Refine website

httpopenrefineorg

DATASUPPORTOPEN

What is Open Refine RDF extension

Open Refine RDF extension allows you to easily import data in different formats such as

CSV

Excel(xls and xlsx)

JSON

XML and

RDFXML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data Getting started

1 Install Open Refine from httpsgithubcomOpenRefine

2 Install the RDF extension httprefinederiie

And then

Describe your data in a spreadsheet

Create a project and upload it in Open Refine

Clean up the data

Map your data to appropriate RDF classes amp properties

Export the data in RDF

Slide 86

1

2

3

4

5

DATASUPPORTOPEN

Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data ndash table harmonisation

Slide 90

3

bull Star amp remove unnessary rows

bull Rename columns

bull Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data ndash prepare RDF

Slide 91

3

bull Create URI representation for the involved object values

bull via formula

bull via reconsiliation

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

References

• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata
• OpenRefine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Slide 98

Further reading

• EC ISA: Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA: Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

Further reading

• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the working ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org

Slide 101

Be part of our team

Find us on / Contact us / Join us on / Follow us:

• Open Data Support: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• OpenDataSupport
• contact@opendatasupport.eu

Slide 102

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 56: Llinked open data training for EU institutions

Structure of a SPARQL Query

Slide 56

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met

SELECT – return the name of a dataset with a particular URI

Slide 57

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result

?dataset
"File types Name Authority List"

SELECT – return the name and publisher of a dataset

Slide 58

Sample data

<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result

?dataset | ?publisher
"File types Name Authority List" | "Publications Office"
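To make the matching behind the two SELECT queries above more concrete, the following Python sketch evaluates triple patterns against the sample data by hand. It is a toy illustration of the idea, not how a SPARQL engine is actually implemented. The `example.org` URIs stand in for the abbreviated URIs on the slides and are assumptions, as are the helper names `match` and `select`.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical full URIs: the slides abbreviate the real ones.
FILE_TYPE = "http://example.org/authority/file-type"
PUBL = "http://example.org/publisher/publ"

# Sample data as (subject, predicate, object) tuples; literals are plain strings.
TRIPLES = [
    (FILE_TYPE, RDF_TYPE, DCAT + "Dataset"),
    (FILE_TYPE, DCT + "title", "File types Name Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Return `binding` extended so that `pattern` equals `triple`, or None."""
    extended = dict(binding)
    for wanted, actual in zip(pattern, triple):
        if wanted.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(wanted, actual) != actual:
                return None
        elif wanted != actual:
            return None
    return extended


def select(variables, patterns, triples):
    """Evaluate a list of triple patterns and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings]


# Query of slide 57: the title of one dataset.
name_only = select(["?dataset"], [(FILE_TYPE, DCT + "title", "?dataset")], TRIPLES)

# Query of slide 58: the shared variable ?publisherURI joins two patterns.
name_and_publisher = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, DCT + "publisher", "?publisherURI"),
        (FILE_TYPE, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    TRIPLES,
)
print(name_only)
print(name_and_publisher)
```

The second query only returns a row because the object of `dct:publisher` and the subject of the publisher's `dct:title` are the same URI; that shared value is what the variable `?publisherURI` captures.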

SPARQL Example – EU ODP (1)

Slide 59

SPARQL Example – EU ODP (2)

Slide 60

SPARQL Example – EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Figure: EU Budget domain model]

Classes and their attributes:

• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location

Relationships shown: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Slide 68

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies

Step 2: Reuse existing terms and vocabularies

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

Advantages of reuse

Step 2: Reuse existing terms and vocabularies

• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, whose value should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
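The "further processing" mentioned for non-standard dates can be pictured with a small Python sketch. It converts a free-text date such as "21 February 2013" into the typed literal form used with dcterms:created. The function name `to_xsd_date` is an assumption for this example, and the parsing relies on English month names, which holds for Python's default locale.

```python
from datetime import datetime


def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date and return it as an xsd:date typed literal."""
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^xsd:date'


print(to_xsd_date("21 February 2013"))
```

Every consumer of the data would need a conversion like this for each custom format, which is the cost that reusing dcterms:created with xsd:date avoids.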

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org

Step 2: Reuse existing terms and vocabularies

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 74

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: Amount class (Currency, Figure, Type, Year) and Nomenclature class (Type, Heading, Introduction, Remark, Conditions), linked by the "has Nomenclature" relationship]

Slide 75
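The benefit of sub-properties described on these slides can be sketched in a few lines of Python: a consumer that only knows the Dublin Core terms can still read EU Budget data once the rdfs:subPropertyOf statements are applied. The prefixed names `bud:introduction` and `bud:hasNomenclature` and the sample triples are hypothetical; the slides only state which super-properties are used. The sketch also assumes each property has at most one direct super-property.

```python
# Hypothetical prefixed names for the two EU Budget properties on the slides.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

# Invented sample data using the specific properties.
TRIPLES = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
]


def super_properties(prop, sub_property_of):
    """Return every property reachable from `prop` via rdfs:subPropertyOf."""
    found = []
    while prop in sub_property_of:
        prop = sub_property_of[prop]
        if prop in found:  # guard against cyclic declarations
            break
        found.append(prop)
    return found


def infer(triples, sub_property_of):
    """Add a triple with the super-property for every sub-property triple."""
    inferred = set(triples)
    for subject, prop, obj in triples:
        for parent in super_properties(prop, sub_property_of):
            inferred.add((subject, parent, obj))
    return inferred


for triple in sorted(infer(TRIPLES, SUB_PROPERTY_OF)):
    print(triple)
```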

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower-case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
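The purely lexical part of these naming conventions can be checked automatically. The Python sketch below tests capitalisation and camel case for prefixed names; whether a class is singular or a property is a verb or a noun still needs human review. The function names are assumptions for this example.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # UpperCamelCase
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # lowerCamelCase


def local_name(term):
    """Return the part after the prefix in a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class(term):
    """True if the local name starts with a capital and has no separators."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property(term):
    """True if the local name starts in lower case and has no separators."""
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "ex:political_category"]:
    print(term, "class ok:", is_valid_class(term))
for term in ["rdfs:label", "foaf:isPrimaryTopicOf", "ex:HasSite"]:
    print(term, "property ok:", is_valid_property(term))
```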

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states that any resource that has a given property is an instance of one or more classes.

[Figure: hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
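In RDFS, domain and range declarations are used for inference rather than validation: from a single statement, a reasoner can conclude the classes of its subject and object. The Python sketch below illustrates this for hasCeiling. The prefixed names, the sample triple and the direction of the relationship (from a political category to an amount) are assumptions read off the diagram, not confirmed by the slides.

```python
# Hypothetical declaration: direction and names are assumptions.
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}

# Invented sample statement.
TRIPLES = [
    ("ex:category1", "bud:hasCeiling", "ex:amount1"),
]


def infer_types(triples, schema):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    types = set()
    for subject, prop, obj in triples:
        declaration = schema.get(prop)
        if declaration is None:
            continue
        if "domain" in declaration:
            # The subject of the property is an instance of the domain class.
            types.add((subject, "rdf:type", declaration["domain"]))
        if "range" in declaration:
            # The value of the property is an instance of the range class.
            types.add((obj, "rdf:type", declaration["range"]))
    return types


for triple in sorted(infer_types(TRIPLES, SCHEMA)):
    print(triple)
```

A practical consequence is that an overly narrow domain or range does not reject data; it silently classifies resources, so these declarations are worth choosing conservatively.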

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Phases: Analyse, Model, Publish

Slide 82

Example

Using OpenRefine with the RDF extension to publish tabular data as Linked Data

Slide 83

What is OpenRefine?

“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ...” - openrefine.org

See also: OpenRefine website, http://openrefine.org

Slide 84

What is the OpenRefine RDF extension?

The OpenRefine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar – OpenRefine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using OpenRefine to model and publish open data: getting started

1. Install OpenRefine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie

And then:

Step 1: Describe your data in a spreadsheet
Step 2: Create a project and upload it in OpenRefine
Step 3: Clean up the data
Step 4: Map your data to appropriate RDF classes & properties
Step 5: Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87



Example httpdataeuropaeubud

bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management

Examples

o httpwwww3orgnsadms

o httppurlorgdcelements11

Slide 80

5

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg

See alsoOpen Refine website

httpopenrefineorg

DATASUPPORTOPEN

What is Open Refine RDF extension

Open Refine RDF extension allows you to easily import data in different formats such as

CSV

Excel(xls and xlsx)

JSON

XML and

RDFXML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data Getting started

1 Install Open Refine from httpsgithubcomOpenRefine

2 Install the RDF extension httprefinederiie

And then

Describe your data in a spreadsheet

Create a project and upload it in Open Refine

Clean up the data

Map your data to appropriate RDF classes amp properties

Export the data in RDF

Slide 86

1

2

3

4

5

DATASUPPORTOPEN

Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data ndash table harmonisation

Slide 90

3

bull Star amp remove unnessary rows

bull Rename columns

bull Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data ndash prepare RDF

Slide 91

3

bull Create URI representation for the involved object values

bull via formula

bull via reconsiliation

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

bull 5 Open Data http5stardatainfo

bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure

bull An organization ontology W3C httpwwww3orgTRvocab-org W3C

bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment

bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies

bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf

bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1

bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml

Slide 98

bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook

bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet

bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2

bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata

bull Open Refine httpsgithubcomOpenRefine

bull RDF Extension httprefinederiie

bull Resource Description Framework W3C httpwwww3orgRDF

bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng

bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements

EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1 Introduction and Application Scenarios

httpwwweuclid-projecteumodulescourse1

EUCLID - Course 2 Querying Linked Data

httpwwweuclid-projecteumodulescourse2

Learning SPARQL Bob DuCharme

httpwwwlearningsparqlcom

Linked Data Cookbook W3C Government Linked Data Working Group

httpwwww3org2011gldwikiLinked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer

httplinkeddatabookcomeditions10

Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck

httpwwwsemantic-webatLOD-TheEssentialspdf

Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas

httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454

Semantic Web for the working ontologist Dean Allemang Jim Hendler

httpworkingontologistorg

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data SupporthttpwwwslidesharenetOpenDataSupport

httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI

OpenDataSupport contactopendatasupporteu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)

copy 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative

2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice

Page 58: Llinked open data training for EU institutions

DATASUPPORTOPEN

SELECT - return the name and publisher of a dataset

Slide 58

Sample data

    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .

Query

    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }

Result

    dataset                             publisher
    "File types Name Authority List"    "Publications Office"
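To make the evaluation of the query on slide 58 more concrete, the following is a minimal sketch in plain Python of how the three triple patterns are matched against the sample data. It is not a SPARQL engine. The helper names (match, select) are hypothetical, prefixed names are kept as plain strings, and the sketch assumes that the publisher URI in the dataset description and the URI of the dct:Agent denote the same resource.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Sample data from slide 58. The "..." inside the URIs is elided in the
# source and is kept as-is; PUBLISHER is assumed to be one single resource.
DATASET = "<http://.../authority/file-type>"
PUBLISHER = "<http://.../publisher/publ>"
TRIPLES: List[Triple] = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", '"File types Name Authority List"'),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", '"Publications Office"'),
]

# The WHERE clause as a list of triple patterns; variables start with "?".
QUERY: List[Triple] = [
    (DATASET, "dct:publisher", "?publisherURI"),
    (DATASET, "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]


def match(pattern: Triple, triple: Triple,
          binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(patterns: List[Triple], triples: List[Triple]) -> List[Dict[str, str]]:
    """Return every variable binding that satisfies all patterns."""
    bindings: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


rows = [(b["?dataset"], b["?publisher"]) for b in select(QUERY, TRIPLES)]
print(rows)
```

The variable ?publisherURI does not appear in the result. It only joins the dataset to its publisher, which is why the third pattern picks up the title of the agent and not the title of the dataset.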

SPARQL Example - EU ODP (1)

Slide 59

SPARQL Example - EU ODP (2)

Slide 60

SPARQL Example - EU ODP (2)

Slide 61

Summary

Slide 62

• RDF is a general way to express data intended for publishing on the Web.

• RDF data is expressed in triples: subject, predicate, object.

• Different syntaxes exist for expressing data in RDF.

• SPARQL is a standardised language to query graph data expressed as RDF.

• SPARQL can be used to query and update RDF data.
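As a small illustration of the second and third summary points, the sketch below writes one and the same triple in two syntaxes: N-Triples (full IRIs) and Turtle (prefixed names). The subject IRI under example.org and the helper names are assumptions made for this example only, and literal escaping is deliberately left out.

```python
from typing import Dict

# Prefix table used for the Turtle output.
PREFIXES: Dict[str, str] = {"dct": "http://purl.org/dc/terms/"}


def to_ntriples(subject: str, predicate: str, literal: str) -> str:
    """Serialise one triple with a plain literal object as an N-Triples line."""
    return f'<{subject}> <{predicate}> "{literal}" .'


def shorten(iri: str, prefixes: Dict[str, str]) -> str:
    """Return a prefixed name if a prefix matches, else the IRI in angle brackets."""
    for name, namespace in prefixes.items():
        if iri.startswith(namespace):
            return f"{name}:{iri[len(namespace):]}"
    return f"<{iri}>"


def to_turtle(subject: str, predicate: str, literal: str,
              prefixes: Dict[str, str] = PREFIXES) -> str:
    """Serialise the same triple as Turtle, with @prefix declarations first."""
    header = "\n".join(f"@prefix {n}: <{ns}> ." for n, ns in prefixes.items())
    body = f'{shorten(subject, prefixes)} {shorten(predicate, prefixes)} "{literal}" .'
    return f"{header}\n\n{body}"


# Hypothetical subject IRI; the predicate is the real Dublin Core "title" term.
SUBJECT = "http://example.org/dataset/file-type"
PREDICATE = "http://purl.org/dc/terms/title"
TITLE = "File types Name Authority List"

print(to_ntriples(SUBJECT, PREDICATE, TITLE))
print(to_turtle(SUBJECT, PREDICATE, TITLE))
```

Both outputs describe exactly the same subject, predicate and object. Only the notation differs.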

Learning Module 3

Workshop for Publishing Open Linked EU Data

Slide 63

Workshop for publishing open linked EU data

Slide 64

This module is about:

• Creating an RDF vocabulary for modelling your data

  o How to reuse existing vocabularies to model your data

  o How to create new classes and properties in RDF

  o How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Learning objectives

Slide 65

By the end of this training module, you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Creating an RDF vocabulary

Slide 66

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

6 steps for creating an RDF vocabulary

Slide 67

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage, and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust Domain Model

Slide 68

[Diagram: example domain model. Classes and their attributes:

• Amount: Currency, Figure, Type, Year

• Nomenclature: Type, Heading, Introduction, Remark, Conditions

• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

• Political category: Code, Description

• Corporate body: Code, Type, Location

Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]

Step 2: Reuse existing terms and vocabularies

Slide 69

• General-purpose vocabularies: DCMI, RDFS

• To name things: rdfs:label, foaf:name, skos:prefLabel

• To describe people: FOAF, vCard, Core Person Vocabulary

• To describe projects: DOAP, ADMS.SW

• To describe interoperability assets: ADMS

• To describe registered organisations: Registered Organisation Vocabulary

• To describe addresses: vCard, Core Location Vocabulary

• To describe public services: Core Public Service Vocabulary

• To describe datasets: DCAT, DCAT Application Profile, VoID

Step 2: Reuse existing terms and vocabularies - Well-known vocabularies

Slide 70

• DCAT-AP: Vocabulary for describing datasets in Europe

• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

• DOAP: Vocabulary for describing projects

• ADMS: Vocabulary for describing interoperability assets

• Dublin Core: Defines general metadata attributes

• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

• Organization Ontology: for describing the structure of organizations

• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies - Advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data.

  For example, the value of dcterms:created should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema instead encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to bring it in line with everyone else's.

• Reuse adds credibility to your schema.

  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from replicating that effort.
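The "further processing" mentioned on slide 71 can be pictured with a short Python sketch: a free-text date such as "21 February 2013" has to be parsed and rewritten before it matches the typed form expected for dcterms:created. The function name and the default input format are assumptions for this example; month names are parsed with Python's default (English) locale.

```python
from datetime import datetime


def to_xsd_date(text: str, fmt: str = "%d %B %Y") -> str:
    """Parse a free-text date and return it as a typed xsd:date literal."""
    parsed = datetime.strptime(text, fmt).date()
    # ISO 8601 (YYYY-MM-DD) is the lexical form used by xsd:date.
    return f'"{parsed.isoformat()}"^^xsd:date'


print(to_xsd_date("21 February 2013"))  # "2013-02-21"^^xsd:date
```

Every consumer of a custom format would need a conversion step like this one, and would need to know the format in advance. Reusing dcterms:created with xsd:date avoids that step altogether.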

Step 2: Reuse existing terms and vocabularies

Slide 72

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu

• Linked Open Vocabularies: http://lov.okfn.org

Step 3: Creation of sub-classes and sub-properties

Slide 73

• RDF schemas and vocabularies often include terms that are very generic.

• When you create sub-class and sub-property relationships, systems that understand the super-class or super-property may be able to interpret the data even if the more specific terms are unknown to them.

• Do not create sub-classes and sub-properties simply to use your own term for something that already exists.

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Diagram: Amount (Currency, Figure, Type, Year) linked to Nomenclature (Type, Heading, Introduction, Remark, Conditions) by "has Nomenclature"]
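The benefit of sub-properties described on slides 73 to 75 can be sketched as a simple inference step: a consumer that only knows dct:description and dct:subject can still use the data once the super-property statements are derived. The bud: prefix, the local names bud:introduction and bud:hasNomenclature, and the ex: resources are hypothetical spellings chosen for this example. The slides give only the property labels.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# rdfs:subPropertyOf statements, written as child -> parent.
# The bud: names are hypothetical; only the labels appear on the slides.
SUB_PROPERTY_OF: Dict[str, str] = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples: Iterable[Triple],
                           sub_property_of: Dict[str, str]) -> Set[Triple]:
    """Add (s, parent, o) for every (s, child, o), following the hierarchy upwards."""
    triples = list(triples)
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}  # guards against accidental cycles in the mapping
        parent = sub_property_of.get(p)
        while parent is not None and parent not in seen:
            inferred.add((s, parent, o))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return inferred


DATA = [
    ("ex:nomenclature1", "bud:introduction", '"Introductory text"'),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
]

for triple in sorted(infer_super_properties(DATA, SUB_PROPERTY_OF)):
    print(triple)
```

The original, more specific statements are kept. The generic ones are added alongside them, so nothing is lost for consumers that do understand the specialised vocabulary.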

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 76

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.

• Properties begin with a lower-case letter, e.g. rdfs:label.

• Object properties should be verbs, e.g. org:hasSite.

• Data type properties should be nouns, e.g. dcterms:description.

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
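Part of the naming conventions on slide 76 can be checked mechanically. The sketch below tests only the capitalisation and camel-case rules. Whether a class name is singular, or a property name is a verb or a noun, still needs human review. The function names and the ex: test terms are assumptions for this example.

```python
import re

# Camel case without separators: classes start upper case, properties lower case.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    """Check that a class name starts with a capital letter and uses camel case."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    """Check that a property name starts with a lower-case letter and uses camel case."""
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "ex:political_category"]:
    print(term, is_valid_class_name(term))
for term in ["rdfs:label", "foaf:isPrimaryTopicOf", "ex:Has-Site"]:
    print(term, is_valid_property_name(term))
```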

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 77

If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 78

If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
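The example definitions shown on slides 77 and 78 are images that did not survive in this transcript. As a stand-in, the following Python sketch prints what an RDFS definition of a class and of a property might look like in Turtle. The bud: prefix, the local names Amount and amountType, and the comment texts are hypothetical. rdfs:Class, rdf:Property, rdfs:label and rdfs:comment are standard RDF and RDFS terms.

```python
def define_class(term: str, label: str, comment: str) -> str:
    """Return a Turtle snippet declaring `term` as an RDFS class."""
    return (
        f"{term} a rdfs:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .'
    )


def define_property(term: str, label: str, comment: str) -> str:
    """Return a Turtle snippet declaring `term` as an RDF property."""
    return (
        f"{term} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .'
    )


# Hypothetical terms modelled on the "Amount" class and "amount type" property.
print(define_class("bud:Amount", "Amount", "A monetary amount in the budget."))
print()
print(define_property("bud:amountType", "amount type", "The type of an amount."))
```

An OWL-based vocabulary would typically use owl:Class and owl:ObjectProperty or owl:DatatypeProperty in the same positions. The choice between RDFS and OWL depends on how much formal semantics the vocabulary needs.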

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 79

When defining new properties, consider defining their domain and range:

• A range states that the values of a property are instances of one or more classes.

• A domain states the classes on which a given property can be used.

[Diagram: hasCeiling relationship between Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
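In RDFS, domain and range statements are used for inference, not validation: from a triple that uses the property, a reasoner concludes the types of the subject and of the object. The Python sketch below illustrates this for hasCeiling. The direction of the relationship (a political category has a ceiling amount) cannot be recovered from the transcript and is an assumption, as are the bud: and ex: names.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed direction: a political category has a ceiling, which is an amount.
DOMAIN: Dict[str, str] = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE: Dict[str, str] = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples: Iterable[Triple],
                domain: Dict[str, str],
                range_: Dict[str, str]) -> Set[Triple]:
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    types: Set[Triple] = set()
    for s, p, o in triples:
        if p in domain:
            types.add((s, "rdf:type", domain[p]))  # the subject is in the domain
        if p in range_:
            types.add((o, "rdf:type", range_[p]))  # the object is in the range
    return types


DATA = [("ex:category1", "bud:hasCeiling", "ex:amount1")]

for triple in sorted(infer_types(DATA, DOMAIN, RANGE)):
    print(triple)
```

Because of this behaviour, an overly narrow domain or range does not reject data. It silently assigns possibly wrong types, which is one reason to define them with care.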

Step 5: Publish within a highly stable environment designed to be persistent

Slide 80

• Choose a stable namespace for your RDF vocabulary.

  Example: http://data.europa.eu/bud

• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.

  Examples:

  o http://www.w3.org/ns/adms#

  o http://purl.org/dc/elements/1.1/

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
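A design rule for persistent URIs can be captured in a small helper so that every identifier is minted the same way. The sketch below assumes a pattern of the form http://{domain}/{type}/{concept}/{reference}, which is one commonly recommended layout and not something prescribed by this slide. The domain data.example.org and the function name are hypothetical.

```python
from urllib.parse import quote


def mint_uri(domain: str, type_: str, concept: str, reference: str) -> str:
    """Build a URI of the form http://{domain}/{type}/{concept}/{reference}."""
    parts = [quote(str(part).strip(), safe="") for part in (type_, concept, reference)]
    if not all(parts):
        raise ValueError("type, concept and reference must all be non-empty")
    return f"http://{domain}/" + "/".join(parts)


print(mint_uri("data.example.org", "id", "amount", "2014-001"))
# http://data.example.org/id/amount/2014-001
```

Keeping implementation details such as file extensions, software names or department names out of the pattern is what allows the identifiers to stay stable when the hosting technology or organisation changes.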

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Slide 81

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

The steps fall into three phases: Analyse, Model, Publish.

Example: Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..." - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

Slide 85

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV

• Excel (.xls and .xlsx)

• JSON

• XML

• RDF/XML

You then determine the intended structure of the RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

Slide 86

First:

• Install Open Refine from https://github.com/OpenRefine

• Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Slide 87

Example data: Digital Agenda Scoreboard

Step 1: Describe your data in a spreadsheet

Slide 88

Download the tabular data.

Step 2: Create a project and upload it in Open Refine

Slide 89

• Upload the spreadsheet

• Select relevant tabs

• Create the project

Step 3: Clean up the data - table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Step 3: Clean up the data - prepare RDF

Slide 91

• Create URI representations for the object values involved

  o via formula

  o via reconciliation
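The two ways of obtaining URIs for cell values can be sketched outside Open Refine in a few lines of Python. "Via formula" builds a URI from the cell text itself. "Via reconciliation" looks the value up in an existing authority list. The base URI, the authority table and both function names are hypothetical, and in Open Refine itself the formula would be written in its own expression language, not in Python.

```python
import re
import unicodedata
from typing import Dict, Optional

BASE = "http://example.org/id/country/"  # hypothetical base URI

# Hypothetical extract of an authority list used for reconciliation.
KNOWN: Dict[str, str] = {
    "belgium": "http://example.org/authority/country/BEL",
}


def uri_via_formula(value: str, base: str = BASE) -> str:
    """Derive a URI from the cell text: strip accents, lower-case, hyphenate."""
    ascii_value = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_value.lower()).strip("-")
    return base + slug


def uri_via_reconciliation(value: str, known: Dict[str, str] = KNOWN) -> Optional[str]:
    """Look the cell text up in an authority list; None means no match was found."""
    return known.get(value.strip().lower())


print(uri_via_formula("Czech Republic"))
print(uri_via_reconciliation(" Belgium "))
print(uri_via_reconciliation("Atlantis"))
```

A formula always yields a URI but may mint a new identifier for something that already has one. Reconciliation reuses existing identifiers, which supports linking, but leaves unmatched values for manual review.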

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.

Slide 93

Define a skeleton to transform your spreadsheet data to RDF.

Slide 94

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
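Conceptually, an RDF skeleton is a template that says which column becomes which property, applied once per row. The Python sketch below mimics that idea for a Data Cube style dataset: each row becomes one qb:Observation. The column names, the ex: properties, the base URI and the sample values are all invented for illustration. qb:Observation is a real term of the RDF Data Cube Vocabulary.

```python
import csv
import io
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

BASE_URI = "http://example.org/scoreboard/"  # hypothetical base URI

# The "skeleton": spreadsheet column -> RDF property (hypothetical ex: terms).
SKELETON: Dict[str, str] = {
    "country": "ex:refArea",
    "year": "ex:refPeriod",
    "value": "ex:value",
}

# Invented sample rows, not real Digital Agenda Scoreboard figures.
SAMPLE = "country,year,value\nBE,2013,0.82\nFR,2013,0.79\n"


def rows_to_triples(csv_text: str,
                    skeleton: Dict[str, str] = SKELETON,
                    base: str = BASE_URI) -> List[Triple]:
    """Apply the skeleton to every row and return the resulting triples."""
    triples: List[Triple] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{base}observation/{index}>"
        triples.append((subject, "rdf:type", "qb:Observation"))
        for column, prop in skeleton.items():
            # Cell values are kept as plain literals in this sketch.
            triples.append((subject, prop, f'"{row[column]}"'))
    return triples


for s, p, o in rows_to_triples(SAMPLE):
    print(f"{s} {p} {o} .")
```

A full Data Cube publication would also describe the dataset and its structure definition, and would normally use URIs instead of literals for dimension values such as countries. Those parts are omitted here to keep the mapping idea visible.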

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Screenshot: export of the data in Turtle]

Production pipelines

Slide 96

From desk to automated pipeline

[Diagram: OpenRefine, UnifiedViews and Cellar positioned along the axes "flexibility" and "volume"]

Thank you for your attention ... and now YOUR questions?

Slide 97


Further reading

Slides 99-101

• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1

• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2

• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com

• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/

• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf

• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org

Be part of our team

Slide 102

Find us, join us, follow us and contact us:

• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport

• http://www.opendatasupport.eu

• Open Data Support: http://goo.gl/y9ZZI

• OpenDataSupport

• contact@opendatasupport.eu

Presentation metadata

Slide 103

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 59: Llinked open data training for EU institutions

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (1)

Slide 59

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 60

DATASUPPORTOPEN

SPARQL Example ndash EU ODP (2)

Slide 61

DATASUPPORTOPEN

Summary

bull RDF is a general way to express data intended for publishing on the Web

bull RDF data is expressed in triples subject predicate object

bull Different syntaxes exist for expressing data in RDF

bull SPARQL is a standardised language to query graph data expressed as RDF

bull SPARQL can be used to query and update RDF data

Slide 62

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open

Linked EU Data

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

bull Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

bull An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Diagram: domain model. Classes with their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

Slide 68

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Well-known vocabularies

Step 2: Reuse existing terms and vocabularies

• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: Ontology for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

Advantages of reuse

Step 2: Reuse existing terms and vocabularies

• Reuse greatly aids interoperability of your data
  Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
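The interoperability point about dates can be sketched in a few lines of Python. The helper below is a hypothetical illustration, not part of any RDF library: it reads a literal typed as xsd:date with the standard ISO parser, whereas the free-text form only parses if the consumer already knows the publisher's format.

```python
from datetime import date, datetime


def parse_typed_literal(literal: str) -> date:
    """Parse a Turtle-style literal of the form "YYYY-MM-DD"^^xsd:date."""
    lexical, _, datatype = literal.partition("^^")
    if datatype != "xsd:date":
        raise ValueError(f"unsupported datatype: {datatype!r}")
    return date.fromisoformat(lexical.strip('"'))


# The typed literal carries its own interpretation.
typed = parse_typed_literal('"2013-02-21"^^xsd:date')

# The free-text value needs a format string agreed out of band
# (and English month names, which is the default "C" locale behaviour).
free_text = datetime.strptime("21 February 2013", "%d %B %Y").date()
```

Both values end up as the same date, but only the first can be processed without knowing anything about the publisher's conventions.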

You can find reusable RDF vocabularies on

Step 2: Reuse existing terms and vocabularies

• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description

[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 74
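A minimal sketch of why the sub-property relationship helps: a consumer that only knows dct:description can still find the introduction text once the sub-property statement is applied. The identifiers bud:introduction, bud:heading and ex:nomenclature1 are assumed names for illustration (the slides name the property only in prose), and the inference covers a single level, not full RDFS entailment.

```python
# Assumed names: bud:introduction, bud:heading and ex:nomenclature1 are
# hypothetical identifiers; only the sub-property relation comes from the slide.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}

DATA = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:nomenclature1", "bud:heading", "Heading text"),
]


def with_inferred(triples, sub_property_of):
    """Add one inferred triple for each statement whose predicate has a super-property."""
    inferred = [
        (s, sub_property_of[p], o)
        for s, p, o in triples
        if p in sub_property_of
    ]
    return triples + inferred


def values(triples, subject, predicate):
    """Collect the objects of all triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


descriptions = values(
    with_inferred(DATA, SUB_PROPERTY_OF), "ex:nomenclature1", "dct:description"
)
```

Without the inference step the same lookup returns nothing, which is the situation a generic consumer would be in if the vocabulary had not declared the relationship.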

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject

[Diagram: Amount (Currency, Figure, Type, Year) and Nomenclature (Type, Heading, Introduction, Remark, Conditions), connected by the relationship "has Nomenclature"]

Slide 75

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
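The casing rules above lend themselves to a simple automated check. The sketch below is a hypothetical helper, not an existing tool: it tests only the capitalisation and camel-case conventions, since whether a term is singular, a verb or a noun cannot be decided by a regular expression.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term: str, kind: str) -> bool:
    """Check the casing convention for a 'class' or a 'property' term."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))
```

For example, follows_convention("org:hasSite", "property") holds, while a term such as ex:has_site (a made-up counter-example) fails because of the underscore.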

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77
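As a rough sketch of what an RDFS class definition looks like, the Python helper below renders a class declaration as a Turtle snippet. The prefix bud, the label and the comment text are placeholders chosen for illustration; the original slide shows its definition as an image, so this is not a copy of it. Prefix declarations (@prefix lines) are left out for brevity.

```python
def escape(text: str) -> str:
    """Escape backslashes and double quotes for use in a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def class_definition(prefix: str, name: str, label: str, comment: str) -> str:
    """Render an RDFS class declaration with an English label and comment."""
    return (
        f"{prefix}:{name} a rdfs:Class ;\n"
        f'    rdfs:label "{escape(label)}"@en ;\n'
        f'    rdfs:comment "{escape(comment)}"@en .\n'
    )


# Placeholder wording; a real vocabulary would carry an agreed definition.
turtle = class_definition(
    "bud", "Amount", "Amount", "Placeholder description of the class."
)
```

The same pattern with owl:Class instead of rdfs:Class would give the OWL flavour of the declaration.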

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Diagram: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range:

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used

[Diagram: Amount (Currency, Figure, Type, Year) and Political category (Code, Description), connected by hasCeiling]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
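Domain and range can be illustrated with a small consistency check. Two assumptions are made loudly here: the direction of hasCeiling (from Political category to Amount) is a guess, since the transcript does not preserve the arrow, and all bud: and ex: identifiers are hypothetical. Note also that RDFS itself uses domain and range to infer types rather than to reject data, so the check below is a simplification.

```python
# Assumption: hasCeiling points from a political category to an amount.
PROPERTIES = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}

# Hypothetical instance data: resource -> declared class.
TYPES = {
    "ex:heading1": "bud:PoliticalCategory",
    "ex:ceiling2014": "bud:Amount",
}


def consistent(triple, properties, types):
    """True when the subject and object types agree with the declared domain and range."""
    s, p, o = triple
    spec = properties.get(p)
    if spec is None:
        return True  # no domain or range declared, so nothing to check
    return types.get(s) == spec["domain"] and types.get(o) == spec["range"]
```

A statement with subject and object swapped would fail the check, which is the kind of modelling slip that declared domains and ranges make visible.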

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80
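A short sketch of how term URIs could be minted under one stable namespace. The trailing slash on the namespace is an assumption (the transcript does not show whether the vocabulary uses "/" or "#"), and the persistence check is a rough heuristic based on commonly cited guidance such as avoiding query strings and file extensions, not a rule quoted from the slides.

```python
from urllib.parse import urlsplit

# Assumption: "/" separates the namespace from the term name.
NAMESPACE = "http://data.europa.eu/bud/"


def term_uri(local_name: str, namespace: str = NAMESPACE) -> str:
    """Build the URI of a vocabulary term from its local name."""
    if not local_name.isidentifier():
        raise ValueError(f"not a simple term name: {local_name!r}")
    return namespace + local_name


def looks_persistent(uri: str) -> bool:
    """Heuristic: http(s) scheme, no query string, no port, no file extension."""
    parts = urlsplit(uri)
    last_segment = parts.path.rsplit("/", 1)[-1]
    return (
        parts.scheme in ("http", "https")
        and not parts.query
        and parts.port is None
        and "." not in last_segment
    )
```

Keeping the namespace in one constant means every term URI changes together if the namespace is ever finalised differently, which is easier to manage than URIs typed by hand.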

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services

Phases: Analyse, Model, Publish

Slide 82


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org

Slide 84

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (xls and xlsx)
• JSON
• XML, and
• RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using Open Refine to model and publish open data: Getting started

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86
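The five steps can be sketched end to end in plain Python, as a stand-in for what Open Refine does interactively. This is not Open Refine's API: the sample rows, column names and the http://example.org/ namespace are all hypothetical, and N-Triples is used as the output syntax because it is the simplest to write by hand.

```python
import csv
import io

# Hypothetical sample data (step 1: the data described as a table).
RAW = """country,year,value
BE,2013,81.5
 fr ,2013,78.0
NL,2013,
"""
BASE = "http://example.org/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def clean(rows):
    """Step 3: trim cells, normalise country codes, drop rows without a value."""
    for row in rows:
        row = {key: cell.strip() for key, cell in row.items()}
        if row["value"]:
            row["country"] = row["country"].upper()
            yield row


def to_ntriples(row):
    """Steps 4 and 5: map one row to RDF statements serialised as N-Triples."""
    subject = f"<{BASE}obs/{row['country']}-{row['year']}>"
    return [
        f"{subject} <{BASE}def/country> <{BASE}country/{row['country']}> .",
        f'{subject} <{BASE}def/year> "{row["year"]}"^^<{XSD}gYear> .',
        f'{subject} <{BASE}def/value> "{row["value"]}"^^<{XSD}decimal> .',
    ]


# Step 2: load the table; then clean, map and export.
rows = list(clean(csv.DictReader(io.StringIO(RAW))))
lines = [line for row in rows for line in to_ntriples(row)]
```

The row without a value is dropped during clean-up, and the padded lower-case country code is normalised before any URI is built, which mirrors the order of the steps on the slide.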

Example situation

Publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Slide 89

Step 3: Clean up the data - table harmonisation

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Slide 90

Step 3: Clean up the data - prepare RDF

• Create URI representation for the involved object values
  - via formula
  - via reconciliation

Slide 91
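Creating a URI "via formula" means deriving it from the cell value by a fixed rule. The sketch below shows one such rule in plain Python rather than in Open Refine's own expression language; the base URI and the sample values are hypothetical.

```python
import re
import unicodedata

BASE = "http://example.org/indicator/"  # hypothetical base URI


def slugify(value: str) -> str:
    """Lower-case the text, strip accents, and turn other character runs into '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def cell_to_uri(value: str, base: str = BASE) -> str:
    """Derive a URI for a cell value by appending its slug to the base URI."""
    return base + slugify(value)
```

Reconciliation, the other option on the slide, looks the value up in an existing dataset and reuses the URI found there, which is preferable when an authoritative identifier already exists.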

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93
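A skeleton is essentially a template that is filled in once per spreadsheet row. The sketch below imitates that idea with Python's string.Template. qb:Observation and qb:dataSet are terms of the W3C RDF Data Cube Vocabulary; the ex: properties, the column names and the http://example.org base are assumptions for illustration, and a real cube would also need a data structure definition, which is omitted here.

```python
from string import Template

# Hypothetical skeleton: one qb:Observation per spreadsheet row.
SKELETON = Template(
    "<$base/obs/$country-$year> a qb:Observation ;\n"
    "    qb:dataSet <$base/dataset/scoreboard> ;\n"
    "    ex:refArea <$base/country/$country> ;\n"
    '    ex:refPeriod "$year"^^xsd:gYear ;\n'
    '    ex:value "$value"^^xsd:decimal .\n'
)


def apply_skeleton(row: dict, base: str = "http://example.org") -> str:
    """Fill the skeleton with the cells of one row; missing columns raise KeyError."""
    return SKELETON.substitute(row, base=base)


observation = apply_skeleton({"country": "BE", "year": "2013", "value": "81.5"})
```

Because the template is separate from the data, the same skeleton can be reused for every row and adjusted in one place when the target vocabulary changes.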

Step 4: Map your data to appropriate RDF classes & properties (model your data)

• You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one
• You can set the base URI for the data

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 94

Step 5: Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]

Slide 95

Production pipelines

From desk to automated pipeline

[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

Slide 96


Thank you for your attention

and now YOUR questions

Slide 97

References

• 5-star Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query

Slide 98

Further reading

• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

Further reading

• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the working ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org

Slide 101

Be part of our team

Find us on, join us on, follow us:

• Open Data Support, http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support, http://goo.gl/y9ZZI
• OpenDataSupport

Contact us: contact@opendatasupport.eu

Slide 102

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Slide 103

Page 61: Llinked open data training for EU institutions

SPARQL Example - EU ODP (2)

Slide 61

Summary

• RDF is a general way to express data intended for publishing on the Web

• RDF data is expressed in triples: subject, predicate, object

• Different syntaxes exist for expressing data in RDF

• SPARQL is a standardised language to query graph data expressed as RDF

• SPARQL can be used to query and update RDF data
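As a rough illustration of the triple model and of what a query does, the sketch below stores triples as plain Python tuples and matches them against a single triple pattern, which is the idea behind a SPARQL basic graph pattern. The `match` helper and the example.org resources are assumptions made for illustration; a real application would use an RDF library and a SPARQL endpoint.

```python
# Minimal sketch: RDF triples as (subject, predicate, object) tuples.
# The example.org URIs and the match() helper are hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"

triples = [
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", "Bob"),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples that agree with every non-None position.

    None plays the role of a SPARQL variable such as ?s or ?o.
    """
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?name WHERE { ?s foaf:name ?name }
names = sorted(t[2] for t in match(triples, p=FOAF + "name"))
```

Leaving a position as `None` corresponds to putting a variable in that position of the pattern; fixing all three positions turns the query into a simple existence check.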

Slide 62

Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data

  - How to reuse existing vocabularies to model your data

  - How to create new classes and properties in RDF

  - How and where to publish your RDF vocabulary so that it can be reused by others

• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties

4. Where new terms are required, create them following commonly agreed best practice

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Slide 67

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust Domain Model

Slide 68

[Figure: domain model of the EU budget. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attribute labels in the figure: Acronym, Legal base period, Legal base type, Legal base status. Relationships shown: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

Step 2: Reuse existing terms and vocabularies

Slide 69

General purpose vocabularies: DCMI, RDFS

To name things: rdfs:label, foaf:name, skos:prefLabel

To describe people: FOAF, vCard, Core Person Vocabulary

To describe projects: DOAP, ADMS.SW

To describe interoperability assets: ADMS

To describe registered organisations: Registered Organisation Vocabulary

To describe addresses: vCard, Core Location Vocabulary

To describe public services: Core Public Service Vocabulary

To describe datasets: DCAT, DCAT Application Profile, VoID

Well-known vocabularies

Slide 70

Step 2: Reuse existing terms and vocabularies

DCAT-AP: Vocabulary for describing datasets in Europe

Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

DOAP: Vocabulary for describing projects

ADMS: Vocabulary for describing interoperability assets

Dublin Core: Defines general metadata attributes

Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

Organization Ontology: for describing the structure of organizations

Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Advantages of reuse

Slide 71

Step 2: Reuse existing terms and vocabularies

• Reuse greatly aids interoperability of your data

Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema

It shows it has been published with care and professionalism; again, this promotes its reuse.

• Reuse is easier and cheaper

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
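The interoperability point can be made concrete with a small sketch. The lexical form of an xsd:date without a timezone is an ISO 8601 date, so a consumer can parse it with a standard call, whereas a free-text date needs a format that the consumer has to guess. The two function names and the format string below are assumptions for illustration.

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    """Parse the lexical form of an xsd:date without a timezone (YYYY-MM-DD)."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text):
    """Parse a date such as '21 February 2013'.

    The format string is an assumption about one publisher's convention
    and relies on English month names (the default C locale).
    """
    return datetime.strptime(text, "%d %B %Y").date()


typed = parse_xsd_date("2013-02-21")
free = parse_free_text_date("21 February 2013")
```

Both calls yield the same date here, but the second only works because the consumer already knows the publisher's format and language; a different publisher writing "Feb 21, 2013" would need yet another parser.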

You can find reusable RDF vocabularies on:

Slide 72

Step 2: Reuse existing terms and vocabularies

• Joinup: http://joinup.ec.europa.eu

• Linked Open Vocabularies: http://lov.okfn.org

Step 3: Creation of sub-classes and sub-properties

Slide 73

• RDF schemas and vocabularies often include terms that are very generic

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject

[Figure: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
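A minimal sketch of why sub-properties help: a consumer that only knows dct:description or dct:subject can still use the data once the sub-property links are followed. The `bud` namespace, the local names `introduction` and `hasNomenclature`, and the `with_inferred` helper are hypothetical; only the two super-properties come from the slides.

```python
DCT = "http://purl.org/dc/terms/"
BUD = "http://example.org/budget#"  # hypothetical namespace

# child property -> parent property (rdfs:subPropertyOf)
SUB_PROPERTY_OF = {
    BUD + "introduction": DCT + "description",
    BUD + "hasNomenclature": DCT + "subject",
}

data = [
    (BUD + "nomenclature/1", BUD + "introduction", "Example introduction text"),
]


def with_inferred(triples, sub_property_of):
    """Add one triple per super-property, following rdfs:subPropertyOf chains."""
    result = set(triples)
    for s, p, o in triples:
        seen = set()
        parent = sub_property_of.get(p)
        while parent is not None and parent not in seen:
            result.add((s, parent, o))
            seen.add(parent)  # guards against accidental cycles
            parent = sub_property_of.get(parent)
    return result


inferred = with_inferred(data, SUB_PROPERTY_OF)
```

After the call, the graph also states that the nomenclature has a dct:description, so a generic tool that has never seen the budget vocabulary can still display the text.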

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 76

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
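The mechanical parts of these conventions (initial capital, camel case) are easy to check automatically; the singular, verb and noun rules still need a human reviewer. The sketch below is one possible checker; the function names and regular expressions are assumptions, not part of any standard tool.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. hasSite


def is_class_name(local_name):
    """True if the local name starts with a capital and has no separators."""
    return bool(CLASS_NAME.match(local_name))


def is_property_name(local_name):
    """True if the local name starts in lower case and has no separators."""
    return bool(PROPERTY_NAME.match(local_name))


def to_camel_case(words, is_class=False):
    """Join space-separated words in camel case.

    Expects at least one word; classes start with a capital letter.
    """
    parts = words.split()
    head = parts[0].capitalize() if is_class else parts[0].lower()
    return head + "".join(w.capitalize() for w in parts[1:])
```

For example, `to_camel_case("is primary topic of")` gives the property-style name used by FOAF, and passing `is_class=True` produces a class-style name from a label such as "political category".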

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 77

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: Amount class with attributes Currency, Figure, Type, Year]

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 78

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: Amount class with attributes Currency, Figure, Type, Year]
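A minimal sketch of what such RDFS definitions could look like, generated as Turtle text from Python. The `bud` prefix URI, the local names `Amount` and `amountType`, the labels and the `define_term` helper are all assumptions for illustration and are not taken from the published EU Budget vocabulary.

```python
# Hypothetical namespace and term names; only the RDF/RDFS prefixes are standard.
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix bud: <http://example.org/budget#> .\n"
)


def define_term(local_name, rdf_type, label):
    """Return a Turtle snippet declaring one term with an English label.

    The label must not contain double quotes; no escaping is done here.
    """
    return (
        f"bud:{local_name} a {rdf_type} ;\n"
        f'    rdfs:label "{label}"@en .\n'
    )


turtle = "\n".join([
    PREFIXES,
    define_term("Amount", "rdfs:Class", "Amount"),
    define_term("amountType", "rdf:Property", "amount type"),
])
```

Printing `turtle` gives a small, self-describing vocabulary file: one class and one property, each typed and labelled, which is the minimum a reuser needs to understand the terms.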

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 79

When defining new properties, consider defining their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Figure: the hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
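In RDFS, domain and range work as grounds for inference rather than as validation constraints: using a property lets a reasoner conclude the types of its subject and object. The sketch below illustrates this with hasCeiling. The namespace, the class local names and the direction of the relationship (Political category as domain, Amount as range) are assumptions.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BUD = "http://example.org/budget#"  # hypothetical namespace

# property -> (rdfs:domain class, rdfs:range class); the direction is assumed
SCHEMA = {
    BUD + "hasCeiling": (BUD + "PoliticalCategory", BUD + "Amount"),
}


def infer_types(triples, schema):
    """Derive rdf:type triples from rdfs:domain and rdfs:range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in schema:
            domain, range_ = schema[p]
            inferred.add((s, RDF_TYPE, domain))  # subject is in the domain
            inferred.add((o, RDF_TYPE, range_))  # object is in the range
    return inferred


data = [(BUD + "category/1", BUD + "hasCeiling", BUD + "amount/42")]
types = infer_types(data, SCHEMA)
```

A practical consequence is that an overly narrow domain or range does not reject data; it silently assigns types to whatever the property is used on, so domains and ranges are best kept no tighter than the model really requires.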

Step 5: Publish within a highly stable environment designed to be persistent

Slide 80

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Slide 81

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

[Figure: the steps grouped into three phases: Analyse, Model, Publish]


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another” - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

Slide 85

The Open Refine RDF extension allows you to easily import data in different formats, such as:

- CSV

- Excel (.xls and .xlsx)

- JSON

- XML, and

- RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

Slide 86

First:

- Install Open Refine from https://github.com/OpenRefine

- Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Slide 88

Download the tabular data

Step 2: Create a project and upload it in Open Refine

Slide 89

Upload the spreadsheet

Select relevant tabs

Create the project

Step 3: Clean up the data - table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published

Step 3: Clean up the data - prepare RDF

Slide 91

• Create URI representation for the involved object values

  - via formula

  - via reconciliation
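The "via formula" option amounts to deriving a URI from a cell value with a deterministic expression, so that the same value always yields the same URI. In Open Refine this would be written as a column transformation; the sketch below shows the same idea in Python. The base URI, the `slugify` rules and the `mint_uri` helper are assumptions for illustration.

```python
import re
import unicodedata

BASE = "http://example.org/id/"  # hypothetical base URI


def slugify(value):
    """Lower-case, strip accents, and replace runs of other characters with '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, value):
    """Build a URI of the form BASE + kind + '/' + slug of the cell value."""
    return f"{BASE}{kind}/{slugify(value)}"


uri = mint_uri("country", "Czech Republic")
```

Reconciliation takes the other route: instead of minting a new URI, it looks the value up in an existing dataset and reuses the URI found there, which is preferable whenever an authoritative identifier already exists.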

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 94

You can map the data to the ontology using a simple graphical interface, either creating an RDF skeleton or editing an existing one

You can set the base URI for the data

[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Figure: export of the data in Turtle]
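To give an idea of what the exported Turtle can look like, the sketch below turns a small cleaned-up table into RDF Data Cube observations, one per row. It is a stand-in for what the RDF skeleton does inside Open Refine, not the extension's actual output. The `ex` namespace, the dimension and measure property names, and the table values are made up for illustration; only `qb:Observation` and `qb:dataSet` come from the Data Cube vocabulary.

```python
import csv
import io

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/scoreboard/> .\n"
)

# Hypothetical cleaned-up table; column names and values are made up.
TABLE = """country,year,value
BE,2013,0.75
NL,2013,0.5
"""


def rows_to_turtle(table_text):
    """Serialise each table row as one qb:Observation in Turtle."""
    blocks = [PREFIXES]
    for row in csv.DictReader(io.StringIO(table_text)):
        country, year, value = row["country"], row["year"], row["value"]
        blocks.append(
            f"ex:obs-{country}-{year} a qb:Observation ;\n"
            f"    qb:dataSet ex:dataset ;\n"
            f"    ex:country ex:country-{country} ;\n"
            f'    ex:year "{year}"^^xsd:gYear ;\n'
            f'    ex:value "{value}"^^xsd:decimal .\n'
        )
    return "\n".join(blocks)


turtle = rows_to_turtle(TABLE)
```

Each observation URI is built from the row's dimension values, so re-running the export on the same table produces the same identifiers, which matters once other datasets start linking to them.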

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]


Thank you for your attention

and now YOUR questions

Slide 97

References

• 5-star Open Data. http://5stardata.info

• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data - An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework, W3C. http://www.w3.org/RDF/

• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme. http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler. http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on, contact us, join us on, follow us:

• Open Data Support: http://www.slideshare.net/OpenDataSupport

• http://www.opendatasupport.eu

• Open Data Support: http://goo.gl/y9ZZI

• @OpenDataSupport

• contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


copy 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative

2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice

Page 63: Llinked open data training for EU institutions

DATASUPPORTOPEN Slide 63

Learning Module 3

Workshop for Publishing Open Linked EU Data

Workshop for publishing open linked EU data

This module is about:

• Creating an RDF vocabulary for modelling your data:
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

Learning objectives

By the end of this training module, you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Figure: domain model diagram. Classes and their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships shown: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

Slide 68

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69
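The prefixed names in the list above (rdfs:label, foaf:name, skos:prefLabel) are shorthand for full URIs. As a minimal sketch, the expansion can be written as a lookup table. The `PREFIXES` table and the `expand` helper are illustrative names, not part of any library.

```python
# Prefix-to-namespace table for a few of the vocabularies named above.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'rdfs:label' into a full URI."""
    prefix, _, local_name = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local_name


label_uri = expand("rdfs:label")
name_uri = expand("foaf:name")
```

Reusing a term means reusing exactly this full URI, which is what lets independent datasets agree on meaning.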

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
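The interoperability point about dates can be illustrated with a short sketch. It assumes the consumer is a Python program: the xsd:date lexical form parses directly, while the free-text form needs a format that the consumer has to know in advance (here English month names in day-month-year order). The function names are illustrative only.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """Parse the lexical form of an xsd:date value, e.g. '2013-02-21'."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Parse a free-text date; the format string is an assumption about
    how this particular publisher happened to write dates."""
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
```

Both calls yield the same date, but the second works only for publishers who follow that one ad hoc format, which is the extra processing the slide warns about.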

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark and Conditions.]

Slide 74

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: the Amount class (Currency, Figure, Type, Year) linked to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions) by the "has Nomenclature" relationship.]

Slide 75
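The benefit of sub-properties can be sketched as a small inference step: a consumer that only knows dct:description can still read values published with the more specific property. The triples are plain tuples, and the prefixed names `bud:introduction` and `bud:hasNomenclature` are assumed spellings for illustration; the slides do not give the actual term URIs.

```python
# Sub-property declarations (assumed spellings of the EU Budget terms).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples):
    """Return the triples plus those entailed by rdfs:subPropertyOf."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        current, seen = predicate, set()
        # Walk up the sub-property chain, guarding against cycles.
        while current in SUB_PROPERTY_OF and current not in seen:
            seen.add(current)
            current = SUB_PROPERTY_OF[current]
            inferred.add((subject, current, obj))
    return inferred


data = [("ex:nomenclature1", "bud:introduction", "Introductory text")]
closure = infer_super_properties(data)
```

After inference, the same value is also available under dct:description, so generic tools can display it without knowing the budget vocabulary.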

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower-case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
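The capitalisation and camel-case conventions above can be sketched as two small helpers that turn a human-readable label into a local name. Both function names are hypothetical, and the sketch lower-cases the rest of each word, so acronyms such as "EU" would need special handling in practice.

```python
import re


def _words(label: str) -> list:
    """Split a label into alphanumeric words."""
    return re.findall(r"[A-Za-z0-9]+", label)


def class_name(label: str) -> str:
    """Upper camel case for classes, e.g. 'political category' -> 'PoliticalCategory'."""
    return "".join(w[:1].upper() + w[1:].lower() for w in _words(label))


def property_name(label: str) -> str:
    """Lower camel case for properties, e.g. 'has ceiling' -> 'hasCeiling'."""
    words = _words(label)
    if not words:
        raise ValueError("label contains no words")
    head, tail = words[0].lower(), words[1:]
    return head + "".join(w[:1].upper() + w[1:].lower() for w in tail)


amount_class = class_name("amount")
ceiling_property = property_name("has ceiling")
```

Whether a property is phrased as a verb or a noun remains a modelling decision that code like this cannot make.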

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Figure: the Amount class with attributes Currency, Figure, Type and Year.]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Figure: the Amount class with attributes Currency, Figure, Type and Year.]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78
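The original slides showed these definitions as images that did not survive extraction. As a rough sketch of what an RDFS definition of a class and a property can look like, the helpers below emit Turtle text. The namespace form `http://data.europa.eu/bud#`, the local names, the labels and the comment text are all assumptions for illustration, and the helpers do not escape quotes inside labels.

```python
# Assumed namespace; the slides only give "http://data.europa.eu/bud".
BUD = "http://data.europa.eu/bud#"

PREAMBLE = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n"
)


def define_class(local_name: str, label: str, comment: str) -> str:
    """Emit an rdfs:Class definition in Turtle."""
    return (
        f"<{BUD}{local_name}> a rdfs:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


def define_property(local_name: str, label: str, domain_local_name: str) -> str:
    """Emit an rdf:Property definition attached to a class via rdfs:domain."""
    return (
        f"<{BUD}{local_name}> a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain <{BUD}{domain_local_name}> .\n"
    )


turtle = (
    PREAMBLE
    + define_class("Amount", "Amount", "A monetary amount in the budget.")
    + "\n"
    + define_property("amountType", "amount type", "Amount")
)
```

Printing `turtle` gives a small vocabulary file that could be published at the namespace URI.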

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range:

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.

[Figure: the hasCeiling property linking the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year).]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
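In RDFS, domain and range declarations let a consumer infer the types of the resources a property connects. The sketch below assumes that hasCeiling has Political category as its domain and Amount as its range; the direction is a guess from the figure, and the prefixed names are hypothetical.

```python
# Assumed schema: direction of hasCeiling is inferred from the figure.
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}


def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range."""
    types = set()
    for subject, predicate, obj in triples:
        declaration = SCHEMA.get(predicate)
        if declaration is None:
            continue  # no domain/range known for this property
        types.add((subject, "rdf:type", declaration["domain"]))
        types.add((obj, "rdf:type", declaration["range"]))
    return types


data = [("ex:category1", "bud:hasCeiling", "ex:amount42")]
types = infer_types(data)
```

Because these declarations drive inference rather than validation, an overly narrow domain or range leads to wrong type conclusions instead of an error.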

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80
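Some URI design rules can be checked mechanically. The sketch below flags two properties that commonly tie a URI to a specific technology: a query string and a file extension. The choice of these two checks and the function name are assumptions for illustration, not rules quoted from the slides.

```python
from urllib.parse import urlsplit


def persistence_warnings(uri: str) -> list:
    """Return warnings for URI features that tend to hurt persistence."""
    parts = urlsplit(uri)
    warnings = []
    if parts.query:
        warnings.append("contains a query string")
    last_segment = parts.path.rsplit("/", 1)[-1]
    if "." in last_segment:
        warnings.append("ends with a file extension")
    return warnings


adms_warnings = persistence_warnings("http://www.w3.org/ns/adms")
script_warnings = persistence_warnings("http://example.org/vocab.php?id=3")
```

A clean result does not make a URI persistent; that also depends on how the namespace is managed over time.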

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

[Figure: the steps are grouped into three phases: Analyse, Model, Publish.]

Slide 82

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another” (openrefine.org)

See also: Open Refine website, http://openrefine.org

Slide 84

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You can then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Slide 89

Step 3: Clean up the data – table harmonisation

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Slide 90
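In Open Refine these clean-up operations are done through the user interface. Purely to illustrate what they do to the table, the sketch below performs comparable operations in Python. The sample rows, column names and helper names are invented for this example and are not taken from the Digital Agenda Scoreboard.

```python
import csv
import io

# Invented sample data: one empty row and one row with a missing value.
RAW = """indicator,Country,yr,value
Broadband coverage,BE,2013,98.5
,,,
Broadband coverage,FR,2013,
Broadband coverage,NL,2013,99.1
"""

RENAMES = {"Country": "country", "yr": "year"}


def harmonise(text: str) -> list:
    """Drop incomplete rows and rename columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if not all(value and value.strip() for value in row.values()):
            continue  # unnecessary row: at least one cell is empty
        rows.append({RENAMES.get(k, k): v.strip() for k, v in row.items()})
    return rows


def facet(rows: list, column: str, value: str) -> list:
    """Select rows whose column equals value, similar to a text facet."""
    return [row for row in rows if row[column] == value]


clean_rows = harmonise(RAW)
belgium = facet(clean_rows, "country", "BE")
```

The result is a regular table with consistent column names, which is what the later mapping step relies on.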

Step 3: Clean up the data – prepare RDF

• Create URI representations for the involved object values:
  - via formula
  - via reconciliation

Slide 91
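Creating a URI "via formula" means deriving it from the cell value with a deterministic rule. In Open Refine this is written as an expression in the tool itself; the Python sketch below only illustrates the idea. The base URI `http://example.org/id/` and the function name are placeholders.

```python
import re

BASE = "http://example.org/id/"  # placeholder base URI


def to_uri(kind: str, value: str) -> str:
    """Build a URI from a cell value by turning it into a lower-case slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")
    return f"{BASE}{kind}/{slug}"


indicator_uri = to_uri("indicator", "Broadband coverage (%)")
country_uri = to_uri("country", "BE")
```

Reconciliation, the other option on the slide, instead looks the value up in an existing dataset and reuses the URI found there.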

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93
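A skeleton can be thought of as a template that is filled in once per spreadsheet row. The sketch below shows that idea with a string template producing one qb:Observation per row. Only `qb:Observation` comes from the RDF Data Cube Vocabulary; the `ex:` properties, the base URI and the sample row are invented for illustration, and a real Data Cube dataset would also declare its dimensions, measures and dataset.

```python
# Invented sample row, as it might look after the clean-up step.
ROW = {"indicator": "broadband-coverage", "country": "BE", "year": "2013", "value": "98.5"}

# Template for one observation; ex: properties are hypothetical.
SKELETON = (
    "<{base}obs/{indicator}/{country}/{year}> a qb:Observation ;\n"
    "    ex:indicator <{base}indicator/{indicator}> ;\n"
    "    ex:country <{base}country/{country}> ;\n"
    "    ex:year \"{year}\"^^xsd:gYear ;\n"
    "    ex:value \"{value}\"^^xsd:decimal .\n"
)


def apply_skeleton(row: dict, base: str = "http://example.org/id/") -> str:
    """Fill the skeleton with the values of one row."""
    return SKELETON.format(base=base, **row)


observation = apply_skeleton(ROW)
```

Applying the same skeleton to every row gives a uniform RDF structure for the whole table.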

Step 4: Map your data to appropriate RDF classes & properties (model your data)

• You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
• You can set the base URI for the data.

[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton.]

Slide 94

Step 5: Export your data to RDF/XML or Turtle

[Figure: export of the data in Turtle.]

Slide 95
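To give an impression of the Turtle output, the sketch below groups triples by subject and writes them with prefix declarations. It is a simplified illustration rather than the exporter used by the RDF extension: terms are assumed to be already written in Turtle syntax, and the `ex:` namespace and sample triples are placeholders.

```python
from collections import defaultdict


def to_turtle(triples, prefixes) -> str:
    """Serialise pre-formatted triples as Turtle, one block per subject."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in prefixes.items()]
    lines.append("")
    by_subject = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append((predicate, obj))
    for subject, pairs in by_subject.items():
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        lines.append(f"{subject} {body} .")
    return "\n".join(lines) + "\n"


PREFIXES = {
    "ex": "http://example.org/id/",
    "qb": "http://purl.org/linked-data/cube#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
TRIPLES = [
    ("ex:obs1", "a", "qb:Observation"),
    ("ex:obs1", "ex:value", '"98.5"^^xsd:decimal'),
]

turtle_output = to_turtle(TRIPLES, PREFIXES)
```

RDF/XML carries the same triples in an XML syntax; Turtle is usually easier to read and review by hand.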

Production pipelines

From desk to automated pipeline

[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar.]

Slide 96

Thank you for your attention

and now YOUR questions

Slide 97

References

• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Slide 98

Further reading

• EC ISA: Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA: Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding, Qualcomm; Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org

Slides 99-101

Be part of our team

• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• Website: http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• OpenDataSupport
• Contact: contact@opendatasupport.eu

Slide 102

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Slide 103

Page 64: Llinked open data training for EU institutions

DATASUPPORTOPEN

Workshop for publishing open linked EU data

This module is about

bull Creating an RDF vocabulary for modelling your data

How to reuse existing vocabularies to model your data

How to create new classes and properties in RDF

How and where to publish your RDF vocabulary so that it can be reused by others

bull An example of how tabular data can be published as Linked Open Data using Open Refine

Slide 64

DATASUPPORTOPEN

Learning objectives

By the end of this training module you should have an understanding of

bull What the best practices are for creating an RDF vocabulary for modelling your data

bull Where to find RDF vocabularies for reuse

bull How you can create your own RDF vocabulary

bull How to publish your RDF vocabulary

bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties

Where new terms are required create them following commonly agreed best practice

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Slide 67

1

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

2

3

4

5

6

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

1

hasCeiling

hasPoliticalcategory

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeading

EU Programme

CodeType

Political category

CodeDescription

Corporate body

CodeTypeLocation

IntroductionRemarkConditions

AcronymLegal base periodLegal base typeLegal base status

hasCorporate body

has

Nomenclature

has

EU Programme

DATASUPPORTOPEN

General purpose vocabularies DCMI RDFS

To name things rdfslabel foafname skosprefLabel

To describe people FOAF vCard Core Person Vocabulary

To describe projects DOAP ADMSSW

To describe interoperability assets ADMS

To describe registered organisations Registered Organisation Vocabulary

To describe addresses vCard Core Location Vocabulary

To describe public services Core Public Service Vocabulary

To describe datasets DCAT DCAT Application Profile VoID

Reuse existing terms and vocabularies

Slide 69

2

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP Vocabulary for describing datasets in Europe

Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth

DOAP Vocabulary for describing projects

ADMS Vocabulary for describing interoperability assets

Dublin Core Defines general metadata attributes

Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register

Organization Ontology for describing the structure of organizations

Core Location VocabularyVocabulary capturing the fundamental characteristics of a location

Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration

schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft

See alsohttpwwww3orgwikiTaskForcesCommunityProj

ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies

2

DATASUPPORTOPEN

bull Reuse greatly aids interoperability of your data

Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses

bull Reuse adds credibility to your schema

It shows it has been published with care and professionalism again this promotes its reuse

bull Reuse is easier and cheaper

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

httpjoinupeceuropaeu httplovokfnorg

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription

Nomenclature

TypeHeadingIntroductionRemarkConditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular eg skosConcept

Properties begin with a lower case letter eg rdfslabel

Object properties should be verbs eg orghasSite

Data type properties should be nouns eg dctermsdescription

Use camel case if a term has more than one word eg foafisPrimaryTopicOf

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoAmountrdquo class

Slide 77

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoamount typerdquo property

Slide 78

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties consider to define their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used

Slide 79

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

hasCeiling

Amount

CurrencyFigureTypeYear

Political category

CodeDescription

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

bull Choose a stable namespace for your RDF vocabulary

Example httpdataeuropaeubud

bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management

Examples

o httpwwww3orgnsadms

o httppurlorgdcelements11

Slide 80

5

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg

See alsoOpen Refine website

httpopenrefineorg

DATASUPPORTOPEN

What is Open Refine RDF extension

Open Refine RDF extension allows you to easily import data in different formats such as

CSV

Excel(xls and xlsx)

JSON

XML and

RDFXML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data Getting started

1 Install Open Refine from httpsgithubcomOpenRefine

2 Install the RDF extension httprefinederiie

And then

Describe your data in a spreadsheet

Create a project and upload it in Open Refine

Clean up the data

Map your data to appropriate RDF classes amp properties

Export the data in RDF

Slide 86

1

2

3

4

5

DATASUPPORTOPEN

Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data ndash table harmonisation

Slide 90

3

bull Star amp remove unnessary rows

bull Rename columns

bull Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data ndash prepare RDF

Slide 91

3

bull Create URI representation for the involved object values

bull via formula

bull via reconsiliation

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either by creating a new RDF skeleton or by editing an existing one.

You can set the base URI for the data.

[Screenshot: graphical interface to copy/paste an existing RDF skeleton]

[Screenshot: graphical interface to edit an RDF skeleton]

Slide 94

Step 5: Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]

Slide 95
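To give an idea of what such an export contains, the sketch below renders table rows as RDF Data Cube observations in Turtle using plain Python. The `qb:` and `sdmx-measure:` namespaces are the published ones; the `ex:` namespace, the dimension properties `ex:country` and `ex:indicator`, the dataset name and the sample rows are hypothetical, and the cell values are assumed to be already cleaned into valid local names.

```python
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/scoreboard/> .
"""


def observation(index, row):
    """Render one table row as a qb:Observation in Turtle."""
    return (
        f"ex:obs{index} a qb:Observation ;\n"
        f"    qb:dataSet ex:dataset ;\n"
        f"    ex:country ex:{row['country']} ;\n"
        f"    ex:indicator ex:{row['indicator']} ;\n"
        f"    sdmx-measure:obsValue \"{row['value']}\"^^xsd:decimal .\n"
    )


def to_turtle(rows):
    """Serialise all rows as one Turtle document."""
    blocks = [observation(i, row) for i, row in enumerate(rows, start=1)]
    return PREFIXES + "\n" + "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical rows; values are assumed to be cleaned slugs already.
    sample = [
        {"country": "BE", "indicator": "broadband-coverage", "value": "98.5"},
        {"country": "FR", "indicator": "broadband-coverage", "value": "97.1"},
    ]
    print(to_turtle(sample))
```

A complete Data Cube publication would also describe the dataset and its structure definition; this sketch covers only the observations.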

Production pipelines

From desk to automated pipeline

[Figure: OpenRefine, UnifiedViews and Cellar plotted against two axes, flexibility and volume]

Slide 96

Thank you for your attention

and now YOUR questions

Slide 97

Further reading

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

Presentation metadata

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Slide 103

Learning objectives

By the end of this training module you should have an understanding of:

• What the best practices are for creating an RDF vocabulary for modelling your data

• Where to find RDF vocabularies for reuse

• How you can create your own RDF vocabulary

• How to publish your RDF vocabulary

• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission

Slide 65

Creating an RDF vocabulary

How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary

Slide 66

6 steps for creating an RDF vocabulary

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.

4. Where new terms are required, create them following commonly agreed best practice.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 67

Step 1: Start with a robust Domain Model

[Figure: example domain model]

Classes and their attributes:

• Amount: Currency, Figure, Type, Year

• Nomenclature: Type, Heading, Introduction, Remark, Conditions

• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status

• Political category: Code, Description

• Corporate body: Code, Type, Location

Relationships shown between the classes: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme

Slide 68

Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS

• To name things: rdfs:label, foaf:name, skos:prefLabel

• To describe people: FOAF, vCard, Core Person Vocabulary

• To describe projects: DOAP, ADMS.SW

• To describe interoperability assets: ADMS

• To describe registered organisations: Registered Organisation Vocabulary

• To describe addresses: vCard, Core Location Vocabulary

• To describe public services: Core Public Service Vocabulary

• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

• DCAT-AP: Vocabulary for describing datasets in Europe

• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth

• DOAP: Vocabulary for describing projects

• ADMS: Vocabulary for describing interoperability assets

• Dublin Core: Defines general metadata attributes

• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register

• Organization Ontology: Ontology for describing the structure of organizations

• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location

• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration

• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data.

Take dcterms:created as an example: its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.

Slide 71
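The "further processing" mentioned in the first point can be made concrete. The sketch below, in plain Python, shows the conversion a consumer would need to write to turn a free-text date such as "21 February 2013" into the lexical form of an xsd:date. The function names are illustrative, and the parser assumes English month names and the "day month year" order; other inputs would need more code, which is exactly the cost that reusing dcterms:created with xsd:date avoids.

```python
from datetime import date

# English month names mapped to month numbers (assumed input language).
MONTHS = {
    name: number
    for number, name in enumerate(
        [
            "january", "february", "march", "april", "may", "june",
            "july", "august", "september", "october", "november", "december",
        ],
        start=1,
    )
}


def to_xsd_date(text):
    """Convert e.g. '21 February 2013' to the lexical form of an xsd:date."""
    day, month, year = text.split()
    # date() also validates the combination, e.g. it rejects 31 February.
    return date(int(year), MONTHS[month.lower()], int(day)).isoformat()


def created_triple(subject, text):
    """Render a dcterms:created statement in Turtle for a subject URI."""
    return f'<{subject}> dcterms:created "{to_xsd_date(text)}"^^xsd:date .'


if __name__ == "__main__":
    print(created_triple("http://example.org/doc/1", "21 February 2013"))
```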

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu

• Linked Open Vocabularies: http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• When you create sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown to them.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 74

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

Slide 75
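Why sub-properties help generic consumers can be shown with a small Python sketch of the RDFS sub-property rule: if p is a sub-property of q, every statement (s, p, o) also implies (s, q, o). The `BUD` namespace and the local names `introduction` and `hasNomenclature` are hypothetical stand-ins for the EU Budget vocabulary terms; only the Dublin Core terms namespace is the published one.

```python
# Hypothetical namespace for the example; the real vocabulary URI may differ.
BUD = "http://example.org/bud#"
DCT = "http://purl.org/dc/terms/"

# Sub-property declarations, following the two examples above.
SUB_PROPERTY_OF = {
    BUD + "introduction": DCT + "description",
    BUD + "hasNomenclature": DCT + "subject",
}


def infer_super_properties(triples):
    """Add (s, q, o) for every (s, p, o) where p is a sub-property of q.

    Assumes the sub-property declarations contain no cycles.
    """
    inferred = set(triples)
    for s, p, o in triples:
        while p in SUB_PROPERTY_OF:
            p = SUB_PROPERTY_OF[p]
            inferred.add((s, p, o))
    return inferred


if __name__ == "__main__":
    data = {(BUD + "nomenclature1", BUD + "introduction", "Some introduction")}
    for triple in sorted(infer_super_properties(data)):
        print(triple)
```

A system that only knows dct:description can then still display the introduction text, even though it has never seen the more specific property.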

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept.

• Properties begin with a lower case letter, e.g. rdfs:label.

• Object properties should be verbs, e.g. org:hasSite.

• Data type properties should be nouns, e.g. dcterms:description.

• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.

Slide 76
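The letter-case conventions above lend themselves to an automatic check before a vocabulary is published. The following Python sketch is one possible way to do this; the function name and messages are illustrative. It only covers capitalisation and camel case: whether a class name is singular, or whether a property is a verb or a noun, still needs human review.

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def check_term(local_name, kind):
    """Return a list of convention problems for a term's local name."""
    problems = []
    if kind == "class":
        if not CLASS_NAME.match(local_name):
            problems.append("class names should be UpperCamelCase")
    elif kind == "property":
        if not PROPERTY_NAME.match(local_name):
            problems.append("property names should be lowerCamelCase")
    else:
        raise ValueError(f"unknown kind: {kind}")
    return problems


if __name__ == "__main__":
    for name, kind in [("Concept", "class"), ("political_category", "class"),
                       ("isPrimaryTopicOf", "property")]:
        print(name, kind, check_term(name, kind) or "ok")
```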

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “Amount” class

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the “amount type” property

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78
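The example definitions on these two slides were images and did not survive in the transcript. As a stand-in, the Python sketch below generates RDFS/OWL definitions in Turtle for a class and a property. The `ex` prefix, the local names `Amount` and `amountType`, and the label and comment texts are assumptions for illustration and are not the actual EU Budget vocabulary definitions.

```python
def define_class(prefix, name, label, comment):
    """Render a class definition using RDFS and OWL terms."""
    return (
        f"{prefix}:{name} a rdfs:Class, owl:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


def define_property(prefix, name, label, comment):
    """Render a property definition using RDF and RDFS terms."""
    return (
        f"{prefix}:{name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


if __name__ == "__main__":
    # Hypothetical terms and descriptions, for illustration only.
    print(define_class("ex", "Amount", "Amount",
                       "A monetary amount in the budget."))
    print(define_property("ex", "amountType", "amount type",
                          "The type of an amount."))
```

Giving every term at least a label and a comment makes the vocabulary readable for people as well as machines.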

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

[Figure: hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
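In RDFS, domain and range work as inference rules rather than validation constraints: a statement using the property lets a reasoner conclude the types of its subject and object. The Python sketch below illustrates this. The `ex:` names are hypothetical, and the direction of hasCeiling (from a political category to an amount) is an assumption, since the diagram's arrow did not survive in the transcript.

```python
RDF_TYPE = "rdf:type"

# Assumed declarations: ex:hasCeiling links a political category to an amount.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            # The subject of the property is an instance of the domain class.
            inferred.add((s, RDF_TYPE, DOMAIN[p]))
        if p in RANGE:
            # The value of the property is an instance of the range class.
            inferred.add((o, RDF_TYPE, RANGE[p]))
    return inferred


if __name__ == "__main__":
    data = [("ex:category1", "ex:hasCeiling", "ex:amount42")]
    for triple in sorted(infer_types(data)):
        print(triple)
```

Because of this behaviour, an overly narrow domain or range can lead to unintended conclusions about resources that reuse the property, which is one reason to declare them with care.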

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format design rules and management.

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.

2. Research existing terms and their usage, and maximise reuse of those terms.

3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent.

6. Publicise the RDF vocabulary by registering it with relevant services.

Phases: Analyse, Model, Publish

Slide 82

Page 66: Llinked open data training for EU institutions

DATASUPPORTOPEN

Creating an RDF vocabulary

How to reuse other vocabularies define your own terms publish and promote your vocabulary

Slide 66

DATASUPPORTOPEN

6 steps for creating an RDF vocabulary

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties

Where new terms are required create them following commonly agreed best practice

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Slide 67

1

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

2

3

4

5

6

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

1

hasCeiling

hasPoliticalcategory

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeading

EU Programme

CodeType

Political category

CodeDescription

Corporate body

CodeTypeLocation

IntroductionRemarkConditions

AcronymLegal base periodLegal base typeLegal base status

hasCorporate body

has

Nomenclature

has

EU Programme

DATASUPPORTOPEN

General purpose vocabularies DCMI RDFS

To name things rdfslabel foafname skosprefLabel

To describe people FOAF vCard Core Person Vocabulary

To describe projects DOAP ADMSSW

To describe interoperability assets ADMS

To describe registered organisations Registered Organisation Vocabulary

To describe addresses vCard Core Location Vocabulary

To describe public services Core Public Service Vocabulary

To describe datasets DCAT DCAT Application Profile VoID

Reuse existing terms and vocabularies

Slide 69

2

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP Vocabulary for describing datasets in Europe

Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth

DOAP Vocabulary for describing projects

ADMS Vocabulary for describing interoperability assets

Dublin Core Defines general metadata attributes

Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register

Organization Ontology for describing the structure of organizations

Core Location VocabularyVocabulary capturing the fundamental characteristics of a location

Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration

schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft

See alsohttpwwww3orgwikiTaskForcesCommunityProj

ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies

2

DATASUPPORTOPEN

bull Reuse greatly aids interoperability of your data

Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses

bull Reuse adds credibility to your schema

It shows it has been published with care and professionalism again this promotes its reuse

bull Reuse is easier and cheaper

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

httpjoinupeceuropaeu httplovokfnorg

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription

Nomenclature

TypeHeadingIntroductionRemarkConditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular eg skosConcept

Properties begin with a lower case letter eg rdfslabel

Object properties should be verbs eg orghasSite

Data type properties should be nouns eg dctermsdescription

Use camel case if a term has more than one word eg foafisPrimaryTopicOf

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoAmountrdquo class

Slide 77

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoamount typerdquo property

Slide 78

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties consider to define their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used

Slide 79

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

hasCeiling

Amount

CurrencyFigureTypeYear

Political category

CodeDescription

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

bull Choose a stable namespace for your RDF vocabulary

Example httpdataeuropaeubud

bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management

Examples

o httpwwww3orgnsadms

o httppurlorgdcelements11

Slide 80

5

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg

See alsoOpen Refine website

httpopenrefineorg

DATASUPPORTOPEN

What is Open Refine RDF extension

Open Refine RDF extension allows you to easily import data in different formats such as

CSV

Excel(xls and xlsx)

JSON

XML and

RDFXML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data Getting started

1 Install Open Refine from httpsgithubcomOpenRefine

2 Install the RDF extension httprefinederiie

And then

Describe your data in a spreadsheet

Create a project and upload it in Open Refine

Clean up the data

Map your data to appropriate RDF classes amp properties

Export the data in RDF

Slide 86

1

2

3

4

5

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Dataset: Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data.

Slide 88

Step 2: Create a project and upload it in OpenRefine

Slide 89

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Step 3: Clean up the data - table harmonisation

Slide 90

• Star and remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
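The three clean-up actions above can also be sketched outside the OpenRefine interface. The following is a minimal illustration in plain Python, not what OpenRefine does internally: the column names, the `RENAMES` mapping and the sample figures are all invented for the example and are not taken from the Digital Agenda Scoreboard.

```python
import csv
import io

# Invented sample table; the real spreadsheet has different columns and values.
RAW = """indicator,country code,yr,val
bb_lines,BE,2013,31.2
bb_lines,EU27,2013,
bb_lines,FR,2012,36.0
bb_lines,FR,2013,37.5
"""

# Hypothetical column renames (the "Rename columns" step).
RENAMES = {"country code": "country", "yr": "year", "val": "value"}

def harmonise(text, renames, keep):
    """Rename columns, drop rows without a value, keep rows selected by `keep`."""
    rows = []
    for raw_row in csv.DictReader(io.StringIO(text)):
        row = {renames.get(k, k): (v or "").strip() for k, v in raw_row.items()}
        if not row["value"]:      # remove unnecessary (empty) rows
            continue
        if keep(row):             # facet-like selection of rows to publish
            rows.append(row)
    return rows

rows = harmonise(RAW, RENAMES, keep=lambda r: r["year"] == "2013")
for row in rows:
    print(row)
```

The `keep` function plays the role of a facet: it selects which rows go forward to the mapping step.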

Step 3: Clean up the data - prepare RDF

Slide 91

• Create a URI representation for the object values involved
• via formula
• via reconciliation
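As a rough sketch of the two options, the snippet below builds a URI from a cell value with a formula and, alternatively, looks the value up in a table of URIs that already exist. The base URI, the `KNOWN` lookup table and the function names are hypothetical; in OpenRefine the formula would be written in its own expression language and reconciliation would go through a reconciliation service.

```python
import re

BASE = "http://example.org/id/"  # hypothetical base URI, replace with your own namespace

# Hypothetical reconciliation table: labels already matched to existing URIs.
KNOWN = {"belgium": "http://example.org/ref/country/BE"}

def slug(value):
    """Lower-case the value and collapse anything but a-z and 0-9 into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")

def mint_uri(kind, value, base=BASE):
    """'Via formula': build a URI from the cell value alone."""
    return f"{base}{kind}/{slug(value)}"

def resolve_uri(kind, value, known=KNOWN):
    """'Via reconciliation': prefer an existing URI, fall back to minting one."""
    return known.get(value.strip().lower()) or mint_uri(kind, value)

print(mint_uri("country", " United Kingdom "))
print(resolve_uri("country", "Belgium"))
```

Note that this simple `slug` replaces non-ASCII characters with hyphens, which may not be what you want for multilingual labels.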

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF.
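To give an idea of what such a skeleton amounts to, here is a minimal sketch that fills a Turtle template once per spreadsheet row. `qb:Observation` and `qb:dataSet` are terms of the RDF Data Cube Vocabulary; the `ex:` namespace, the `ex:` properties and the sample row are placeholders invented for this example, and a real mapping would use properly defined dimension and measure properties.

```python
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/scoreboard/> .
"""

# One observation per row; every {placeholder} is a column name.
SKELETON = """ex:obs-{indicator}-{country}-{year} a qb:Observation ;
    qb:dataSet ex:dataset ;
    ex:indicator ex:{indicator} ;
    ex:country ex:{country} ;
    ex:year "{year}"^^xsd:gYear ;
    ex:value "{value}"^^xsd:decimal .
"""

def rows_to_turtle(rows):
    """Apply the skeleton to each row and prepend the prefix declarations."""
    body = "\n".join(SKELETON.format(**row) for row in rows)
    return PREFIXES + "\n" + body

sample = [{"indicator": "bb_lines", "country": "BE", "year": "2013", "value": "31.2"}]
print(rows_to_turtle(sample))
```

The sketch assumes cell values are already clean enough to be used inside prefixed names; values with spaces or punctuation would first need the URI preparation step.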

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 94

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Figure: export of the data in Turtle]

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: tools positioned along flexibility and volume axes: OpenRefine, UnifiedViews, Cellar]


Thank you for your attention

and now YOUR questions

Slide 97

References

Slide 98

• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• OpenRefine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slide 99

• EC ISA: Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA: Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

Slide 100

• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Further reading

Slide 101

• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org/

Be part of our team

Slide 102

Find us, join us, follow us and contact us:

• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• Open Data Support website: http://www.opendatasupport.eu
• Open Data Support group: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


6 steps for creating an RDF vocabulary

Slide 67

1. Start with a robust domain model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Step 1: Start with a robust domain model

Slide 68

[Figure: example domain model. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]

Step 2: Reuse existing terms and vocabularies

Slide 69

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

Slide 70

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

Slide 71

• Reuse greatly aids interoperability of your data.

For example, a dcterms:created value should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that it has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from replicating that effort.
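The "further processing" mentioned in the date example can be made concrete with a small sketch. It converts a free-text date such as "21 February 2013" into a typed xsd:date literal and emits a dcterms:created statement in N-Triples syntax. The function names and the example.org subject URI are illustrative assumptions; the date format is assumed to be day, English month name, year.

```python
from datetime import datetime

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
DCTERMS_CREATED = "http://purl.org/dc/terms/created"

def to_xsd_date_literal(text, fmt="%d %B %Y"):
    """Parse a free-text date and return it as a typed literal."""
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^<{XSD_DATE}>'

def created_triple(subject_uri, text):
    """Build one N-Triples statement using the reused dcterms:created term."""
    return f"<{subject_uri}> <{DCTERMS_CREATED}> {to_xsd_date_literal(text)} ."

print(created_triple("http://example.org/dataset/1", "21 February 2013"))
```

Every consumer of data published with a non-standard term and format would need a conversion step of this kind, which is the cost that reuse avoids.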

Step 2: Reuse existing terms and vocabularies

Slide 72

You can find reusable RDF vocabularies on http://joinup.ec.europa.eu and http://lov.okfn.org

Step 3: Creation of sub-classes and sub-properties

Slide 73

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: Amount (Currency, Figure, Type, Year) and Nomenclature (Type, Heading, Introduction, Remark, Conditions), connected by the "has Nomenclature" relationship]
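The benefit of declaring sub-properties can be illustrated with a minimal sketch of the RDFS sub-property rule: whenever a statement uses a specific property, the same statement also holds for its super-property. The `BUD` namespace and the local names `introduction` and `hasNomenclature` are placeholders chosen for this example, not the actual identifiers of the EU Budget vocabulary, and each property is assumed to have at most one super-property.

```python
DCT = "http://purl.org/dc/terms/"
BUD = "http://example.org/bud#"  # hypothetical namespace, not the real vocabulary URI

# Assumed rdfs:subPropertyOf declarations, mirroring the two slide examples.
SUB_PROPERTY_OF = {
    BUD + "introduction": DCT + "description",
    BUD + "hasNomenclature": DCT + "subject",
}

def infer_super_properties(triples, sub_property_of):
    """Return the triples plus those entailed by rdfs:subPropertyOf."""
    inferred = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        parent = sub_property_of.get(p)
        if parent is not None and (s, parent, o) not in inferred:
            inferred.add((s, parent, o))
            frontier.append((s, parent, o))  # follow chains of sub-properties
    return inferred

data = {("http://example.org/nomenclature/1", BUD + "introduction", "Intro text")}
for triple in sorted(infer_super_properties(data, SUB_PROPERTY_OF)):
    print(triple)
```

A consumer that only knows Dublin Core can still read the inferred dct:description statement, even though it has never seen the more specific property.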

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 76

• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
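Some of these conventions lend themselves to an automated check. The sketch below tests only the mechanical rules (first-letter case and the absence of separators such as underscores or hyphens); whether a name is singular, a verb or a noun still needs human review. The function names and the `ex:` terms are invented for illustration.

```python
import re

# Classes: UpperCamelCase. Properties: lowerCamelCase. Letters and digits only.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")

def local_name(term):
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]

def follows_convention(term, kind):
    """Check a term against the naming pattern for 'class' or 'property'."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))

for term, kind in [
    ("skos:Concept", "class"),
    ("foaf:isPrimaryTopicOf", "property"),
    ("ex:amount_type", "property"),   # underscore instead of camel case
    ("ex:concept", "class"),          # class without a capital letter
]:
    print(term, kind, follows_convention(term, kind))
```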

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 77

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

• RDF Schema (RDFS)
• Web Ontology Language (OWL)

Example: defining the "Amount" class

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 78

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

• RDF Schema (RDFS)
• Web Ontology Language (OWL)

Example: defining the "amount type" property

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
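The example definitions on these two slides were images and did not survive in the transcript. As a stand-in, here is a hedged sketch of what RDFS definitions for an "Amount" class and an "amount type" property might look like, generated as Turtle from Python. The `ex:` namespace, the term names `ex:Amount` and `ex:amountType`, and the label and comment texts are assumptions made for this example, not the definitions of the actual EU Budget vocabulary.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/budget#",  # hypothetical namespace
}

def turtle_prefixes(prefixes):
    """Render @prefix declarations, one per line."""
    return "\n".join(f"@prefix {p}: <{uri}> ." for p, uri in prefixes.items())

def define_class(term, label, comment):
    """Describe a new class with RDFS (labels must not contain double quotes)."""
    return (f"{term} a rdfs:Class ;\n"
            f'    rdfs:label "{label}"@en ;\n'
            f'    rdfs:comment "{comment}"@en .')

def define_property(term, label, domain, range_):
    """Describe a new property with RDFS, including its domain and range."""
    return (f"{term} a rdf:Property ;\n"
            f'    rdfs:label "{label}"@en ;\n'
            f"    rdfs:domain {domain} ;\n"
            f"    rdfs:range {range_} .")

document = "\n\n".join([
    turtle_prefixes(PREFIXES),
    define_class("ex:Amount", "Amount", "A monetary figure in the budget."),
    define_property("ex:amountType", "amount type", "ex:Amount", "rdfs:Literal"),
])
print(document)
```

Following the naming conventions, the class uses a capitalised singular noun and the property uses lower camel case.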

Step 4: Where new terms are required, create them following commonly agreed best practices

Slide 79

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.

[Figure: the hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
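A minimal sketch of how domain and range declarations are used in practice: in RDFS they license inferences about the types of the subject and object of a statement, rather than rejecting data that does not fit. The example assumes that hasCeiling points from a political category to an amount, which is a reading of the diagram and not confirmed by the text; all `ex:` names are placeholders.

```python
RDF_TYPE = "rdf:type"

# Assumed declarations: ex:hasCeiling rdfs:domain ex:PoliticalCategory ; rdfs:range ex:Amount .
DOMAINS = {"ex:hasCeiling": ["ex:PoliticalCategory"]}
RANGES = {"ex:hasCeiling": ["ex:Amount"]}

def infer_types(triples, domains, ranges):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    types = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            types.add((s, RDF_TYPE, cls))   # the subject is an instance of the domain
        for cls in ranges.get(p, ()):
            types.add((o, RDF_TYPE, cls))   # the value is an instance of the range
    return types

data = [("ex:category1", "ex:hasCeiling", "ex:amount42")]
for triple in sorted(infer_types(data, DOMAINS, RANGES)):
    print(triple)
```

Because of this inference behaviour, overly narrow domains and ranges can produce unintended type statements when a property is reused elsewhere.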

Step 5: Publish within a highly stable environment designed to be persistent

Slide 80

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/

See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas and http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris


Page 68: Llinked open data training for EU institutions

DATASUPPORTOPEN

Start with a robust Domain Model

Slide 68

1

hasCeiling

hasPoliticalcategory

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeading

EU Programme

CodeType

Political category

CodeDescription

Corporate body

CodeTypeLocation

IntroductionRemarkConditions

AcronymLegal base periodLegal base typeLegal base status

hasCorporate body

has

Nomenclature

has

EU Programme

DATASUPPORTOPEN

General purpose vocabularies DCMI RDFS

To name things rdfslabel foafname skosprefLabel

To describe people FOAF vCard Core Person Vocabulary

To describe projects DOAP ADMSSW

To describe interoperability assets ADMS

To describe registered organisations Registered Organisation Vocabulary

To describe addresses vCard Core Location Vocabulary

To describe public services Core Public Service Vocabulary

To describe datasets DCAT DCAT Application Profile VoID

Reuse existing terms and vocabularies

Slide 69

2

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP Vocabulary for describing datasets in Europe

Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth

DOAP Vocabulary for describing projects

ADMS Vocabulary for describing interoperability assets

Dublin Core Defines general metadata attributes

Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register

Organization Ontology for describing the structure of organizations

Core Location VocabularyVocabulary capturing the fundamental characteristics of a location

Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration

schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft

See alsohttpwwww3orgwikiTaskForcesCommunityProj

ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies

2

DATASUPPORTOPEN

bull Reuse greatly aids interoperability of your data

Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses

bull Reuse adds credibility to your schema

It shows it has been published with care and professionalism again this promotes its reuse

bull Reuse is easier and cheaper

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

httpjoinupeceuropaeu httplovokfnorg

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription

Nomenclature

TypeHeadingIntroductionRemarkConditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular eg skosConcept

Properties begin with a lower case letter eg rdfslabel

Object properties should be verbs eg orghasSite

Data type properties should be nouns eg dctermsdescription

Use camel case if a term has more than one word eg foafisPrimaryTopicOf

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoAmountrdquo class

Slide 77

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoamount typerdquo property

Slide 78

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties consider to define their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used

Slide 79

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

hasCeiling

Amount

CurrencyFigureTypeYear

Political category

CodeDescription

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

bull Choose a stable namespace for your RDF vocabulary

Example httpdataeuropaeubud

bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management

Examples

o httpwwww3orgnsadms

o httppurlorgdcelements11

Slide 80

5

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg

See alsoOpen Refine website

httpopenrefineorg

DATASUPPORTOPEN

What is Open Refine RDF extension

Open Refine RDF extension allows you to easily import data in different formats such as

CSV

Excel(xls and xlsx)

JSON

XML and

RDFXML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data Getting started

1 Install Open Refine from httpsgithubcomOpenRefine

2 Install the RDF extension httprefinederiie

And then

Describe your data in a spreadsheet

Create a project and upload it in Open Refine

Clean up the data

Map your data to appropriate RDF classes amp properties

Export the data in RDF

Slide 86

1

2

3

4

5

DATASUPPORTOPEN

Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data ndash table harmonisation

Slide 90

3

bull Star amp remove unnessary rows

bull Rename columns

bull Use facets to select the data to be published

Step 3: Clean up the data: prepare RDF

• Create URI representations for the object values involved
o via a formula
o via reconciliation

Slide 91
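The two routes to a URI can be sketched as follows. This is an illustration under stated assumptions: the base URI, the slug rule and the lookup table are hypothetical. In OpenRefine, reconciliation queries a reconciliation service, whereas this sketch uses a hand-written dictionary.

```python
from urllib.parse import quote

BASE = "http://example.org/scoreboard/"  # hypothetical base URI


def uri_via_formula(kind, value):
    """Mint a URI from a cell value: trim, lower-case, hyphenate, encode."""
    slug = quote(value.strip().lower().replace(" ", "-"), safe="-")
    return f"{BASE}{kind}/{slug}"


# Hypothetical reconciliation table: labels mapped to already existing URIs.
KNOWN_COUNTRIES = {
    "belgium": "http://example.org/country/BE",
    "france": "http://example.org/country/FR",
}


def uri_via_reconciliation(label):
    """Return the URI of a known resource, or None when nothing matches."""
    return KNOWN_COUNTRIES.get(label.strip().lower())
```

A formula always yields a URI, but only within your own namespace. Reconciliation links to identifiers that others already use, at the cost of leaving unmatched values to be handled separately.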

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF.

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]

Slide 94
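To make the idea of a skeleton concrete, the sketch below treats it as a text template that is filled in once per spreadsheet row and yields one Data Cube observation in Turtle. The `qb:`, `sdmx-dimension:` and `sdmx-measure:` terms come from the W3C RDF Data Cube Vocabulary. The base URI, the URI patterns and the sample rows are hypothetical, and the RDF extension stores its skeleton differently.

```python
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""

# The skeleton: one qb:Observation per spreadsheet row.
SKELETON = """<{base}obs/{indicator}/{area}/{year}> a qb:Observation ;
    qb:dataSet <{base}dataset/{indicator}> ;
    sdmx-dimension:refArea <{base}country/{area}> ;
    sdmx-dimension:refPeriod "{year}"^^xsd:gYear ;
    sdmx-measure:obsValue "{value}"^^xsd:decimal .
"""


def row_to_turtle(row):
    """Fill the skeleton with the cells of one row."""
    return SKELETON.format(base=BASE, **row)


def table_to_turtle(rows):
    """Prefix declarations followed by one observation per row."""
    return PREFIXES + "\n" + "\n".join(row_to_turtle(r) for r in rows)


rows = [
    {"indicator": "bb_lines", "area": "BE", "year": "2013", "value": "0.34"},
    {"indicator": "bb_lines", "area": "FR", "year": "2013", "value": "0.37"},
]
turtle = table_to_turtle(rows)
```

Changing `BASE` moves every generated URI at once, which is the purpose of setting a base URI for the data.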

Step 5: Export your data to RDF/XML or Turtle

[Screenshot: export of the data in Turtle]

Slide 95
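RDF/XML and Turtle are two syntaxes for the same triples. The sketch below writes one statement in both, using only the Python standard library. The subject URI and title are hypothetical, and the Turtle helper assumes the title contains no quotes or backslashes.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_NS = "http://purl.org/dc/terms/"


def to_turtle(subject, title):
    """One dcterms:title statement in Turtle (no escaping of the title)."""
    return (f"@prefix dcterms: <{DCT_NS}> .\n"
            f'<{subject}> dcterms:title "{title}" .\n')


def to_rdfxml(subject, title):
    """The same statement serialised as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dcterms", DCT_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": subject})
    ET.SubElement(desc, f"{{{DCT_NS}}}title").text = title
    return ET.tostring(root, encoding="unicode")


subject = "http://example.org/scoreboard/dataset/bb_lines"  # hypothetical
as_turtle = to_turtle(subject, "Broadband lines")
as_rdfxml = to_rdfxml(subject, "Broadband lines")
```

Both strings describe the same graph. Which syntax to export is mainly a question of what the consuming tool accepts.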

Production pipelines

From desk to automated pipeline

[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

Slide 96

Thank you for your attention

... and now YOUR questions

Slide 97


Further reading

• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

Further reading

• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org

Slide 101

Be part of our team...

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: @OpenDataSupport

Contact us: http://www.opendatasupport.eu, contact@opendatasupport.eu

Slide 102

This presentation has been created by PwC.

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Slide 103


Step 2: Reuse existing terms and vocabularies

• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID

Slide 69
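Prefixed names such as rdfs:label or foaf:name are shorthand for full IRIs in the namespace of the reused vocabulary. The sketch below expands a few of them and names a person using reused terms only. The namespace IRIs are the published ones for these vocabularies. The person URI, the name and the helper functions are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

NAMESPACES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}


def expand(curie):
    """Turn a prefixed name such as foaf:name into a full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in NAMESPACES or not local:
        raise ValueError(f"unknown prefix or missing local name: {curie!r}")
    return NAMESPACES[prefix] + local


def describe_person(uri, name):
    """Return N-Triples that type and name a person with reused terms."""
    return "\n".join([
        f'<{uri}> <{RDF_TYPE}> <{expand("foaf:Person")}> .',
        f'<{uri}> <{expand("foaf:name")}> "{name}" .',
        f'<{uri}> <{expand("rdfs:label")}> "{name}" .',
    ])


triples = describe_person("http://example.org/person/1", "Ada Example")
```

No new term had to be defined, so any consumer that already understands FOAF and RDFS can read the result.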

Step 2: Reuse existing terms and vocabularies

Well-known vocabularies

• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

Slide 70

Step 2: Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids interoperability of your data.

A value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.

Slide 71
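The interoperability point can be seen in how much code a consumer needs for each date. In this Python sketch the typed literal is read with a standard ISO parser, while the free-text form needs a format string that the consumer has to guess or be told. The helper names are hypothetical, and the free-text parser assumes English month names in the active locale.

```python
from datetime import date, datetime


def parse_typed_literal(literal):
    """Parse a literal written as "lexical"^^xsd:date into a date."""
    lexical, _, datatype = literal.partition("^^")
    if datatype != "xsd:date":
        raise ValueError(f"unsupported datatype: {datatype!r}")
    return date.fromisoformat(lexical.strip('"'))


def parse_free_text(text):
    """Publisher-specific format: needs a custom, locale-dependent pattern."""
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_typed_literal('"2013-02-21"^^xsd:date')
custom = parse_free_text("21 February 2013")
```

Both calls yield the same date, but only the first works without prior knowledge of the publisher's conventions.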

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.
• When you create sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown to them.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]

Slide 74

Step 3: Creation of sub-classes and sub-properties

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Figure: Amount class (Currency, Figure, Type, Year) linked by "has nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]

Slide 75
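The benefit of a sub-property is that a generic consumer can still read the data through the super-property. The sketch below applies the RDFS sub-property rule to a toy set of triples. The `bud:` prefix, the local names `introduction` and `hasNomenclature` and the sample triple are assumptions made for illustration, since the slides give only the property labels.

```python
# Sub-property declarations, using hypothetical prefixed names.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples):
    """Apply the RDFS sub-property rule until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            parent = SUB_PROPERTY_OF.get(p)
            if parent and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


data = {("ex:nomenclature1", "bud:introduction", "Introductory text")}
closure = infer_super_properties(data)
```

A system that knows only dct:description finds the introductory text in `closure`, even though it has never seen the more specific property.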

Step 4: Where new terms are required, create them following commonly agreed best practices

• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
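The capitalisation and camel-case rules above are mechanical enough to check automatically. Whether a class name is singular, or a property a verb or a noun, still needs human review. The sketch below is one possible checker, and the regular expressions are an interpretation of the conventions rather than an official definition.

```python
import re

# Classes: UpperCamelCase. Properties: lowerCamelCase.
UPPER_CAMEL = re.compile(r"^[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")
LOWER_CAMEL = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def check_term(curie, kind):
    """Return True if the local name follows the convention for its kind.

    kind is "class" or "property"; the prefix before ":" is ignored.
    """
    local = curie.split(":", 1)[-1]
    pattern = UPPER_CAMEL if kind == "class" else LOWER_CAMEL
    return bool(pattern.match(local))
```

For example, `check_term("org:hasSite", "property")` holds, while a name containing an underscore, such as the hypothetical `ex:amount_type`, is rejected.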

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "Amount" class

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 77

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)
- Web Ontology Language (OWL)

Example: defining the "amount type" property

[Figure: Amount class with attributes Currency, Figure, Type, Year]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 78
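The definitions shown on these two slides did not survive in the transcript. As a stand-in, the sketch below generates what an RDFS declaration of a class and a property could look like in Turtle. The `ex:` namespace, the local name `amountType`, the comment text and the choice of Amount as the domain are assumptions and are not taken from the EU Budget vocabulary.

```python
PREFIXES = """@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/budget#> .
"""


def define_class(local_name, label, comment):
    """Declare an RDFS class with a label and a comment."""
    return (f"ex:{local_name} a rdfs:Class ;\n"
            f'    rdfs:label "{label}"@en ;\n'
            f'    rdfs:comment "{comment}"@en .\n')


def define_property(local_name, label, domain):
    """Declare an RDF property with a label and a domain."""
    return (f"ex:{local_name} a rdf:Property ;\n"
            f'    rdfs:label "{label}"@en ;\n'
            f"    rdfs:domain ex:{domain} .\n")


vocabulary = (
    PREFIXES + "\n"
    + define_class("Amount", "Amount", "An amount of money in the budget.")
    + "\n"
    + define_property("amountType", "amount type", "Amount")
)
```

The class name is capitalised and the property name is in lower camel case, in line with the naming conventions given earlier.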

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.

[Figure: the hasCeiling property shown between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

Slide 79
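In RDFS, domain and range do not reject data. They let a reasoner conclude the types of the resources that a property connects. The sketch below shows that inference for hasCeiling. The direction chosen here, with Political category as domain and Amount as range, is an assumption made only for illustration, as are the `ex:` names.

```python
# Assumed declarations (the direction is illustrative, not confirmed).
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))  # subject is in the domain
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))   # value is in the range
    return types


facts = [("ex:category1", "ex:hasCeiling", "ex:amount42")]
inferred = infer_types(facts)
```

A single hasCeiling statement is enough to conclude that `ex:category1` is a political category and `ex:amount42` an amount. For this reason, a domain or range that is too narrow can lead to unintended conclusions.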

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format (design rules) and management.

Examples:

o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust domain model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practices, such as naming conventions.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Analyse, Model, Publish

Slide 82


Page 70: Llinked open data training for EU institutions

DATASUPPORTOPEN

Well-known vocabularies

Slide 70

DCAT-AP Vocabulary for describing datasets in Europe

Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth

DOAP Vocabulary for describing projects

ADMS Vocabulary for describing interoperability assets

Dublin Core Defines general metadata attributes

Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register

Organization Ontology for describing the structure of organizations

Core Location VocabularyVocabulary capturing the fundamental characteristics of a location

Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration

schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft

See alsohttpwwww3orgwikiTaskForcesCommunityProj

ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies

2

DATASUPPORTOPEN

bull Reuse greatly aids interoperability of your data

Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses

bull Reuse adds credibility to your schema

It shows it has been published with care and professionalism again this promotes its reuse

bull Reuse is easier and cheaper

Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort

Slide 71

Advantages of reuse

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

You can find reusable RDF vocabularies on

Slide 72

httpjoinupeceuropaeu httplovokfnorg

Reuse existing terms and vocabularies2

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription

Nomenclature

TypeHeadingIntroductionRemarkConditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular eg skosConcept

Properties begin with a lower case letter eg rdfslabel

Object properties should be verbs eg orghasSite

Data type properties should be nouns eg dctermsdescription

Use camel case if a term has more than one word eg foafisPrimaryTopicOf

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoAmountrdquo class

Slide 77

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoamount typerdquo property

Slide 78

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

When defining new properties consider to define their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used

Slide 79

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

hasCeiling

Amount

CurrencyFigureTypeYear

Political category

CodeDescription

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

bull Choose a stable namespace for your RDF vocabulary

Example httpdataeuropaeubud

bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management

Examples

o httpwwww3orgnsadms

o httppurlorgdcelements11

Slide 80

5

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg

See alsoOpen Refine website

httpopenrefineorg

DATASUPPORTOPEN

What is Open Refine RDF extension

Open Refine RDF extension allows you to easily import data in different formats such as

CSV

Excel(xls and xlsx)

JSON

XML and

RDFXML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data Getting started

1 Install Open Refine from httpsgithubcomOpenRefine

2 Install the RDF extension httprefinederiie

And then

Describe your data in a spreadsheet

Create a project and upload it in Open Refine

Clean up the data

Map your data to appropriate RDF classes amp properties

Export the data in RDF

Slide 86

1

2

3

4

5

DATASUPPORTOPEN

Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data ndash table harmonisation

Slide 90

3

bull Star amp remove unnessary rows

bull Rename columns

bull Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data ndash prepare RDF

Slide 91

3

bull Create URI representation for the involved object values

bull via formula

bull via reconsiliation

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

References

• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• OpenRefine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Slide 98

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: OpenDataSupport

Contact us: contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.


Step 2: Reuse existing terms and vocabularies

Advantages of reuse

• Reuse greatly aids the interoperability of your data.

For example, a dcterms:created statement whose value is a typed date, such as "2013-02-21"^^xsd:date, can be processed immediately by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.

• Reuse adds credibility to your schema.

It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.

• Reuse is easier and cheaper.

Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.

Slide 71
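
The interoperability argument can be made concrete with a small Python sketch (standard library only). The two literal values come from the slide; the helper `parse_xsd_date` and the idea of a consumer that only understands the xsd:date lexical form are assumptions made for illustration.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """Parse the lexical form of an xsd:date without timezone (YYYY-MM-DD)."""
    return date.fromisoformat(lexical)


# Value published with dcterms:created and typed as xsd:date.
typed_value = "2013-02-21"
created = parse_xsd_date(typed_value)

# Value published with a custom term and a free-text date format.
free_text_value = "21 February 2013"
try:
    parse_xsd_date(free_text_value)
    needs_extra_processing = False
except ValueError:
    needs_extra_processing = True

# The consumer has to know (or guess) the publisher-specific format.
converted = datetime.strptime(free_text_value, "%d %B %Y").date()

print(created, needs_extra_processing, converted == created)
```

The typed value is understood by a generic parser, whereas the free-text value needs a format string that every consumer would have to discover separately.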

Step 2: Reuse existing terms and vocabularies

You can find reusable RDF vocabularies on:

• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org

Slide 72

Step 3: Creation of sub-classes and sub-properties

• RDF schemas and vocabularies often include terms that are very generic.

• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.

• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.

Slide 73

Step 3: Creation of sub-classes and sub-properties

Slide 74

The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.

[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]

Step 3: Creation of sub-classes and sub-properties

Slide 75

The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.

[Diagram: classes Amount (attributes Currency, Figure, Type, Year) and Nomenclature (attributes Type, Heading, Introduction, Remark, Conditions), connected by "has Nomenclature"]
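
The benefit of sub-properties can be sketched as follows: a consumer that only knows dct:description and dct:subject can still use the data once the rdfs:subPropertyOf statements are applied. This is a minimal illustration, not an RDF library. The prefixed names `bud:introduction` and `bud:hasNomenclature`, the `ex:` resources, the literal text and the direction of the "has nomenclature" link are all assumptions; only the two sub-property relationships come from the slides.

```python
# Sub-property statements from the slides (prefixed names are assumed).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

# Hypothetical data using the specific terms.
data = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text of the budget line"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
]


def with_super_properties(triples, sub_property_of):
    """Return the triples plus those entailed by rdfs:subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        # Walk up the property hierarchy (assumes it contains no cycles).
        while p in sub_property_of:
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


def values(triples, subject, predicate):
    """Return the sorted objects of all triples with this subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


entailed = with_super_properties(data, SUB_PROPERTY_OF)
descriptions = values(entailed, "ex:nomenclature1", "dct:description")
print(descriptions)
```

A generic application that looks for dct:description finds the introduction text without knowing anything about the more specific vocabulary.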

Step 4: Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower-case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
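
Part of these conventions can be checked mechanically. The sketch below tests only the capitalisation and camel-case rules; whether a class name is singular, or a property name is a verb or a noun, still needs human review. The function names and the two negative examples (`ex:budget_line`, `ex:HasSite`) are hypothetical.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_class_convention(term: str) -> bool:
    """Classes start with a capital letter and use camel case."""
    return bool(UPPER_CAMEL.match(local_name(term)))


def follows_property_convention(term: str) -> bool:
    """Properties start with a lower-case letter and use camel case."""
    return bool(LOWER_CAMEL.match(local_name(term)))


for term in ["skos:Concept", "ex:budget_line"]:
    print(term, follows_class_convention(term))

for term in ["rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf", "ex:HasSite"]:
    print(term, follows_property_convention(term))
```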

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use the established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use the established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: class Amount with attributes Currency, Figure, Type, Year]
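
The original example definitions were shown as images and are not available in this transcript. As a rough idea of what such definitions can look like, the Python sketch below renders a class and a property description as Turtle using RDFS and OWL terms. Everything specific in it is an assumption: the `bud:` namespace form (the slides only give http://data.europa.eu/bud as an example namespace), the term names `bud:Amount` and `bud:amountType`, and the label and comment texts.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    # Assumed namespace form with a trailing '#'.
    "bud": "http://data.europa.eu/bud#",
}


def turtle_prefixes(prefixes):
    """Render the prefix declarations of a Turtle document."""
    return "".join(f"@prefix {p}: <{iri}> .\n" for p, iri in prefixes.items())


def describe(term, types, label, comment):
    """Render a minimal description (types, label, comment) of a term."""
    type_list = ", ".join(types)
    return (
        f"{term} a {type_list} ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


# Hypothetical definitions of the "Amount" class and "amount type" property.
amount_class = describe(
    "bud:Amount", ["rdfs:Class", "owl:Class"],
    "Amount", "An amount of money in the budget.",
)
amount_type = describe(
    "bud:amountType", ["rdf:Property"],
    "amount type", "The type of an amount.",
)

document = turtle_prefixes(PREFIXES) + "\n" + amount_class + "\n" + amount_type
print(document)
```

Giving every new term at least a type, a human-readable label and a comment makes the vocabulary usable by people and tools that have never seen it before.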

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used: any resource that has the property is an instance of those classes.

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: property hasCeiling connecting class Political category (attributes Code, Description) and class Amount (attributes Currency, Figure, Type, Year)]
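
In RDFS, domain and range are used for inference rather than validation: from a statement that uses the property, a reasoner concludes the types of the subject and the object. The sketch below applies these two rules to one triple. The direction of hasCeiling (from a political category to an amount), the `bud:` names and the `ex:` resources are assumptions made for illustration.

```python
# Schema statements; the direction PoliticalCategory -> Amount is assumed.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}

# Hypothetical data: one statement without any explicit rdf:type.
data = [("ex:heading1", "bud:hasCeiling", "ex:ceiling2015")]


def infer_types(triples, domain, range_):
    """Apply the RDFS domain and range rules; return inferred rdf:type triples."""
    inferred = set()
    for s, p, o in triples:
        if p in domain:
            inferred.add((s, "rdf:type", domain[p]))
        if p in range_:
            inferred.add((o, "rdf:type", range_[p]))
    return inferred


types = infer_types(data, DOMAIN, RANGE)
for triple in sorted(types):
    print(triple)
```

A consequence worth keeping in mind is that a domain or range that is too narrow leads to wrong conclusions about resources, instead of an error message, when the property is used elsewhere.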

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Follow good practices for publishing sets of persistent Uniform Resource Identifiers (URIs), in terms of format, design rules and management.

Examples:

- http://www.w3.org/ns/adms

- http://purl.org/dc/elements/1.1/

Slide 80

See also:

- https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

- http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

1. Start with a robust domain model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

[Diagram labels: Analyse, Model, Publish]

Example

Using OpenRefine for RDF to publish tabular data as Linked Data

Slide 83

What is OpenRefine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another" - openrefine.org

See also: OpenRefine website, http://openrefine.org

What is the OpenRefine RDF extension?

The OpenRefine RDF extension allows you to easily import data in different formats, such as:

- CSV
- Excel (.xls and .xlsx)
- JSON
- XML and
- RDF/XML

And then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD 2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using OpenRefine to model and publish open data: getting started

1. Install OpenRefine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in OpenRefine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in OpenRefine

Slide 89

Upload the spreadsheet

Select relevant tabs

Create the project

Step 3: Clean up the data - table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published
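
In OpenRefine these clean-up operations are carried out interactively. For readers who want to see the same logic spelled out, the following Python sketch mimics it on a tiny table using the standard csv module. It does not use OpenRefine; the sample rows, the column names and the rule that a row without a value is unnecessary are all hypothetical.

```python
import csv
import io

# Hypothetical extract of a downloaded spreadsheet, saved as CSV.
RAW = """Indicator,Country,Year,Value
Broadband coverage,BE,2012,0.98
,,,
Broadband coverage,BG,2012,0.90
Source: hypothetical sample,,,
Broadband coverage,BE,2011,0.97
"""

# Column renames, comparable to "Rename columns".
RENAMES = {
    "Indicator": "indicator",
    "Country": "ref_area",
    "Year": "time_period",
    "Value": "obs_value",
}


def clean(raw, renames, keep_year):
    """Drop rows without a value, keep one year, and rename the columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw)):
        # Remove unnecessary rows (empty rows, notes) that carry no value.
        if not row["Value"]:
            continue
        # Comparable to a facet: keep only the selection to be published.
        if row["Year"] != keep_year:
            continue
        rows.append({renames[name]: value for name, value in row.items()})
    return rows


cleaned = clean(RAW, RENAMES, keep_year="2012")
for row in cleaned:
    print(row)
```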

Step 3: Clean up the data - prepare RDF

Slide 91

• Create URI representations for the object values involved

  - via formula

  - via reconciliation
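
The two approaches can be illustrated in plain Python. A "formula" derives a URI from the cell value itself, while reconciliation looks the value up in an existing source and reuses the URI found there. The base URI, the lookup table and the function names below are hypothetical, and a real reconciliation service does fuzzy matching rather than the exact dictionary lookup shown here.

```python
import re
import unicodedata
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI

# Hypothetical stand-in for an external source of already published URIs.
KNOWN_URIS = {"Belgium": "http://example.org/id/country/BE"}


def slug(value: str) -> str:
    """Lower-case, strip accents and replace other character runs with '-'."""
    ascii_value = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_value.lower()).strip("-")


def mint_uri(kind: str, value: str) -> str:
    """'Formula' approach: build the URI from the cell value."""
    return f"{BASE}{kind}/{quote(slug(value))}"


def reconcile(value: str):
    """'Reconciliation' approach: reuse an existing URI if one is known."""
    return KNOWN_URIS.get(value)


indicator_uri = mint_uri("indicator", "Households with Broadband Access (%)")
country_uri = reconcile("Belgium") or mint_uri("country", "Belgium")
accented_uri = mint_uri("country", "België")

print(indicator_uri)
print(country_uri)
print(accented_uri)
```

Reusing an existing URI where possible links the data to other datasets, while a formula guarantees that every value gets some identifier.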

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF
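
Conceptually, a skeleton is a template that is applied to every row of the table. The Python sketch below imitates this with a string template that produces one qb:Observation per row in Turtle. The qb: and sdmx- terms belong to the RDF Data Cube Vocabulary and its companion SDMX vocabularies; the `ex:` namespace, the `ex:indicator` property, the dataset URI and the sample rows are hypothetical, and the real skeleton in the OpenRefine RDF extension is edited graphically rather than written as text.

```python
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/scoreboard/> .
"""

# Skeleton: one observation per row; placeholders name spreadsheet columns.
SKELETON = """ex:obs-{indicator}-{country}-{year} a qb:Observation ;
    qb:dataSet ex:dataset ;
    ex:indicator ex:indicator-{indicator} ;
    sdmx-dimension:refArea ex:country-{country} ;
    sdmx-dimension:refPeriod "{year}"^^xsd:gYear ;
    sdmx-measure:obsValue "{value}"^^xsd:decimal .
"""

# Hypothetical, already cleaned rows (values must be safe to use in names).
rows = [
    {"indicator": "bb-cov", "country": "BE", "year": "2012", "value": "0.98"},
    {"indicator": "bb-cov", "country": "BG", "year": "2012", "value": "0.90"},
]


def apply_skeleton(table):
    """Fill the skeleton once per row and prepend the prefix declarations."""
    return PREFIXES + "\n" + "\n".join(SKELETON.format(**row) for row in table)


turtle = apply_skeleton(rows)
print(turtle)
```

Keeping the template separate from the data is what makes the mapping reusable: the same skeleton can be applied again when a new release of the spreadsheet arrives.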



Page 73: Llinked open data training for EU institutions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

bull RDF schemas and vocabularies often include terms that are very generic

bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown

bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists

Slide 73

3

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 74

3

The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription

Nomenclature

TypeHeadingIntroductionRemarkConditions

DATASUPPORTOPEN

Creation of sub-classes and sub-properties

Slide 75

3

The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject

Amount

CurrencyFigureTypeYear

Nomenclature

TypeHeadingIntroductionRemarkConditions

has

Nomenclature

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

Classes begin with a capital letter and are always singular eg skosConcept

Properties begin with a lower case letter eg rdfslabel

Object properties should be verbs eg orghasSite

Data type properties should be nouns eg dctermsdescription

Use camel case if a term has more than one word eg foafisPrimaryTopicOf

Slide 76

4

DATASUPPORTOPEN

Where new terms are required create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example defining the ldquoAmountrdquo class

Slide 77

4

See alsohttpwwwslidesharenetOpenDataSupportmodel-your-

data-metadata

Amount

CurrencyFigureTypeYear

DATASUPPORTOPEN

Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: class Amount with attributes Currency, Figure, Type, Year]

Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states that any resource to which a given property is applied is an instance of one or more classes.

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: property hasCeiling relating class Political category (Code, Description) and class Amount (Currency, Figure, Type, Year)]
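In RDFS, domain and range statements let a consumer infer the types of the resources a property connects. A minimal sketch of that inference follows. It assumes that hasCeiling points from a political category to an amount (the direction is not recoverable from the slide) and uses hypothetical prefixed names throughout.

```python
RDF_TYPE = "rdf:type"

# Schema statements. Assumption: a political category has a ceiling amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}  # rdfs:domain
RANGE = {"bud:hasCeiling": "bud:Amount"}              # rdfs:range


def infer_types(triples):
    """Apply the RDFS domain and range rules to (subject, predicate, object) triples."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))  # the subject is in the domain class
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))   # the object is in the range class
    return inferred


data = [("ex:category1", "bud:hasCeiling", "ex:amount42")]
types = infer_types(data)
```

Note that these statements add information rather than reject data, so an overly narrow domain or range can lead to unintended inferences.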

Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.

Example: http://data.europa.eu/bud

• Follow good practices for publishing sets of persistent Uniform Resource Identifiers (URIs), in terms of format, design rules and management.

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
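One way to keep URIs persistent is to mint them with a single deterministic rule from identifiers that do not change. The sketch below assumes a simple `{base}{type}/{id}` pattern and a trailing-slash form of the namespace; both are illustrative choices rather than the rules actually used for the EU Budget vocabulary.

```python
import re
from urllib.parse import quote

BASE = "http://data.europa.eu/bud/"  # stable namespace; the trailing slash is an assumption


def mint_uri(resource_type, identifier):
    """Build a URI of the form {base}{type}/{id} from a type and a stable identifier."""
    # Lower-case the identifier and collapse anything that is not a letter or digit into '-'
    slug = re.sub(r"[^a-z0-9]+", "-", identifier.strip().lower()).strip("-")
    if not slug:
        raise ValueError("identifier has no usable characters")
    return BASE + quote(resource_type) + "/" + slug


uri = mint_uri("amount", "2014 Heading 1a")
print(uri)
```

Because the function is deterministic, the same identifier always yields the same URI, regardless of incidental differences in case or spacing.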

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

Start with a robust domain model, developed following a structured process and methodology.

Research existing terms and their usage, and maximise reuse of those terms.

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.

Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

Publish within a highly stable environment designed to be persistent.

Publicise the RDF vocabulary by registering it with relevant services.

[Diagram: the steps are grouped into three phases: Analyse, Model, Publish]

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another" - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

- CSV

- Excel (.xls and .xlsx)

- JSON

- XML

- RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

Slide 89

Upload the spreadsheet

Select relevant tabs

Create the project

Step 3: Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published
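The three clean-up operations above are performed interactively in Open Refine. To make their effect concrete, here is a rough plain-Python equivalent on a few in-memory rows. The column names, indicator codes and values are invented for illustration and are not taken from the Digital Agenda Scoreboard.

```python
# Invented sample rows standing in for an uploaded spreadsheet
rows = [
    {"Ctry": "BE", "Indicator": "h_broad", "Yr": "2013", "Val": "78.5"},
    {"Ctry": "Notes", "Indicator": "", "Yr": "", "Val": ""},  # footer row to discard
    {"Ctry": "FR", "Indicator": "h_broad", "Yr": "2013", "Val": "80.1"},
    {"Ctry": "FR", "Indicator": "i_iuse", "Yr": "2013", "Val": "82.0"},
]

RENAME = {"Ctry": "country", "Indicator": "indicator", "Yr": "year", "Val": "value"}


def remove_rows(rows, is_unnecessary):
    """Drop the rows flagged by the predicate (the 'star & remove' step)."""
    return [r for r in rows if not is_unnecessary(r)]


def rename_columns(rows, mapping):
    """Rename columns; names missing from the mapping are kept unchanged."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def facet(rows, column, value):
    """Keep only the rows whose column equals the selected facet value."""
    return [r for r in rows if r[column] == value]


cleaned = facet(
    rename_columns(remove_rows(rows, lambda r: r["Val"] == ""), RENAME),
    "indicator",
    "h_broad",
)
```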

Step 3: Clean up the data – prepare RDF

Slide 91

• Create URI representations for the object values involved

• via formula

• via reconciliation
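The two routes to a URI differ in where the identifier comes from: a formula derives it from the cell value, whereas reconciliation looks the value up in an existing set of identifiers. The sketch below contrasts the two; the base URI and the lookup table use placeholder `example.org` addresses, and in Open Refine itself these steps are done with its expression language and reconciliation services rather than with Python.

```python
import re

BASE = "http://example.org/scoreboard/"  # placeholder base URI


def uri_from_formula(kind, value):
    """Derive a URI from the cell value itself."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", value.strip()).strip("-").lower()
    return BASE + kind + "/" + slug


# Reconciliation: identifiers that already exist elsewhere (placeholder URIs)
KNOWN_COUNTRIES = {
    "belgium": "http://example.org/authority/country/BEL",
    "france": "http://example.org/authority/country/FRA",
}


def reconcile(value, known):
    """Return the existing URI for a value, or None when there is no match."""
    return known.get(value.strip().lower())


def uri_for_country(name):
    """Prefer a reconciled URI; fall back to a formula-based one."""
    return reconcile(name, KNOWN_COUNTRIES) or uri_from_formula("country", name)
```

Reconciled URIs are generally preferable where they exist, because they link the data to identifiers that others already use.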

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

Slide 94

[Screenshot: graphical interface to copy/paste an existing RDF skeleton]

[Screenshot: graphical interface to edit an RDF skeleton]
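Conceptually, a skeleton is a template that is applied to every row to produce triples. The sketch below expresses that idea in plain Python, using terms from the RDF Data Cube and SDMX vocabularies. The base URI, the URI patterns and the column names are assumptions for illustration; this is not the skeleton format the RDF extension stores.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/scoreboard/"  # placeholder base URI

# Skeleton: one (predicate, object-builder) pair per triple emitted for each row.
# Objects are tagged ("uri", value) or ("literal", value).
SKELETON = [
    (RDF_TYPE, lambda r: ("uri", QB + "Observation")),
    (QB + "dataSet", lambda r: ("uri", BASE + "dataset/" + r["indicator"])),
    (SDMX_DIM + "refArea", lambda r: ("uri", BASE + "country/" + r["country"])),
    (SDMX_DIM + "refPeriod", lambda r: ("literal", r["year"])),
    (SDMX_MEASURE + "obsValue", lambda r: ("literal", r["value"])),
]


def apply_skeleton(rows):
    """Turn each row into one observation described by the skeleton's triples."""
    triples = []
    for r in rows:
        subject = BASE + "obs/{indicator}-{country}-{year}".format(**r)
        for predicate, make_object in SKELETON:
            triples.append((subject, predicate, make_object(r)))
    return triples


rows = [{"country": "BE", "indicator": "h_broad", "year": "2013", "value": "78.5"}]
triples = apply_skeleton(rows)
```

Each spreadsheet row becomes one `qb:Observation` with its dimensions and measure, which is the general shape the Data Cube vocabulary expects.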

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Screenshot: export of the data in Turtle]
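To show what the exported file contains, the following sketch serialises a list of triples with full IRIs, one statement per line. This simple form is valid Turtle without prefix declarations; the output produced by Open Refine will normally be more compact. The triple representation used here is an assumption of the sketch.

```python
def escape(text):
    """Escape characters that may not appear raw in a quoted Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def term(kind, value):
    """Render an object tagged as 'uri' or 'literal'."""
    return "<%s>" % value if kind == "uri" else '"%s"' % escape(value)


def to_turtle(triples):
    """Serialise (subject IRI, predicate IRI, (kind, value)) triples, one per line."""
    return "".join(
        "<%s> <%s> %s .\n" % (s, p, term(kind, value))
        for s, p, (kind, value) in triples
    )


out = to_turtle([
    ("http://example.org/s", "http://example.org/p", ("literal", 'say "hi"')),
    ("http://example.org/s", "http://example.org/q", ("uri", "http://example.org/o")),
])
print(out)
```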

Production pipelines

From desk to automated pipeline

Slide 96

[Diagram: tools positioned between flexibility and volume: OpenRefine, UnifiedViews, Cellar]

Thank you for your attention

and now YOUR questions

Slide 97

References

• 5 ★ Open Data. http://5stardata.info

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF/

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slide 99

EC ISA: Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA: Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL. Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook. W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support: http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

@OpenDataSupport

contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.



Step 4: Where new terms are required, create them following commonly agreed best practices

Classes begin with a capital letter and are always singular, e.g. skos:Concept

Properties begin with a lower case letter, e.g. rdfs:label

Object properties should be verbs, e.g. org:hasSite

Data type properties should be nouns, e.g. dcterms:description

Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf

Slide 76
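The naming conventions above can be checked mechanically before a vocabulary is published. The following is a minimal sketch in Python, not part of the original training material: the helper names are hypothetical, and the "always singular" rule is left out because it cannot be tested reliably with a pattern.

```python
import re

# Hypothetical helper names; the patterns mirror the conventions listed above.
CLASS_NAME = re.compile(r"^[A-Z][a-zA-Z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][a-zA-Z0-9]*$")  # e.g. hasSite


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'skos:Concept' -> 'Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class(term: str) -> bool:
    """Classes begin with a capital letter."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property(term: str) -> bool:
    """Properties begin with a lower case letter."""
    return bool(PROPERTY_NAME.match(local_name(term)))


def to_camel_case(words: str, *, is_class: bool) -> str:
    """Join a multi-word label into camel case ('amount type' -> 'amountType')."""
    parts = words.split()
    if not parts:
        raise ValueError("empty label")
    head = parts[0].capitalize() if is_class else parts[0].lower()
    return head + "".join(part.capitalize() for part in parts[1:])
```

For example, `to_camel_case("amount type", is_class=False)` gives a property name, while `is_class=True` gives the capitalised form used for classes.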

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "Amount" class

Slide 77

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the Amount class with the attributes Currency, Figure, Type and Year]
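The class definition shown on the slide is not preserved in this transcript. As a rough illustration only, the sketch below uses Python to assemble a Turtle declaration of an Amount class with RDFS and OWL. The `ex:` namespace, the label and the comment text are assumptions, and the helper does not escape quotes inside labels.

```python
# Hypothetical namespace; replace it with the stable namespace of your vocabulary.
EX = "http://example.org/budget#"

PREFIXES = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    f"@prefix ex: <{EX}> .\n"
)


def define_class(name: str, label: str, comment: str) -> str:
    """Return a Turtle snippet declaring ex:<name> as an RDFS and OWL class."""
    return (
        f"ex:{name} a rdfs:Class, owl:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


turtle = PREFIXES + "\n" + define_class(
    "Amount", "Amount", "A monetary figure with a currency, type and year."
)
print(turtle)
```

The printed text can be saved as a `.ttl` file and loaded into any RDF tool that reads Turtle.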

Step 4: Where new terms are required, create them following commonly agreed best practices

If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:

- RDF Schema (RDFS)

- Web Ontology Language (OWL)

Example: defining the "amount type" property

Slide 78

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the Amount class with the attributes Currency, Figure, Type and Year]

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range

A range states that the values of a property are instances of one or more classes

A domain states on which classes a given property can be used

Slide 79

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram: the hasCeiling property connecting the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
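Domain and range can be stated directly in the property declaration. The sketch below is illustrative only and makes several assumptions: the `ex:` prefix is hypothetical, hasCeiling is read as pointing from Political category to Amount (the direction is not recoverable from the transcript), and the "amount type" property is modelled as a plain string. The generated Turtle also presumes that the prefixes rdf, rdfs, owl, xsd and ex are declared elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyDef:
    """A property declaration with its domain and range, as prefixed names."""
    name: str
    kind: str    # "owl:ObjectProperty" or "owl:DatatypeProperty"
    domain: str
    range_: str
    label: str

    def to_turtle(self) -> str:
        """Render the declaration as a Turtle snippet (prefixes not included)."""
        return (
            f"{self.name} a rdf:Property, {self.kind} ;\n"
            f'    rdfs:label "{self.label}"@en ;\n'
            f"    rdfs:domain {self.domain} ;\n"
            f"    rdfs:range {self.range_} .\n"
        )


# Assumed reading of the diagram: a political category has a ceiling amount.
has_ceiling = PropertyDef(
    name="ex:hasCeiling",
    kind="owl:ObjectProperty",
    domain="ex:PoliticalCategory",
    range_="ex:Amount",
    label="has ceiling",
)

# Assumed: the amount type is a plain string literal.
amount_type = PropertyDef(
    name="ex:amountType",
    kind="owl:DatatypeProperty",
    domain="ex:Amount",
    range_="xsd:string",
    label="amount type",
)

print(has_ceiling.to_turtle())
print(amount_type.to_turtle())
```

Note how the names follow the conventions from the earlier slide: the object property is a verb phrase (hasCeiling) and the data type property is a noun (amountType).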

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary

Example: http://data.europa.eu/bud

• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management

Examples:

o http://www.w3.org/ns/adms

o http://purl.org/dc/elements/1.1/

Slide 80

See also:

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

Slide 82

1. Start with a robust Domain Model developed following a structured process and methodology

2. Research existing terms and their usage and maximise reuse of those terms

3. Where new terms can be seen as specialisations of existing terms, create subclasses and subproperties as appropriate

4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

5. Publish within a highly stable environment designed to be persistent

6. Publicise the RDF vocabulary by registering it with relevant services

Phases: Analyse, Model, Publish


Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another [...]" - openrefine.org

See also: Open Refine website, http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

- CSV

- Excel (.xls and .xlsx)

- JSON

- XML and

- RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

Slide 89

Upload the spreadsheet

Select relevant tabs

Create the project

Step 3: Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published
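The same three clean-up operations can be scripted once the manual steps are settled. The sketch below uses only the Python standard library and is not how Open Refine works internally. The sample rows, the column names and the indicator codes are invented for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a downloaded spreadsheet export.
RAW = """Country,Indic,Yr,Val
Belgium,bb_penet,2013,34.2
Notes: provisional data,,,
France,bb_penet,2013,37.8
France,h_iacc,2013,81.7
"""

# Hypothetical mapping from the original headers to clearer column names.
RENAME = {"Country": "country", "Indic": "indicator", "Yr": "year", "Val": "value"}


def harmonise(raw_csv: str, indicator: str) -> list[dict[str, str]]:
    """Drop incomplete rows, rename columns and keep a single indicator."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Remove unnecessary rows: anything with an empty cell (e.g. footnotes).
        if any(not (value or "").strip() for value in row.values()):
            continue
        renamed = {RENAME[key]: value.strip() for key, value in row.items()}
        # Rough equivalent of a facet: keep only rows for the chosen indicator.
        if renamed["indicator"] == indicator:
            rows.append(renamed)
    return rows


clean = harmonise(RAW, "bb_penet")
for row in clean:
    print(row)
```

The footnote row and the row for the other indicator are dropped, which leaves one row per country for the selected indicator.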

Step 3: Clean up the data – prepare RDF

Slide 91

• Create URI representation for the involved object values

o via formula

o via reconciliation
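The two routes to a URI differ in where the identifier comes from: a formula derives it from the cell text alone, while reconciliation matches the cell against a reference list. A minimal Python sketch of both ideas follows. The base URI and the lookup table are hypothetical, and real reconciliation services do fuzzy matching rather than the exact lookup shown here.

```python
import re
from typing import Optional

# Hypothetical base URI and reference list, used only for illustration.
BASE = "http://example.org/id/country/"
KNOWN_COUNTRIES = {
    "belgium": "http://example.org/id/country/BE",
    "france": "http://example.org/id/country/FR",
}


def uri_via_formula(cell: str) -> str:
    """Derive a URI from the cell text alone (lower case, hyphen separated)."""
    slug = re.sub(r"[^a-z0-9]+", "-", cell.strip().lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot build a URI from {cell!r}")
    return BASE + slug


def uri_via_reconciliation(cell: str) -> Optional[str]:
    """Look the cell up in a reference list; None means no match was found."""
    return KNOWN_COUNTRIES.get(cell.strip().lower())
```

A formula always produces a URI, even for misspelt values, whereas a failed lookup returns `None` and flags the cell for manual review.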

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one

You can set the base URI for the data

Slide 94

[Screenshot: graphical interface to copy/paste an existing RDF skeleton]

[Screenshot: graphical interface to edit an RDF skeleton]

Step 5: Export your data to RDF/XML or Turtle

Slide 95

[Screenshot: export of the data in Turtle]
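The exported Turtle is not preserved in this transcript. To give an idea of what such an export can look like, the Python sketch below renders table rows as RDF Data Cube observations. Only the `qb:` and `xsd:` namespaces are standard. The `ex:` and `data:` namespaces, the property names and the sample rows are assumptions, and the output of the actual RDF extension will differ in detail.

```python
# Hypothetical namespaces and rows; only the qb: and xsd: namespaces are standard.
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/def/> .
@prefix data: <http://example.org/data/> .
"""

ROWS = [
    {"country": "BE", "indicator": "bb_penet", "year": "2013", "value": "34.2"},
    {"country": "FR", "indicator": "bb_penet", "year": "2013", "value": "37.8"},
]


def observation(row: dict[str, str]) -> str:
    """Render one table row as a qb:Observation in Turtle."""
    subject = f"data:{row['indicator']}-{row['country']}-{row['year']}"
    return (
        f"{subject} a qb:Observation ;\n"
        f"    qb:dataSet data:scoreboard ;\n"
        f"    ex:country data:{row['country']} ;\n"
        f"    ex:indicator data:{row['indicator']} ;\n"
        f'    ex:year "{row["year"]}"^^xsd:gYear ;\n'
        f'    ex:value "{row["value"]}"^^xsd:decimal .\n'
    )


turtle = PREFIXES + "\n" + "\n".join(observation(row) for row in ROWS)
print(turtle)
```

Each row becomes one observation whose subject URI is built from the indicator, country and year, so re-running the export on the same data yields the same identifiers.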

Production pipelines

From desk to automated pipeline

Slide 96

[Diagram: OpenRefine, UnifiedViews and Cellar positioned along the axes flexibility and volume]


Thank you for your attention

and now YOUR questions

Slide 97


Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements

https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: OpenDataSupport

Contact us: contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17)

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.



2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice

Page 79: Llinked open data training for EU institutions

Step 4: Where new terms are required, create them following commonly agreed best practices

When defining new properties, consider defining their domain and range.

A range states that the values of a property are instances of one or more classes.

A domain states on which classes a given property can be used.

See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata

[Diagram labels: hasCeiling; Amount (Currency, Figure, Type, Year); Political category (Code, Description)]

Slide 79
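
As a minimal sketch of the domain and range statements above, the Python snippet below prints a Turtle description of a single property. The `ex:` namespace (`http://example.org/budget#`), the class names and the direction of `ex:hasCeiling` (from a political category to an amount) are assumptions made for illustration; the slide does not state them.

```python
# Sketch: declare a property with rdfs:domain and rdfs:range as Turtle text.
# The ex: namespace and the direction of ex:hasCeiling are assumptions.

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/budget#",  # hypothetical namespace
}


def property_declaration(prop, domain, range_, label):
    """Return Turtle lines declaring `prop` with a domain and a range."""
    return "\n".join([
        f"{prop} a rdf:Property ;",
        f'    rdfs:label "{label}"@en ;',
        f"    rdfs:domain {domain} ;",  # classes the property can be used on
        f"    rdfs:range {range_} .",   # classes its values are instances of
    ])


def turtle_document(body):
    """Prepend @prefix lines so the output is a complete Turtle document."""
    header = "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items())
    return header + "\n\n" + body + "\n"


doc = turtle_document(
    property_declaration("ex:hasCeiling", "ex:PoliticalCategory", "ex:Amount", "has ceiling")
)
print(doc)
```

Stating the domain and range in the vocabulary itself lets reusers see, without reading separate documentation, which classes the property is meant to connect.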

Step 5: Publish within a highly stable environment designed to be persistent

• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud

• Follow good practices for publishing sets of persistent Uniform Resource Identifiers (URIs), in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/

See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

Slide 80
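
One way to keep URIs stable is to mint them from a fixed pattern rather than by hand. The sketch below assumes a pattern of the form `http://{domain}/{type}/{concept}/{reference}`; the domain `example.org`, the function names and the sample values are hypothetical and only illustrate the idea.

```python
import re
import unicodedata
from urllib.parse import quote

BASE = "http://example.org"  # hypothetical, stable domain


def slug(text):
    """Lower-case ASCII slug: 'Political Category' -> 'political-category'."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(type_, concept, reference):
    """Build a URI from three segments; the same inputs always give the same URI."""
    parts = [slug(type_), slug(concept), quote(str(reference), safe="")]
    if not all(parts):
        raise ValueError("every URI segment must be non-empty")
    return "/".join([BASE] + parts)


uri = mint_uri("id", "Political Category", "1.1")
print(uri)
```

Because the URI is derived only from the type, concept and reference, it does not change when the hosting technology or file format behind it changes.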

Step 6: Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.

Slide 81

Conclusions

1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create subclasses and subproperties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice (naming conventions, etc.).
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.

Phases: Analyse, Model, Publish

Slide 82
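
Point 3 above (specialising existing terms) can be sketched as follows. The snippet prints Turtle in which two new terms refine terms from the W3C Organization Ontology and Dublin Core Terms; the `ex:` namespace and the new term names are hypothetical.

```python
# Sketch: state that new terms specialise existing, widely used ones.
# ex: is a hypothetical namespace; org: and dcterms: are existing vocabularies.

PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "org": "http://www.w3.org/ns/org#",
    "dcterms": "http://purl.org/dc/terms/",
    "ex": "http://example.org/vocab#",
}

PREDICATES = {"class": "rdfs:subClassOf", "property": "rdfs:subPropertyOf"}


def specialise(new_term, existing_term, kind):
    """Return one Turtle triple linking a new term to the term it refines."""
    if kind not in PREDICATES:
        raise ValueError(f"kind must be one of {sorted(PREDICATES)}")
    return f"{new_term} {PREDICATES[kind]} {existing_term} ."


triples = [
    specialise("ex:PublicAgency", "org:Organization", "class"),
    specialise("ex:leadAuthor", "dcterms:creator", "property"),
]
header = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
vocabulary = "\n".join(header + [""] + triples) + "\n"
print(vocabulary)
```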

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another” - openrefine.org

See also the Open Refine website: http://openrefine.org

Slide 84

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML

You then determine the intended structure of an RDF dataset by drawing a template graph.

See also: LOD 2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Slide 85

Using Open Refine to model and publish open data: getting started

1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF

Slide 86

Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

Step 1: Describe your data in a spreadsheet

Download the tabular data

Slide 88

Step 2: Create a project and upload it in Open Refine

• Upload the spreadsheet
• Select relevant tabs
• Create the project

Slide 89

Step 3: Clean up the data – table harmonisation

• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published

Slide 90
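
For readers who want to see the same three clean-up operations outside the Open Refine interface, here is a rough Python equivalent using only the standard library. The column names, indicator codes and values are invented for illustration and are not taken from the Digital Agenda Scoreboard.

```python
import csv
import io

# Hypothetical extract of a downloaded spreadsheet, saved as CSV.
RAW = """Country,Indicator code,Year,Value
Belgium,bb_lines,2013,34.1
,,,
Greece,bb_lines,2013,26.2
Belgium,e_gov,2013,50.0
"""

# Rename columns to short, consistent names.
RENAMES = {"Country": "country", "Indicator code": "indicator", "Year": "year", "Value": "value"}


def harmonise(text, keep_indicator):
    """Drop empty rows, rename columns and keep one indicator (like a facet)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if not any((v or "").strip() for v in row.values()):
            continue  # unnecessary row: every cell is empty
        renamed = {RENAMES[k]: (v or "").strip() for k, v in row.items()}
        if renamed["indicator"] == keep_indicator:
            rows.append(renamed)
    return rows


clean = harmonise(RAW, "bb_lines")
for record in clean:
    print(record)
```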

Step 3: Clean up the data – prepare RDF

• Create URI representations for the involved object values
  o via formula
  o via reconciliation

Slide 91
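
The two routes to a URI can be sketched in Python as follows. This is not Open Refine expression syntax; it only mirrors the idea. The base URI, the lookup table standing in for a reconciliation service, and all function names are hypothetical.

```python
import re

BASE = "http://example.org/id/"  # hypothetical base URI

# Hypothetical reference list, standing in for an external reconciliation service.
KNOWN_COUNTRIES = {
    "belgium": "http://example.org/id/country/BE",
    "greece": "http://example.org/id/country/EL",
}


def uri_from_formula(kind, value):
    """Derive a URI from the cell value itself."""
    token = re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")
    if not token:
        raise ValueError("cannot build a URI from an empty value")
    return f"{BASE}{kind}/{token}"


def uri_from_reconciliation(value, known):
    """Look the value up in a list of existing identifiers; None if unmatched."""
    return known.get(value.strip().lower())


def country_uri(value):
    """Prefer an existing identifier, fall back to a formula-built URI."""
    return uri_from_reconciliation(value, KNOWN_COUNTRIES) or uri_from_formula("country", value)


print(country_uri(" Belgium "))
print(country_uri("United Kingdom"))
```

Reconciliation is preferable where a matching identifier already exists, because it links the data to identifiers that others use; the formula route is the fallback for values with no match.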

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

Slide 92

Step 4: Map your data to appropriate RDF classes & properties (model your data)

Define a skeleton to transform your spreadsheet data to RDF

Slide 93

Step 4: Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

Slide 94

Step 5: Export your data to RDF/XML or Turtle

Export of the data in Turtle

Slide 95
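
To give an impression of what the mapping and export steps produce, the sketch below turns cleaned rows into RDF Data Cube observations serialised as Turtle. It uses the `qb:` and SDMX dimension and measure namespaces of the RDF Data Cube Vocabulary; the base URI, the dataset URI, the choice of dimensions and the sample values are assumptions, and the output of the Open Refine RDF extension for the real dataset may look different.

```python
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""

BASE = "http://example.org/scoreboard/"  # hypothetical base URI
DATASET = BASE + "dataset/broadband"     # hypothetical dataset URI

# Hypothetical cleaned rows (one observation per row).
ROWS = [
    {"country": "BE", "year": "2013", "value": "34.1"},
    {"country": "EL", "year": "2013", "value": "26.2"},
]


def observation(row, dataset_uri):
    """Map one spreadsheet row to a qb:Observation in Turtle."""
    subject = f"<{BASE}obs/{row['country']}-{row['year']}>"
    return "\n".join([
        f"{subject} a qb:Observation ;",
        f"    qb:dataSet <{dataset_uri}> ;",
        f"    sdmx-dimension:refArea <{BASE}country/{row['country']}> ;",
        f'    sdmx-dimension:refPeriod "{row["year"]}"^^xsd:gYear ;',
        f'    sdmx-measure:obsValue "{row["value"]}"^^xsd:decimal .',
    ])


def to_turtle(rows, dataset_uri):
    """Serialise all rows as one Turtle document."""
    body = "\n\n".join(observation(r, dataset_uri) for r in rows)
    return PREFIXES + "\n" + body + "\n"


turtle = to_turtle(ROWS, DATASET)
print(turtle)
```

Each row becomes one resource whose properties play the role of the skeleton: the same template is applied to every row, with only the cell values changing.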

Production pipelines

From desk to automated pipeline

[Diagram: OpenRefine, UnifiedViews and Cellar compared in terms of flexibility and volume]

Slide 96

Thank you for your attention

and now YOUR questions

Slide 97

Further reading

EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Slide 99

Further reading

EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme
http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org

Slide 101

Be part of our team

Find us on, contact us, join us on, follow us:

Open Data Support: http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support: http://goo.gl/y9ZZI
OpenDataSupport
contact@opendatasupport.eu

Slide 102

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Slide 103

Page 80: Llinked open data training for EU institutions

DATASUPPORTOPEN

Publish within a highly stable environment designed to be persistent

bull Choose a stable namespace for your RDF vocabulary

Example httpdataeuropaeubud

bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management

Examples

o httpwwww3orgnsadms

o httppurlorgdcelements11

Slide 80

5

See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg

See alsoOpen Refine website

httpopenrefineorg

DATASUPPORTOPEN

What is Open Refine RDF extension

Open Refine RDF extension allows you to easily import data in different formats such as

CSV

Excel(xls and xlsx)

JSON

XML and

RDFXML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data Getting started

1 Install Open Refine from httpsgithubcomOpenRefine

2 Install the RDF extension httprefinederiie

And then

Describe your data in a spreadsheet

Create a project and upload it in Open Refine

Clean up the data

Map your data to appropriate RDF classes amp properties

Export the data in RDF

Slide 86

1

2

3

4

5

DATASUPPORTOPEN

Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data ndash table harmonisation

Slide 90

3

bull Star amp remove unnessary rows

bull Rename columns

bull Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data ndash prepare RDF

Slide 91

3

bull Create URI representation for the involved object values

bull via formula

bull via reconsiliation

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

bull 5 Open Data http5stardatainfo

bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure

bull An organization ontology W3C httpwwww3orgTRvocab-org W3C

bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment

bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies

bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf

bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1

bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml

Slide 98

bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook

bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet

bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2

bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata

bull Open Refine httpsgithubcomOpenRefine

bull RDF Extension httprefinederiie

bull Resource Description Framework W3C httpwwww3orgRDF

bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng

bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements

EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1 Introduction and Application Scenarios

httpwwweuclid-projecteumodulescourse1

EUCLID - Course 2 Querying Linked Data

httpwwweuclid-projecteumodulescourse2

Learning SPARQL Bob DuCharme

httpwwwlearningsparqlcom

Linked Data Cookbook W3C Government Linked Data Working Group

httpwwww3org2011gldwikiLinked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer

httplinkeddatabookcomeditions10

Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck

httpwwwsemantic-webatLOD-TheEssentialspdf

Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas

httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454

Semantic Web for the working ontologist Dean Allemang Jim Hendler

httpworkingontologistorg

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data SupporthttpwwwslidesharenetOpenDataSupport

httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI

OpenDataSupport contactopendatasupporteu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)

copy 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative

2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice

Page 81: Llinked open data training for EU institutions

DATASUPPORTOPEN

Publicise the RDF vocabulary by registering it with relevant services

Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies

Slide 81

6

DATASUPPORTOPEN

Conclusions

Slide 82

Start with a robust Domain Model developed following a structured process and methodology

Research existing terms and their usage and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate

Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish

DATASUPPORTOPEN

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

DATASUPPORTOPEN

What is Open Refine

Slide 84

ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg

See alsoOpen Refine website

httpopenrefineorg

DATASUPPORTOPEN

What is Open Refine RDF extension

Open Refine RDF extension allows you to easily import data in different formats such as

CSV

Excel(xls and xlsx)

JSON

XML and

RDFXML

And then determine the intended structure of an RDF dataset by drawing a template graph

Slide 85

See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI

DATASUPPORTOPEN

Using Open Refine to model and publish open data Getting started

1 Install Open Refine from httpsgithubcomOpenRefine

2 Install the RDF extension httprefinederiie

And then

Describe your data in a spreadsheet

Create a project and upload it in Open Refine

Clean up the data

Map your data to appropriate RDF classes amp properties

Export the data in RDF

Slide 86

1

2

3

4

5

DATASUPPORTOPEN

Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

DATASUPPORTOPEN

Describe your data in a spreadsheet

Download the tabular data

Slide 88

1

DATASUPPORTOPEN

Create a project and upload it in Open Refine

Slide 89

2

Upload the spreadsheet

Select relevant tabs

Create the project

DATASUPPORTOPEN

Clean up the data ndash table harmonisation

Slide 90

3

bull Star amp remove unnessary rows

bull Rename columns

bull Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data ndash prepare RDF

Slide 91

3

bull Create URI representation for the involved object values

bull via formula

bull via reconsiliation

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

bull 5 Open Data http5stardatainfo

bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure

bull An organization ontology W3C httpwwww3orgTRvocab-org W3C

bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment

bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies

bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf

bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1

bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml

Slide 98

bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook

bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet

bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2

bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata

bull Open Refine httpsgithubcomOpenRefine

bull RDF Extension httprefinederiie

bull Resource Description Framework W3C httpwwww3orgRDF

bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng

bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query

DATASUPPORTOPEN

Further reading

Slide 99

EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements

EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

DATASUPPORTOPEN

Further reading

EUCLID - Course 1 Introduction and Application Scenarios

httpwwweuclid-projecteumodulescourse1

EUCLID - Course 2 Querying Linked Data

httpwwweuclid-projecteumodulescourse2

Learning SPARQL Bob DuCharme

httpwwwlearningsparqlcom

Linked Data Cookbook W3C Government Linked Data Working Group

httpwwww3org2011gldwikiLinked_Data_Cookbook

Slide 100

DATASUPPORTOPEN

Further reading

Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer

httplinkeddatabookcomeditions10

Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck

httpwwwsemantic-webatLOD-TheEssentialspdf

Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas

httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454

Semantic Web for the working ontologist Dean Allemang Jim Hendler

httpworkingontologistorg

Slide 101

DATASUPPORTOPEN

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data SupporthttpwwwslidesharenetOpenDataSupport

httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI

OpenDataSupport contactopendatasupporteu

DATASUPPORTOPEN

This presentation has been created by PwC

Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)

copy 2015 European Commission

Disclaimers

1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative

2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice

Conclusions

Slide 82

Start with a robust Domain Model, developed following a structured process and methodology

Research existing terms and their usage, and maximise reuse of those terms

Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate

Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.

Publish within a highly stable environment designed to be persistent

Publicise the RDF vocabulary by registering it with relevant services

Analyse

Model

Publish
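The advice on specialising existing terms can be illustrated with a minimal Python sketch that writes Turtle by hand. The `rdfs:` and `org:` namespaces are the real W3C ones; the `ex:` namespace and the terms `ex:EUAgency` and `ex:boardMemberOf` are hypothetical and only serve as an illustration.

```python
# rdfs: and org: are real W3C namespaces; ex: is a made-up example namespace.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "org": "http://www.w3.org/ns/org#",
    "ex": "http://example.org/vocab#",
}


def specialise(new_term, existing_term, kind):
    """Return one Turtle statement declaring new_term a specialisation of existing_term."""
    predicate = {"class": "rdfs:subClassOf", "property": "rdfs:subPropertyOf"}[kind]
    return f"{new_term} {predicate} {existing_term} ."


def vocabulary_turtle(statements):
    """Prepend the prefix declarations to the given statements."""
    header = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    return "\n".join(header + [""] + statements) + "\n"


turtle = vocabulary_turtle([
    # Hypothetical new class, declared as a specialisation of org:Organization.
    specialise("ex:EUAgency", "org:Organization", "class"),
    # Hypothetical new property, declared as a specialisation of org:memberOf.
    specialise("ex:boardMemberOf", "org:memberOf", "property"),
])
print(turtle)
```

Declaring the link to the existing term keeps data that uses the new terms discoverable by consumers who only know the reused vocabulary.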

Example

Using Open Refine for RDF to publish tabular data as Linked Data

Slide 83

What is Open Refine?

Slide 84

“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another” - openrefine.org

See also: Open Refine website

http://openrefine.org

What is the Open Refine RDF extension?

The Open Refine RDF extension allows you to easily import data in different formats, such as:

CSV

Excel (.xls and .xlsx)

JSON

XML, and

RDF/XML

and then determine the intended structure of an RDF dataset by drawing a template graph.

Slide 85

See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI

Using Open Refine to model and publish open data: Getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension: http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

1. Describe your data in a spreadsheet

Download the tabular data

Slide 88

2. Create a project and upload it in Open Refine

Slide 89

Upload the spreadsheet

Select relevant tabs

Create the project

3. Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published
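Outside the Open Refine interface, the same three harmonisation steps can be sketched in plain Python. This is only an illustration under stated assumptions: the sample table, its column names and the chosen filter are invented, and the code does not reproduce how Open Refine works internally.

```python
import csv
import io

# Hypothetical raw export: a title row, a header row, data rows and a footnote row.
RAW = """Digital Agenda Scoreboard extract,,
Country,Yr,Val
BE,2013,0.78
FR,2013,0.75
BE,2012,
Source: example data,,
"""

# Hypothetical mapping from the original column names to harmonised ones.
RENAME = {"Country": "country", "Yr": "year", "Val": "value"}


def harmonise(raw_text, year):
    """Drop unnecessary rows, rename columns and keep one year of data."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    # Row 0 is the title row; row 1 holds the column names to be renamed.
    header = [RENAME.get(name, name) for name in rows[1]]
    records = [dict(zip(header, row)) for row in rows[2:]]
    # Remove unnecessary rows: footnotes and rows without a value.
    records = [r for r in records if r["value"] and r["year"].isdigit()]
    # Rough equivalent of a facet selection: keep only the requested year.
    return [r for r in records if r["year"] == year]


clean = harmonise(RAW, "2013")
print(clean)
```

Keeping the rename mapping in one dictionary makes it easy to reuse the same harmonisation when a new release of the table is published.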

3. Clean up the data – prepare RDF

Slide 91

• Create URI representation for the involved object values

• via formula

• via reconciliation
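The two ways of obtaining URIs can be sketched in Python as follows. The base URI, the path layout and the lookup table are assumptions made for this example; in particular, the dictionary only stands in for a real reconciliation service, which would match cell values against an external dataset.

```python
import re

# Assumption: replace with your own persistent base URI.
BASE_URI = "http://example.org/id/"


def mint_uri(kind, label):
    """Build a URI from a cell value using a lower-case, hyphen-separated ASCII slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")
    return f"{BASE_URI}{kind}/{slug}"


# Stand-in for reconciliation: a hand-made lookup of already known URIs.
KNOWN = {"Belgium": "http://example.org/id/country/BE"}


def uri_for(kind, label):
    """Prefer a reconciled URI; fall back to the formula when none is known."""
    return KNOWN.get(label) or mint_uri(kind, label)


indicator_uri = mint_uri("indicator", "Households with broadband access")
country_uri = uri_for("country", "Belgium")
print(indicator_uri)
print(country_uri)
```

Reconciled URIs are preferable where they exist, because they link the published data to identifiers that other datasets already use.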

4. Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary

4. Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF
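Conceptually, a skeleton is a template that is applied to every row of the table. The Python sketch below shows the idea for one RDF Data Cube observation. The `qb:` and SDMX namespaces are the published ones; the base URI, the dataset URI, the column names and the choice of `xsd:gYear` for the period are assumptions for illustration, not the mapping used in the slides.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI


def observation_triples(row):
    """Apply the skeleton to one row and return (subject, predicate, object) triples."""
    obs = f"<{BASE}obs/{row['country']}-{row['year']}>"
    return [
        (obs, f"<{RDF_TYPE}>", f"<{QB}Observation>"),
        (obs, f"<{QB}dataSet>", f"<{BASE}dataset/broadband>"),
        (obs, f"<{SDMX_DIM}refArea>", f"<{BASE}country/{row['country']}>"),
        (obs, f"<{SDMX_DIM}refPeriod>", f'"{row["year"]}"^^<{XSD}gYear>'),
        (obs, f"<{SDMX_MEASURE}obsValue>", f'"{row["value"]}"^^<{XSD}decimal>'),
    ]


# One hypothetical spreadsheet row after clean-up.
rows = [{"country": "BE", "year": "2013", "value": "0.78"}]

# Write the triples in N-Triples syntax, one statement per line.
ntriples = "\n".join(
    f"{s} {p} {o} ." for r in rows for s, p, o in observation_triples(r)
)
print(ntriples)
```

Each column of the spreadsheet ends up as one property of the observation, which is the same decision you make when drawing the template graph in the RDF extension.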

4. Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one

You can set the base URI for the data

Slide 94

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

5. Export your data to RDF/XML or Turtle

Slide 95

Export of the data in Turtle
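To give an impression of what a Turtle export looks like, the sketch below serialises a few triples by hand in Python. It is not the exporter of the RDF extension; the `ex:` namespace and the resource names are hypothetical, and the triples are written as prefixed names for brevity.

```python
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-measure": "http://purl.org/linked-data/sdmx/2009/measure#",
    "ex": "http://example.org/scoreboard/",  # hypothetical base URI
}


def to_turtle(triples):
    """Serialise (subject, predicate, object) tuples of prefixed names as Turtle."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    # Group predicate-object pairs by subject so each subject is written once.
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append(f"{p} {o}")
    for s, pairs in by_subject.items():
        lines.append("")
        lines.append(f"{s} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines) + "\n"


triples = [
    ("ex:obs1", "a", "qb:Observation"),
    ("ex:obs1", "qb:dataSet", "ex:broadband"),
    ("ex:obs1", "sdmx-measure:obsValue", "0.78"),
]
turtle = to_turtle(triples)
print(turtle)
```

Turtle groups all statements about one subject and abbreviates URIs with prefixes, which is why it is usually easier to review by eye than RDF/XML.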

Production pipelines

From desk to automated pipeline

Slide 96

[Diagram with the axes "flexibility" and "volume", showing OpenRefine, UnifiedViews and Cellar]

Thank you for your attention

and now YOUR questions

Slide 97

References

Slide 98

• 5★ Open Data, http://5stardata.info

• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology, W3C, http://www.w3.org/TR/vocab-org

• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1

• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html

• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net

• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata

• Open Refine, https://github.com/OpenRefine

• RDF Extension, http://refine.deri.ie

• Resource Description Framework, W3C, http://www.w3.org/RDF

• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support: http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support: http://goo.gl/y9ZZI

@OpenDataSupport

contact@opendatasupport.eu

This presentation has been created by PwC

Page 86: Llinked open data training for EU institutions

DATASUPPORTOPEN

Using Open Refine to model and publish open data: Getting started

1. Install Open Refine from https://github.com/OpenRefine

2. Install the RDF extension from http://refine.deri.ie

And then:

1. Describe your data in a spreadsheet

2. Create a project and upload it in Open Refine

3. Clean up the data

4. Map your data to appropriate RDF classes & properties

5. Export the data in RDF

Slide 86

Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary

Digital Agenda Scoreboard

Slide 87

1. Describe your data in a spreadsheet

Download the tabular data

Slide 88

2. Create a project and upload it in Open Refine

Slide 89

• Upload the spreadsheet

• Select relevant tabs

• Create the project

3. Clean up the data – table harmonisation

Slide 90

• Star & remove unnecessary rows

• Rename columns

• Use facets to select the data to be published
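The three clean-up actions above are performed interactively in Open Refine. As a rough illustration of what they amount to, the following Python sketch does the same on a small in-memory table. The sample rows, the column names and the `RENAME` mapping are hypothetical and the values are illustrative only; they are not taken from the Digital Agenda Scoreboard.

```python
import csv
import io

# Hypothetical raw export: a title row, a header row, data rows and a footnote row.
RAW = """Digital Agenda Scoreboard (illustrative extract),,,
Country,Indicator,Year,Value
BE,Households with broadband,2013,79
EL,Households with broadband,2013,55
BE,Households with broadband,2012,
Source: illustrative values only,,,
"""

# Assumed mapping from spreadsheet headers to harmonised column names.
RENAME = {
    "Country": "ref_area",
    "Indicator": "indicator",
    "Year": "time_period",
    "Value": "value",
}


def harmonise(raw_text, year):
    """Drop non-data rows, rename columns and keep a single year."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    # Locate the header row; everything above it is a title, not data.
    header_index = next(i for i, row in enumerate(rows) if row[:1] == ["Country"])
    header = [RENAME[name] for name in rows[header_index]]
    cleaned = []
    for row in rows[header_index + 1:]:
        record = dict(zip(header, row))
        # Counterpart of starring and removing rows: skip footnotes and empty values.
        if not record["value"] or not record["time_period"].isdigit():
            continue
        # Counterpart of a facet: keep only the selected year.
        if record["time_period"] == year:
            cleaned.append(record)
    return cleaned


table = harmonise(RAW, "2013")
for record in table:
    print(record)
```

In Open Refine the same result is reached without code; the sketch only makes explicit which rows and columns survive the harmonisation.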

3. Clean up the data – prepare RDF

Slide 91

• Create URI representations for the involved object values

  • via formula

  • via reconciliation
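Creating a URI "via formula" means deriving it from the cell value with a fixed rule. In Open Refine this is done with an expression on the column; the Python sketch below shows an analogous rule, not Open Refine's own expression language. The base URI `http://example.org/id/` and the `kind` path segments are assumptions made for illustration. Reconciliation, which looks values up against an existing authority list, is not shown.

```python
import re
import unicodedata
from urllib.parse import quote

# Hypothetical base URI; replace it with a namespace you control.
BASE_URI = "http://example.org/id/"


def slugify(label):
    """Turn a free-text label into a lowercase, hyphen-separated ASCII token."""
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, label):
    """Build a URI of the form <base><kind>/<slug> for a cell value."""
    return BASE_URI + quote(kind) + "/" + slugify(label)


print(mint_uri("indicator", "Households with broadband"))
print(mint_uri("country", "België"))
```

A deterministic rule like this yields the same URI every time the same value appears, which is what allows observations in different rows to point at the same resource.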

4. Map your data to appropriate RDF classes & properties (model your data)

Slide 92

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary

4. Map your data to appropriate RDF classes & properties (model your data)

Slide 93

Define a skeleton to transform your spreadsheet data to RDF

4. Map your data to appropriate RDF classes & properties (model your data)

You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.

You can set the base URI for the data.

Slide 94

Graphical interface to copy/paste an existing RDF skeleton

Graphical interface to edit an RDF skeleton
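An RDF skeleton is, in essence, a template that says which column becomes which property and how each row becomes a resource. The Python sketch below expresses that idea for the RDF Data Cube Vocabulary. It is not the format the RDF extension stores internally. The `qb:` and `sdmx-` namespaces are the published ones; the base URI, the dataset URI and the column names are hypothetical.

```python
# qb: and sdmx-*: are published vocabularies; BASE and the dataset URI are hypothetical.
QB = "http://purl.org/linked-data/cube#"
SDMX_DIMENSION = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/scoreboard/"


def as_uri(path):
    """Return a function that turns a cell value into a URI node under BASE."""
    return lambda value: ("uri", BASE + path + value)


def as_literal(datatype):
    """Return a function that turns a cell value into a typed literal node."""
    return lambda value: ("literal", value, XSD + datatype)


# The skeleton: (spreadsheet column, RDF property, how to build the object).
SKELETON = [
    ("ref_area", SDMX_DIMENSION + "refArea", as_uri("country/")),
    ("time_period", SDMX_DIMENSION + "timePeriod", as_literal("gYear")),
    ("value", SDMX_MEASURE + "obsValue", as_literal("decimal")),
]


def row_to_triples(row, row_number):
    """Apply the skeleton to one spreadsheet row, producing one qb:Observation."""
    subject = BASE + "observation/" + str(row_number)
    triples = [
        (subject, RDF_TYPE, ("uri", QB + "Observation")),
        (subject, QB + "dataSet", ("uri", BASE + "dataset/example")),
    ]
    for column, prop, build in SKELETON:
        triples.append((subject, prop, build(row[column])))
    return triples


example_row = {"ref_area": "BE", "time_period": "2013", "value": "79"}
for triple in row_to_triples(example_row, 1):
    print(triple)
```

Changing `BASE` corresponds to setting the base URI for the data: every subject and every minted object URI moves with it, while the vocabulary terms stay fixed.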

5. Export your data to RDF/XML or Turtle

Slide 95

Export of the data in Turtle
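To give an idea of what a Turtle export contains, the sketch below writes a few triples as Turtle statements using full IRIs and no prefix declarations, which is the simplest valid form. The sample triples and the `http://example.org/` URIs are hypothetical; Open Refine's own export is usually more compact because it uses prefixes and groups statements by subject.

```python
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Hypothetical triples: (subject IRI, predicate IRI, object node).
# An object node is ("uri", iri) or ("literal", text, datatype IRI or None).
SAMPLE = [
    ("http://example.org/scoreboard/observation/1",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     ("uri", "http://purl.org/linked-data/cube#Observation")),
    ("http://example.org/scoreboard/observation/1",
     "http://purl.org/linked-data/sdmx/2009/measure#obsValue",
     ("literal", "79", XSD_DECIMAL)),
]


def escape_literal(text):
    """Escape characters that may not appear raw inside a quoted Turtle string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def render_object(node):
    """Render a URI node or a (possibly typed) literal node."""
    if node[0] == "uri":
        return "<" + node[1] + ">"
    text, datatype = node[1], node[2]
    rendered = '"' + escape_literal(text) + '"'
    if datatype:
        rendered += "^^<" + datatype + ">"
    return rendered


def to_turtle(triples):
    """Serialise triples as one Turtle statement per line."""
    lines = ["<%s> <%s> %s ." % (s, p, render_object(o)) for s, p, o in triples]
    return "\n".join(lines) + "\n"


print(to_turtle(SAMPLE), end="")
```

Because every term is written in full, the output is also valid N-Triples, which makes it easy to load into most triple stores for a quick check.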

Production pipelines

From desk to automated pipeline

Slide 96

[Diagram: OpenRefine, UnifiedViews and Cellar plotted against two axes, flexibility and volume]

Thank you for your attention

and now YOUR questions

Slide 97

References

• 5 ★ Open Data. http://5stardata.info

• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure

• An organization ontology. W3C. http://www.w3.org/TR/vocab-org

• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment

• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html

Slide 98

• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net

• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2

• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata

• Open Refine. https://github.com/OpenRefine

• RDF Extension. http://refine.deri.ie

• Resource Description Framework. W3C. http://www.w3.org/RDF

• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png

• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query

Further reading

Slide 99

EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL. Bob DuCharme

http://www.learningsparql.com

Linked Data Cookbook. W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler

http://workingontologist.org

Slide 101

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support http://goo.gl/y9ZZI

OpenDataSupport

contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 90: Llinked open data training for EU institutions

DATASUPPORTOPEN

Clean up the data ndash table harmonisation

Slide 90

3

bull Star amp remove unnessary rows

bull Rename columns

bull Use facets to select the data to be published

DATASUPPORTOPEN

Clean up the data ndash prepare RDF

Slide 91

3

bull Create URI representation for the involved object values

bull via formula

bull via reconsiliation

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 92

4

Understand the target vocabulary eg W3C RDF Data Cube Vocabulary

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

Slide 93

4

Define a skeleton to transform your spreadsheet data to RDF

DATASUPPORTOPEN

Map your data to appropriate RDF classes amp properties (model your data)

You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton

You can set the base URI for the data

Slide 94

Graphical interface to copypaste an existing RDF skeleton

Graphical interface to edit an RDF skeleton

4

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

bull 5 Open Data http5stardatainfo

bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure

bull An organization ontology W3C httpwwww3orgTRvocab-org W3C

bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment

bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies

bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf

bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1

bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml

Slide 98

• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata/
• OpenRefine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie/
• Resource Description Framework, W3C. http://www.w3.org/RDF/
• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query/

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme
http://www.learningsparql.com/

Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org/

Slide 101

Be part of our team

Slide 102

Find us on

Contact us

Join us on

Follow us

Open Data Support http://www.slideshare.net/OpenDataSupport

http://www.opendatasupport.eu

Open Data Support http://goo.gl/y9ZZI

@OpenDataSupport contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Page 91: Llinked open data training for EU institutions

Clean up the data – prepare RDF

Slide 91

3

• Create URI representations for the involved object values
  ◦ via formula
  ◦ via reconciliation
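The two approaches can be illustrated with a short Python sketch. Everything named here is an assumption made for illustration: the base URI, the normalisation rule and the `KNOWN_COUNTRIES` reference table are hypothetical and do not reproduce the expressions or reconciliation services used in the tool.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical base URI; replace with a namespace you control.
BASE_URI = "http://data.example.org/id/"


def uri_from_formula(kind: str, value: str) -> str:
    """Mint a URI by normalising a cell value and appending it to the base URI."""
    # Trim, lower-case and join the words with hyphens.
    slug = "-".join(value.strip().lower().split())
    # Percent-encode whatever is still not safe in a URI path segment.
    return f"{BASE_URI}{kind}/{quote(slug, safe='-')}"


# Hypothetical reference list mapping labels to URIs that already exist
# in another dataset (placeholder URIs, not real authority codes).
KNOWN_COUNTRIES = {
    "belgium": "http://example.org/reference/country/BEL",
    "greece": "http://example.org/reference/country/GRC",
}


def uri_by_reconciliation(label: str) -> Optional[str]:
    """Match a label against the reference list; None means no match found."""
    return KNOWN_COUNTRIES.get(label.strip().lower())


print(uri_from_formula("organisation", "  Publications Office "))
print(uri_by_reconciliation("Belgium"))
print(uri_by_reconciliation("Atlantis"))
```

A formula always yields a URI, but only within your own namespace. Reconciliation reuses identifiers that others already publish, which is what creates links between datasets, and it leaves unmatched values for manual review.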

Map your data to appropriate RDF classes & properties (model your data)

Slide 92

4

Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
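As a rough illustration of what modelling one spreadsheet cell with the RDF Data Cube Vocabulary involves, the Python sketch below builds the triples for a single observation. The class `qb:Observation`, the property `qb:dataSet` and the SDMX dimension and measure properties come from the published vocabularies. The base URI, the URI pattern and the sample values are hypothetical, and the data structure definition that a complete cube also requires is left out.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_DIMENSION = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base URI; replace with a namespace you control.
BASE = "http://data.example.org/"


def observation_triples(dataset_id, area, period, value):
    """Describe one statistical value as a qb:Observation."""
    dataset = f"{BASE}dataset/{dataset_id}"
    # Hypothetical URI pattern: one observation per area and period.
    observation = f"{dataset}/obs/{area}-{period}"
    return [
        (observation, RDF_TYPE, QB + "Observation"),
        (observation, QB + "dataSet", dataset),
        (observation, SDMX_DIMENSION + "refArea", area),      # dimension
        (observation, SDMX_DIMENSION + "refPeriod", period),  # dimension
        (observation, SDMX_MEASURE + "obsValue", value),      # measure
    ]


# Made-up sample values.
for triple in observation_triples("population", "BE", "2014", 11.2):
    print(triple)
```

Each row-and-column combination of a statistical table becomes one observation: the dimensions say what the number refers to, and the measure carries the number itself.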

Page 95: Llinked open data training for EU institutions

DATASUPPORTOPEN

Export your data to RDFXML or Turtle

Slide 95

5

Export of the data in Turtle

DATASUPPORTOPEN

Production pipelines

From desk to automated pipeline

Slide 96

flexibility

volume

OpenRefine

UnifiedViews

Cellar

DATASUPPORTOPEN

Thank you for your attention

and now YOUR questions

Slide 97

DATASUPPORTOPEN

References

bull 5 Open Data http5stardatainfo

bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure

bull An organization ontology W3C httpwwww3orgTRvocab-org W3C

bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment

bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies

bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas

bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf

bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1

bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml

Slide 98

bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook

bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet

bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2

bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata

bull Open Refine httpsgithubcomOpenRefine

bull RDF Extension httprefinederiie

bull Resource Description Framework W3C httpwwww3orgRDF

bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng

bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query

Further reading

Slide 99

EC ISA, Process and methodology for developing semantic agreements

https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements

EC ISA, Cookbook for translating Data Models to RDF Schemas

https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas

Further reading

EUCLID - Course 1: Introduction and Application Scenarios

http://www.euclid-project.eu/modules/course1

EUCLID - Course 2: Querying Linked Data

http://www.euclid-project.eu/modules/course2

Learning SPARQL, Bob DuCharme

http://www.learningsparql.com/

Linked Data Cookbook, W3C Government Linked Data Working Group

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

Slide 100

Further reading

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer

http://linkeddatabook.com/editions/1.0/

Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck

http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

Semantic Web for the working ontologist, Dean Allemang, Jim Hendler

http://workingontologist.org/

Slide 101

Be part of our team

Slide 102

Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu

Join us on: Open Data Support, http://goo.gl/y9ZZI

Follow us: @OpenDataSupport

Contact us: contact@opendatasupport.eu

This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103

Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).

© 2015 European Commission

Disclaimers

1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.

2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

Production pipelines

From desk to automated pipeline

Slide 96

[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]

Thank you for your attention

and now YOUR questions

Slide 97








Page 103: Llinked open data training for EU institutions


This presentation has been created by PwC

Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns

Presentation metadata

Slide 103
